camspiers/statistical-classifier

A PHP implementation of Complement Naive Bayes and SVM statistical classifiers, including a structure for building other classifier, multiple data sources and multiple caching backends

0.8.0 2014-01-05 22:29 UTC

This package is auto-updated.

Last update: 2024-03-19 12:48:55 UTC


README

Build Status Latest Stable Version

PHP Classifier uses semantic versioning, it is currently at major version 0, so the public API should not be considered stable.

What is it?

PHP Classifier is a text classification library with a focus on reuse, customizability and performance. Classifiers can be used for many purposes, but are particularly useful in detecting spam.

Features

  • Complement Naive Bayes Classifier
  • SVM (libsvm) Classifier
  • Highly customizable (easily modify or build your own classifier)
  • Command-line interface via separate library (phar archive)
  • Multiple data import types to get your data into the classifier (Directory of files, Database queries, Json, Serialized arrays)
  • Multiple types of model caching
  • Compatible with HipHop VM

Installation

$ composer require camspiers/statistical-classifier

SVM Support

For SVM Support both libsvm and php-svm are required. For installation intructions refer to php-svm.

Usage

Non-cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new ComplementNaiveBayes($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Non-cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray()
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new SVM($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Caching models

Caching models requires maximebf/CacheCache which can be installed via packagist. Additional caching systems can be easily integrated.

Cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new CachedModel(
	'mycachename',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new ComplementNaiveBayes($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new Model\SVMCachedModel(
	__DIR__ . '/model.svm',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new SVM($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Unit testing

statistical-classifier/ $ composer install --dev
statistical-classifier/ $ phpunit