camspiers / statistical-classifier
A PHP implementation of Complement Naive Bayes and SVM statistical classifiers, including a structure for building other classifiers, multiple data sources, and multiple caching backends
Requires
- php: >=5.3.3
- symfony/config: ~2.2
- symfony/options-resolver: ~2.2
Requires (Dev)
- maximebf/cachecache: ~1.0
- mikey179/vfsstream: ~1.2
- phpunit/phpunit: ~3.7
Suggests
- camspiers/porter-stemmer: Using a stemmer can help with language based classification
- maximebf/cachecache: Using caching will help improve performance on large datasets
README
PHP Classifier uses semantic versioning. It is currently at major version 0, so the public API should not be considered stable.
What is it?
PHP Classifier is a text classification library with a focus on reuse, customizability and performance. Classifiers can be used for many purposes, but are particularly useful in detecting spam.
Features
- Complement Naive Bayes Classifier
- SVM (libsvm) Classifier
- Highly customizable (easily modify or build your own classifier)
- Command-line interface via separate library (phar archive)
- Multiple data import types to get your data into the classifier (directory of files, database queries, JSON, serialized arrays)
- Multiple types of model caching
- Compatible with HipHop VM
Installation
$ composer require camspiers/statistical-classifier
SVM Support
For SVM support, both libsvm and php-svm are required. For installation instructions, refer to php-svm.
Usage
Non-cached Naive Bayes
use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new ComplementNaiveBayes($source);

$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
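To give a feel for what a Complement Naive Bayes classifier computes, here is a minimal, self-contained sketch in plain PHP. It is not the library's implementation: the function names (`tokenize`, `trainComplementWeights`, `classifyText`), the whitespace-and-punctuation tokenizer, and the add-one smoothing are all assumptions chosen for illustration. The idea is that each category is scored using word counts from every other category (its "complement"), and the category whose complement explains the document worst is chosen.

```php
<?php
// Illustrative sketch only: NOT the library's code. All names are hypothetical.

/** Split text into lowercase word tokens. */
function tokenize($text)
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Compute complement weights.
 * $documents maps category => list of document strings.
 * Returns category => (token => log probability of the token in all OTHER categories).
 */
function trainComplementWeights(array $documents)
{
    // Count tokens per category and collect the vocabulary.
    $counts = array();
    $vocabulary = array();
    foreach ($documents as $category => $texts) {
        $counts[$category] = array();
        foreach ($texts as $text) {
            foreach (tokenize($text) as $token) {
                if (!isset($counts[$category][$token])) {
                    $counts[$category][$token] = 0;
                }
                $counts[$category][$token]++;
                $vocabulary[$token] = true;
            }
        }
    }

    $weights = array();
    foreach (array_keys($counts) as $category) {
        // Sum token counts over every category except the current one.
        $complement = array();
        foreach ($counts as $other => $tokenCounts) {
            if ($other === $category) {
                continue;
            }
            foreach ($tokenCounts as $token => $n) {
                $complement[$token] = (isset($complement[$token]) ? $complement[$token] : 0) + $n;
            }
        }
        // Add-one smoothing (an assumption of this sketch).
        $total = array_sum($complement) + count($vocabulary);
        foreach (array_keys($vocabulary) as $token) {
            $n = isset($complement[$token]) ? $complement[$token] : 0;
            $weights[$category][$token] = log(($n + 1) / $total);
        }
    }
    return $weights;
}

/** Pick the category whose complement gives the document the LOWEST score. */
function classifyText(array $weights, $text)
{
    $best = null;
    $bestScore = INF;
    foreach ($weights as $category => $tokenWeights) {
        $score = 0.0;
        foreach (tokenize($text) as $token) {
            if (isset($tokenWeights[$token])) { // unseen tokens are ignored
                $score += $tokenWeights[$token];
            }
        }
        if ($score < $bestScore) {
            $bestScore = $score;
            $best = $category;
        }
    }
    return $best;
}

$weights = trainComplementWeights(array(
    'spam' => array('Some spam document', 'Another spam document'),
    'ham'  => array('Some ham document', 'Another ham document'),
));

echo classifyText($weights, 'Some ham document'), "\n"; // prints "ham"
```

Scoring against the complement rather than the category itself is what distinguishes this variant from standard multinomial Naive Bayes; it tends to be less sensitive to unevenly sized categories.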
Non-cached SVM
use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new SVM($source);

$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
Caching models
Caching models requires maximebf/CacheCache, which can be installed via Packagist. Additional caching systems can be easily integrated.
Cached Naive Bayes
use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new CachedModel(
    'mycachename',
    new CacheCache\Cache(
        new CacheCache\Backends\File(
            array(
                'dir' => __DIR__
            )
        )
    )
);

$classifier = new ComplementNaiveBayes($source, $model);

$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
Cached SVM
use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

// SVMCachedModel is imported above, so it is referenced by its short name.
$model = new SVMCachedModel(
    __DIR__ . '/model.svm',
    new CacheCache\Cache(
        new CacheCache\Backends\File(
            array(
                'dir' => __DIR__
            )
        )
    )
);

$classifier = new SVM($source, $model);

$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
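The general pattern behind model caching can be sketched in plain PHP: build the expensive result once, store it, and read it back on later runs. The snippet below is only an illustration of that pattern. The helper `loadOrBuild`, the temporary file name, and the toy "model" array are hypothetical, and nothing here reflects how `CachedModel` or CacheCache work internally.

```php
<?php
// Illustrative sketch only: a hypothetical helper, not the library's caching code.

/**
 * Return the value stored in $file if it exists; otherwise call $builder,
 * store its result in $file, and return it.
 */
function loadOrBuild($file, $builder)
{
    if (is_file($file)) {
        $cached = unserialize(file_get_contents($file));
        if ($cached !== false) {
            return $cached;
        }
    }
    $value = call_user_func($builder);
    file_put_contents($file, serialize($value));
    return $value;
}

$file = sys_get_temp_dir() . '/classifier-model-sketch.cache';
if (is_file($file)) {
    unlink($file); // start from a cold cache for the demonstration
}

$builds = 0;
$train = function () use (&$builds) {
    $builds++; // stands in for an expensive training step
    return array(
        'ham'  => array('document' => -1.3),
        'spam' => array('document' => -1.3),
    );
};

$first  = loadOrBuild($file, $train); // trains and writes the cache file
$second = loadOrBuild($file, $train); // reads the cache file, no retraining

echo $builds, "\n"; // prints 1

unlink($file); // clean up the demonstration file
```

On large datasets the training step dominates the cost, which is why skipping it on later runs pays off.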
Unit testing
statistical-classifier/ $ composer install --dev
statistical-classifier/ $ phpunit