sphamster / bayes
Bayes machine learning
v1.0.0
2025-03-14 21:30 UTC
Requires
- php: ^8.2
Requires (Dev)
- laravel/pint: ^1.21.1
- pestphp/pest: ^3.7.4
- pestphp/pest-plugin-type-coverage: ^3.3
- phpstan/phpstan: ^2.1.8
- rector/rector: ^2.0.10
Last update: 2025-03-14 23:05:36 UTC
README
Bayes takes a document (a piece of text) and tells you which category that document belongs to.
This library is a fork of the PHP library at https://github.com/niiknow/bayes
What can I use this for?
You can use this to sort any text content into an arbitrary set of categories. For example:
- Is an email spam or not spam?
- Is a news article about technology, politics, or sports?
- Does a piece of text express positive or negative emotions?
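To make the spam case above concrete, here is a minimal sketch of how a naive Bayes text classifier can arrive at a category. It is a teaching example only: `TinyBayes` is a hypothetical class, not part of this package, and the add-one (Laplace) smoothing and alphabetic tokenization are assumptions chosen for brevity rather than a description of how the library works internally.

```php
<?php
// Illustrative sketch only: TinyBayes is a hypothetical class,
// not the sphamster/bayes implementation.
final class TinyBayes
{
    /** @var array<string, int> number of training documents per category */
    private array $docCounts = [];

    /** @var array<string, array<string, int>> token frequencies per category */
    private array $tokenCounts = [];

    /** @var array<string, true> every distinct token seen during training */
    private array $vocabulary = [];

    public function train(string $text, string $category): void
    {
        $this->docCounts[$category] = ($this->docCounts[$category] ?? 0) + 1;

        foreach ($this->tokenize($text) as $token) {
            $this->tokenCounts[$category][$token] = ($this->tokenCounts[$category][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    public function predict(string $text): ?string
    {
        $tokens = $this->tokenize($text);
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = max(1, count($this->vocabulary));
        $best = null;
        $bestScore = -INF;

        foreach ($this->docCounts as $category => $docCount) {
            // Log prior: share of training documents in this category.
            $score = log($docCount / $totalDocs);
            $counts = $this->tokenCounts[$category] ?? [];
            $categoryTotal = array_sum($counts);

            foreach ($tokens as $token) {
                // Log likelihood with add-one (Laplace) smoothing (an assumption of this sketch).
                $score += log((($counts[$token] ?? 0) + 1) / ($categoryTotal + $vocabSize));
            }

            if ($score > $bestScore) {
                $bestScore = $score;
                $best = (string) $category;
            }
        }

        // Null when nothing has been trained yet.
        return $best;
    }

    /** @return list<string> lowercase alphabetic tokens */
    private function tokenize(string $text): array
    {
        preg_match_all('/[[:alpha:]]+/u', mb_strtolower($text), $matches);

        return $matches[0];
    }
}

$sketch = new TinyBayes();
$sketch->train('win a free prize now, click here', 'spam');
$sketch->train('free money, claim your prize', 'spam');
$sketch->train('meeting notes attached for tomorrow', 'not spam');
$sketch->train('can we move the meeting to friday', 'not spam');

echo $sketch->predict('claim your free prize'), PHP_EOL;   // prints "spam"
echo $sketch->predict('meeting moved to friday'), PHP_EOL; // prints "not spam"
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that was never seen in a category from forcing that category's score to negative infinity.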
Usage
$classifier = new \Sphamster\Bayes();

// teach it positive phrases
$classifier->train('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$classifier->train('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');

// teach it a negative phrase
$classifier->train('terrible, shitty thing. Damn. Sucks!!', 'negative');

// now ask it to predict a document it has never seen before
$classifier->predict('awesome, cool, amazing!! Yay.');
// => 'positive'

// serialize the classifier's state as a JSON string
$stateJson = $classifier->export();

// load the classifier back from its JSON representation
$classifier->import($stateJson);
Setup
composer require sphamster/bayes
Customizing the Tokenizer
To use your own custom tokenizer, create a class that implements the Tokenizer interface and pass an instance of it to the Bayes constructor. For example:
<?php

use Sphamster\Contracts\Tokenizer;

class MyCustomTokenizer implements Tokenizer
{
    public function tokenize(string $text): array
    {
        // Define your stopwords
        $stopwords = ['der', 'die', 'das', 'the'];

        // Build a regex pattern to match stopwords
        $pattern = '~\b(' . implode('|', array_map('preg_quote', $stopwords)) . ')\b~i';

        // Convert the text to lowercase and remove stopwords
        $cleanText = preg_replace($pattern, '', mb_strtolower($text));

        // Extract tokens consisting only of alphabetic characters
        preg_match_all('/[[:alpha:]]+/u', $cleanText, $matches);

        return $matches[0] ?? [];
    }
}

// Instantiate your custom tokenizer and pass it to Bayes
$tokenizer = new MyCustomTokenizer();
$classifier = new \Sphamster\Bayes(tokenizer: $tokenizer);
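Before wiring a tokenizer into the classifier, it can help to check which tokens it actually emits. The sketch below runs the same three steps (lowercase, strip stopwords, keep alphabetic runs) as a standalone function so it can be tried without the package installed. `tokenizeWithoutStopwords` is a hypothetical helper name introduced here for illustration, and passing the `~` delimiter to `preg_quote` is a small hardening choice of this sketch.

```php
<?php
// Hypothetical helper mirroring the tokenizer steps above;
// it is not part of sphamster/bayes.

/**
 * @param list<string> $stopwords
 * @return list<string>
 */
function tokenizeWithoutStopwords(string $text, array $stopwords): array
{
    // Escape each stopword, including the '~' delimiter used in the pattern.
    $escaped = array_map(
        static fn (string $word): string => preg_quote($word, '~'),
        $stopwords
    );
    $pattern = '~\b(?:' . implode('|', $escaped) . ')\b~iu';

    // Lowercase first, then blank out whole-word stopword matches.
    $cleanText = preg_replace($pattern, '', mb_strtolower($text)) ?? '';

    // Keep runs of alphabetic characters only; punctuation and digits are dropped.
    preg_match_all('/[[:alpha:]]+/u', $cleanText, $matches);

    return $matches[0];
}

$tokens = tokenizeWithoutStopwords(
    'Der Hund und die Katze: the END!',
    ['der', 'die', 'das', 'the']
);

print_r($tokens); // hund, und, katze, end
```

Because the pattern uses word boundaries, a stopword such as "die" is removed only when it stands alone and not when it appears inside a longer word.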
Testing
composer test