kudashevs / rake-php
A PHP implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm.
Requires
- php: ^8.1
Requires (Dev)
- phpunit/phpunit: ^10.1|^11.0
README
A PHP implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting relevant keywords from individual documents.
Installation
You can install the package via composer:
composer require kudashevs/rake-php
Example
Here is a common usage example:
use Kudashevs\RakePhp\Rake;

$text = "Compatibility of systems of linear constraints over the set of natural numbers.";
$text .= "Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered.";
$text .= "Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given.";
$text .= "These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types of systems and systems of mixed types";

$rake = new Rake();
$keywords = $rake->extract($text);
print_r($keywords);

// will result in
Array
(
    [minimal generating sets] => 8.6666666666667
    [linear diophantine equations] => 8.5
    [minimal supporting set] => 7.6666666666667
    [minimal set] => 4.6666666666667
    [linear constraints] => 4.5
    [natural numbers] => 4
    [strict inequations] => 4
    [nonstrict inequations] => 4
    [upper bounds] => 4
    [mixed types] => 3.6666666666667
    [considered types] => 3.1666666666667
    [set] => 2
    [types] => 1.6666666666667
    [considered] => 1.5
    [compatibility] => 1
    [systems] => 1
    [criteria] => 1
    [system] => 1
    [components] => 1
    [solutions] => 1
    [algorithms] => 1
    [construction] => 1
    [constructing] => 1
    [solving] => 1
)
You can find more information about RAKE and its usage in the original paper.
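As a rough illustration of where the scores in the example output come from, the sketch below applies the word metric described in the RAKE paper: each word is scored as deg(w)/freq(w), and a phrase score is the sum of its word scores. This is not the package's implementation. The scorePhrases function is a hypothetical helper, and the four candidate phrases are hand-picked from the example text (they are the only candidates there that contain the words "minimal" and "set").

```php
<?php

/**
 * Hypothetical helper: scores candidate phrases with the deg(w)/freq(w) metric.
 *
 * @param array $phrases candidate phrases, each given as a list of words
 * @return array phrase => score, sorted from highest to lowest score
 */
function scorePhrases(array $phrases): array
{
    $freq = [];   // number of occurrences of each word across all candidates
    $degree = []; // sum of the lengths of the candidates each word occurs in

    foreach ($phrases as $words) {
        foreach ($words as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($words);
        }
    }

    $scores = [];
    foreach ($phrases as $words) {
        $score = 0.0;
        foreach ($words as $word) {
            $score += $degree[$word] / $freq[$word];
        }
        $scores[implode(' ', $words)] = $score;
    }

    arsort($scores); // highest score first, keys preserved

    return $scores;
}

// Candidates taken by hand from the example text.
$scores = scorePhrases([
    ['minimal', 'generating', 'sets'],
    ['minimal', 'supporting', 'set'],
    ['minimal', 'set'],
    ['set'],
]);

print_r($scores);
```

Here "minimal" occurs in three candidates with a total length of 8, so its word score is 8/3. Adding 3 for "generating" and 3 for "sets" gives roughly 8.67, which agrees with the value shown for "minimal generating sets" in the example output.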
Options
The Rake class accepts some configuration options:
'modifiers' => [] # A string, an instance or an array of Modifiers
'stoplist' => Stoplist::class # A Stoplist instance that provides a list of stop words
'sorter' => Sorter::class # A Sorter instance that sorts the output of the algorithm
'exclude' => [] # An array of words or regexes that will be excluded from a stoplist
'include' => [] # An array of words or regexes that will be included in a stoplist
Note: The configuration options exclude and include accept simple regexes.
Note: The configuration option exclude has a higher priority than the include option.
Note: At the moment of instantiation, the Rake class can throw an InvalidOptionType exception. This exception extends the built-in InvalidArgumentException class, so it can be caught like any other InvalidArgumentException.
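A minimal sketch of passing options and guarding against an invalid one is shown below. It assumes that the options are given to the constructor as an associative array, which is not stated above, and the option values are made up for illustration. The exception is caught through its built-in parent class, so the sketch does not depend on the namespace of InvalidOptionType.

```php
<?php

// Composer autoloader; adjust the path to your project layout.
require __DIR__ . '/vendor/autoload.php';

use Kudashevs\RakePhp\Rake;

try {
    // Assumption: options are passed to the constructor as an associative array.
    $rake = new Rake([
        'include' => ['considered', '.+(ly)'], // illustrative: extra entries for the stoplist
        'exclude' => ['set'],                  // illustrative: entry to drop from the stoplist
    ]);

    $keywords = $rake->extract('Criteria of compatibility of a system of linear Diophantine equations.');
    print_r($keywords);
} catch (InvalidArgumentException $e) {
    // InvalidOptionType extends InvalidArgumentException, so it is caught here too.
    echo 'Invalid option: ' . $e->getMessage() . PHP_EOL;
}
```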
Simple regular expressions
The configuration options exclude and include accept regular expressions. The following expressions are currently supported:
- .+(ly) - a one-or-more match with grouping
- word(s) - a match with alternation at the end of a word
- (word|letter) - an alternation of words
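To get a feel for what such patterns match, they can be tried with plain PHP regular expressions. The sketch below is not how the package evaluates them. The matchesAny function is a hypothetical helper, and it assumes that a pattern has to cover a whole word, ignoring case.

```php
<?php

/**
 * Hypothetical helper: returns true if the word is fully matched
 * by at least one of the given patterns (case-insensitive).
 */
function matchesAny(string $word, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // Anchor the pattern so that it must match the entire word.
        if (preg_match('/^' . $pattern . '$/i', $word) === 1) {
            return true;
        }
    }

    return false;
}

$patterns = ['.+(ly)', '(word|letter)'];

var_dump(matchesAny('quickly', $patterns)); // bool(true): one or more characters followed by "ly"
var_dump(matchesAny('letter', $patterns));  // bool(true): one of the alternatives
var_dump(matchesAny('ly', $patterns));      // bool(false): nothing precedes "ly"
var_dump(matchesAny('letters', $patterns)); // bool(false): not a whole-word match
```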
Testing
composer test
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Note: Please make sure to update tests as appropriate.
License
The MIT License (MIT). Please see the License file for more information.