keyword-extractor / keyword-extractor
A package to extract keywords from text
Requires
- php: ^7.4|^8.0
- nlp-tools/nlp-tools: ^0.1.3
Requires (Dev)
- phpmd/phpmd: ^2.6
- phpunit/phpunit: ^9.2
- squizlabs/php_codesniffer: ^3.2
README
Server Requirements
- PHP >= 7.4 (the Composer constraint listed under Requires is ^7.4|^8.0)
Usage
- To install this package:
composer require keyword-extractor/keyword-extractor
- Extract the keywords:
$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence.';
$result = $keywordExtractor->run($text);
With the default modifiers and no sorting arguments, the result is:
Array
(
    [simpl] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 3
                                )
                        )
                )
        )
    [sentenc] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 4
                                )
                        )
                )
        )
)
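The shape of the result can be read off the output above: each key is a stemmed keyword, frequency counts its occurrences, and each occurrence records the original ngram together with the zero-based token indexes where it appeared. Below is a minimal sketch of walking that structure. It uses a hard-coded array copied from the output above rather than a live run() call, and the helper name summarizeKeywords is hypothetical, not part of the package.

```php
<?php
declare(strict_types=1);

// Result array copied from the output above (not produced by a live run).
$result = [
    'simpl' => [
        'frequency' => 1,
        'occurrences' => [
            ['ngram' => 'simple', 'indexes' => [3]],
        ],
    ],
    'sentenc' => [
        'frequency' => 1,
        'occurrences' => [
            ['ngram' => 'sentence.', 'indexes' => [4]],
        ],
    ],
];

/**
 * Hypothetical helper: flatten the result into one summary line per keyword.
 */
function summarizeKeywords(array $result): array
{
    $lines = [];
    foreach ($result as $keyword => $data) {
        // Collect every token index across all occurrences of this keyword.
        $positions = [];
        foreach ($data['occurrences'] as $occurrence) {
            foreach ($occurrence['indexes'] as $index) {
                $positions[] = $index;
            }
        }
        $lines[] = sprintf(
            '%s x%d at [%s]',
            $keyword,
            $data['frequency'],
            implode(', ', $positions)
        );
    }
    return $lines;
}

foreach (summarizeKeywords($result) as $line) {
    echo $line, PHP_EOL;
}
```

For the sample data this prints one line for simpl (position 3) and one for sentenc (position 4).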
Currently, the default modifiers are as follows (they are applied to the tokens in this order):
[
    new EmailFilter(),
    new PunctuationFilter(),
    new WhitelistFilter($this->getWhitelist()),
    new BlacklistFilter($this->getBlacklist()),
    new StopWordFilter(),
    new NumberFilter(),
    new StemFilter(),
    // run the blacklist again after stemming
    new BlacklistFilter($this->getBlacklist()),
]
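To illustrate why the order of modifiers matters, here is a minimal sketch of an ordered pipeline. It does not use the package's classes: the closures, the tiny stop-word list, the naive "strip a trailing e" stemmer, and the names applyModifiers and makeBlacklist are all hypothetical stand-ins. It also assumes that a modifier either returns a (possibly changed) token or drops it by returning null, which may differ from how the package's filters behave internally.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for filters: token in, token or null (dropped) out.
$punctuation = function (string $token): ?string {
    $clean = trim($token, ".,;:!?\"'()");
    return $clean === '' ? null : $clean;
};

$stopWords = function (string $token): ?string {
    return in_array(strtolower($token), ['this', 'is', 'a', 'and'], true) ? null : $token;
};

// Deliberately naive "stemmer": lowercase and strip one trailing "e".
$naiveStem = fn(string $token): ?string => preg_replace('/e$/', '', strtolower($token));

function makeBlacklist(array $words): callable
{
    return function (string $token) use ($words): ?string {
        return in_array($token, $words, true) ? null : $token;
    };
}

/**
 * Run every token through the modifiers in the given order.
 * Keys of the returned array are the tokens' original positions.
 */
function applyModifiers(array $tokens, array $modifiers): array
{
    $kept = [];
    foreach ($tokens as $index => $token) {
        foreach ($modifiers as $modifier) {
            $token = $modifier($token);
            if ($token === null) {
                continue 2; // token was filtered out; skip the remaining modifiers
            }
        }
        $kept[$index] = $token;
    }
    return $kept;
}

$tokens = explode(' ', 'This is a simple sentence.');
print_r(applyModifiers($tokens, [$punctuation, $stopWords, $naiveStem]));
```

With this toy pipeline the sentence reduces to simpl at position 3 and sentenc at position 4. Order matters for the blacklist in particular: in the sketch, makeBlacklist(['simpl']) placed before $naiveStem lets the token simple through, while the same filter placed after $naiveStem drops it, which is presumably why the default list runs the blacklist a second time after stemming.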
You can also set your own modifiers:
$keywordExtractor->setModifiers([new PunctuationFilter()]);
A whitelist can be used as follows:
$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence and simple sentence.';
$keywordExtractor->setWhitelist(['simple']);
$result = $keywordExtractor->run($text);
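This results in the following output. Note that the whitelisted word keeps its unstemmed key (simple rather than simpl):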
Array
(
    [simple] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 3
                                )
                        )
                    [1] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 6
                                )
                        )
                )
        )
    [sentenc] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 4
                                )
                        )
                    [1] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 7
                                )
                        )
                )
        )
)
A blacklist can be used in the same way as a whitelist:
$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence.';
$keywordExtractor->setBlacklist(['simple']);
$result = $keywordExtractor->run($text);
The result is:
Array
(
    [sentenc] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 4
                                )
                        )
                )
        )
)
To sort by frequency in descending order:
$keywordExtractor->run($text, Sorter::SORT_BY_FREQUENCY, Sorter::SORT_DIR_DESC);
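For readers who want to see what ordering by frequency means for an array of this shape, here is a minimal sketch using PHP's built-in uasort. It is not the package's Sorter: the function name sortByFrequencyDesc and the sample values are hypothetical, and the occurrences arrays are left empty for brevity.

```php
<?php
declare(strict_types=1);

// Result-shaped sample data with hypothetical frequencies.
$result = [
    'simpl'   => ['frequency' => 1, 'occurrences' => []],
    'sentenc' => ['frequency' => 3, 'occurrences' => []],
    'keyword' => ['frequency' => 2, 'occurrences' => []],
];

/**
 * Hypothetical helper: order keywords from most to least frequent.
 */
function sortByFrequencyDesc(array $result): array
{
    // uasort preserves the keyword keys; swapping $a and $b around <=>
    // yields descending order.
    uasort($result, fn(array $a, array $b): int => $b['frequency'] <=> $a['frequency']);
    return $result;
}

print_r(array_keys(sortByFrequencyDesc($result)));
```

The keys come back in the order sentenc, keyword, simpl. The function works on a copy, so the original array is left untouched.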
To sort by minimum occurrence distance:
$text = 'sentence and sentence';
$result = $keywordExtractor->run($text, Sorter::SORT_BY_MIN_OCCURRENCE_DISTANCE);
The result is:
Array
(
    [sentenc] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 0
                                )
                        )
                    [1] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 2
                                )
                        )
                )
            [minOccurrencesDistance] => 1
        )
)
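The README does not define minOccurrencesDistance. One reading that is consistent with the output above (positions 0 and 2 giving 1) is the number of tokens lying between the two closest occurrences of a keyword. The sketch below implements that reading only; it is an assumption, not the package's confirmed definition, and the helper name minOccurrenceGap is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: smallest number of tokens lying between any two
 * occurrences of a keyword. ASSUMPTION: this definition was chosen because
 * it reproduces the value 1 for positions 0 and 2; the package may compute
 * the distance differently.
 */
function minOccurrenceGap(array $occurrences): ?int
{
    // Gather all token positions of the keyword and put them in order.
    $positions = [];
    foreach ($occurrences as $occurrence) {
        foreach ($occurrence['indexes'] as $index) {
            $positions[] = $index;
        }
    }
    sort($positions);

    // The closest pair is always adjacent once the positions are sorted.
    $min = null;
    for ($i = 1, $n = count($positions); $i < $n; $i++) {
        $gap = $positions[$i] - $positions[$i - 1] - 1;
        if ($min === null || $gap < $min) {
            $min = $gap;
        }
    }
    return $min; // null when the keyword occurs fewer than twice
}

// Occurrences copied from the output above.
$occurrences = [
    ['ngram' => 'sentence', 'indexes' => [0]],
    ['ngram' => 'sentence', 'indexes' => [2]],
];
var_dump(minOccurrenceGap($occurrences));
```

For the occurrences shown this yields 1, matching the value in the output. A keyword that appears only once has no pair to compare, so the sketch returns null in that case.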