kudashevs / keywords-extractor
A flexible and customizable PHP library that extracts relevant keywords from text.
Requires
- php: ^8.1
- ext-mbstring: *
- kudashevs/rake-php: ^2.4
Requires (Dev)
- phpunit/phpunit: ^10.1|^11.0
Last update: 2025-02-07 08:06:00 UTC
README
The keywords-extractor is a flexible and customizable PHP library that extracts relevant keywords from text.
Installation
You can install the package via composer:
composer require kudashevs/keywords-extractor
Usage
The key feature of the keywords-extractor library is its ability to extract not only individual nouns but also meaningful sequences of words, which make more sense in some contexts. This ability is provided by the RAKE PHP library, which is used as the default extraction algorithm. If RAKE PHP doesn't suit your needs, it can easily be replaced with something more appropriate.
Using the KeywordsExtractor class is straightforward. To extract keywords, call the extract method:
use Kudashevs\KeywordsExtractor\KeywordsExtractor;

$extractor = new KeywordsExtractor([
    'add_words' => 'some',
    'remove_words' => 'interesting'
]);
$keywords = $extractor->extract('this is some interesting text');

print_r($keywords); // some, text
To make this library even more convenient to use in some specific cases, it provides a fluent interface.
use Kudashevs\KeywordsExtractor\KeywordsExtractor;

$extractor = new KeywordsExtractor();
$keywords = $extractor->addWords(['this', 'example'])
    ->extract('this is a usage example');

print_r($keywords); // usage example, this
Words collections
Sometimes a large number of words must be included or excluded. This can be done with the add_words and remove_words options, but in some cases these options are not convenient to use. For these cases the library introduces the concept of words collections and corresponding asset files. To start using them, provide the assets_path option to the KeywordsExtractor class. On instantiation, the class creates two folders in the provided assets folder (once created, their files won't be modified):
- words folder - contains the default files used for words exclusions and words inclusions
- cache folder - contains cached words collections (clean it whenever the words folder is updated)
If you want to include some words in the generated keywords, update the rake_exclude.txt file. If you want to exclude some words from the generated keywords, update the rake_include.txt file. The naming may seem weird, but it refers to the list of stop words: words in rake_include.txt are included in the list of stop words, and words in rake_exclude.txt are excluded from it.
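The naming logic can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the functions buildStopWords and filterWords and the default stop words are hypothetical, and the sketch filters single words only, whereas the library extracts phrases through RAKE PHP.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds the effective stop-word list.
 * $include plays the role of rake_include.txt (words added to the stop words),
 * $exclude plays the role of rake_exclude.txt (words removed from the stop words).
 *
 * @param string[] $base
 * @param string[] $include
 * @param string[] $exclude
 * @return string[]
 */
function buildStopWords(array $base, array $include, array $exclude): array
{
    $merged = array_unique(array_merge($base, $include));

    return array_values(array_diff($merged, $exclude));
}

/**
 * Hypothetical helper: keeps the words of $text that are not stop words.
 *
 * @param string[] $stopWords
 * @return string[]
 */
function filterWords(string $text, array $stopWords): array
{
    $words = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    return array_values(array_diff($words, $stopWords));
}

$stopWords = buildStopWords(
    ['this', 'is', 'some'], // assumed default stop words, for illustration only
    ['interesting'],        // like rake_include.txt: becomes a stop word
    ['some']                // like rake_exclude.txt: stops being a stop word
);

$keywords = filterWords('this is some interesting text', $stopWords);

print_r($keywords); // some, text
```

Adding a word to the "include" list removes it from the keywords, and adding a word to the "exclude" list lets it appear in the keywords, which mirrors the file naming described above.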
Result length
By default, the length of the result is not limited. However, in some cases the length of the result does matter. For these cases, the package introduces the concept of a Limiter.
The library provides two ways to limit the result:
- using the limit_length option (for more information, please refer to Options)
- using a custom Limiter with a pre-defined max length (the library contains a LengthLimiter class that limits by length and a PercentLimiter class that limits by a configurable percentage of the text)
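The two limiting strategies can be sketched as plain functions. This is only an illustration under stated assumptions: the README does not show the Limiter interface or say in which unit the length is measured, so the sketch assumes that length means the number of keywords and that the percentage is taken from the number of words in the source text. The names limitByLength and limitByPercent are hypothetical and are not part of the library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: keeps at most $max keywords (assumed unit: keywords).
 *
 * @param string[] $keywords
 * @return string[]
 */
function limitByLength(array $keywords, int $max): array
{
    if ($max < 0) {
        throw new InvalidArgumentException('The max length must not be negative.');
    }

    return array_slice(array_values($keywords), 0, $max);
}

/**
 * Hypothetical helper: keeps as many keywords as $percent of the number
 * of words in $text, rounded down (assumed meaning of "percent of a text").
 *
 * @param string[] $keywords
 * @return string[]
 */
function limitByPercent(array $keywords, string $text, int $percent): array
{
    if ($percent < 0 || $percent > 100) {
        throw new InvalidArgumentException('The percent must be between 0 and 100.');
    }

    $max = intdiv(str_word_count($text) * $percent, 100);

    return limitByLength($keywords, $max);
}

$keywords = ['usage example', 'this', 'extra'];

print_r(limitByLength($keywords, 1));
print_r(limitByPercent($keywords, 'this is a usage example', 40));
```

With the library itself, the same effect is configured through the limit_length or limiter options rather than through helper functions like these.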
Options
The KeywordsExtractor class accepts some configuration options:
'extractor' # An Extractor instance that does all of the extraction work.
'assets_path' # A string with a valid path with write permissions to keep assets.
'add_words' => value # A string or an array of words to add to the result (if they are ignored by an Extractor).
'remove_words' => value # A string or an array of words to remove from the result (if they are not ignored by an Extractor).
'limiter' # A Limiter instance that limits the length of the end result.
'limit_length' # An integer that defines the maximum length of the result (used only when a Limiter is not provided).
Note: At the moment of instantiation, the KeywordsExtractor class can throw a few exceptions: InvalidOptionType, InvalidOptionValue. These exceptions extend the built-in InvalidArgumentException class, so they are easy to deal with.
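Because both exceptions extend InvalidArgumentException, a single catch block on the parent class handles either of them. The sketch below shows the pattern with stand-in classes so that it runs on its own: the two exception classes are redeclared locally, and validateLimitLength is a hypothetical function whose checks are assumptions, not the library's actual validation rules.

```php
<?php

declare(strict_types=1);

// Stand-ins for the library's exceptions, declared only to keep the sketch self-contained.
class InvalidOptionType extends InvalidArgumentException
{
}

class InvalidOptionValue extends InvalidArgumentException
{
}

/**
 * Hypothetical validation of a limit_length-like option (assumed rules).
 */
function validateLimitLength(mixed $value): int
{
    if (!is_int($value)) {
        throw new InvalidOptionType('The limit_length option must be an integer.');
    }

    if ($value < 1) {
        throw new InvalidOptionValue('The limit_length option must be positive.');
    }

    return $value;
}

/**
 * Catches both exception types through their common parent class.
 */
function describeOption(mixed $value): string
{
    try {
        validateLimitLength($value);

        return 'ok';
    } catch (InvalidArgumentException $e) {
        return get_class($e) . ': ' . $e->getMessage();
    }
}

echo describeOption(5), PHP_EOL;
echo describeOption('ten'), PHP_EOL;
echo describeOption(0), PHP_EOL;
```

In real code, the try block would wrap the instantiation of the KeywordsExtractor class instead of the stand-in function.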
Testing
composer test
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Note: Please make sure to update tests as appropriate.
License
The MIT License (MIT). Please see the License file for more information.