keyword-extractor/keyword-extractor

A package to extract keywords from text

v1.0.7 2020-06-29 13:38 UTC

This package is auto-updated.

Last update: 2024-03-29 03:44:50 UTC




Server Requirements

  • PHP >= 7.3

Usage

  • To install this package:
composer require keyword-extractor/keyword-extractor
  • Extract the keywords:
$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence.';
$result = $keywordExtractor->run($text);

With the default modifiers and no sorting arguments, the result is:

Array
(
    [simpl] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 3
                                )

                        )

                )

        )

    [sentenc] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 4
                                )

                        )

                )

        )

)
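Each key in the result is the modified (here, stemmed) token, while each `ngram` keeps the original surface form from the text. As a minimal sketch of reading that structure, the snippet below hard-codes the array shown above and prints one summary line per keyword. The `summarize()` helper is hypothetical and is not part of the package.

```php
<?php
// Result array copied from the documented output above.
$result = [
    'simpl' => [
        'frequency' => 1,
        'occurrences' => [
            ['ngram' => 'simple', 'indexes' => [3]],
        ],
    ],
    'sentenc' => [
        'frequency' => 1,
        'occurrences' => [
            ['ngram' => 'sentence.', 'indexes' => [4]],
        ],
    ],
];

// Hypothetical helper (not provided by the package): builds one line per
// keyword with its frequency and the distinct surface forms it appeared as.
function summarize(array $result): array
{
    $lines = [];
    foreach ($result as $stem => $entry) {
        $forms = array_unique(array_column($entry['occurrences'], 'ngram'));
        $lines[] = sprintf('%s (x%d): %s', $stem, $entry['frequency'], implode(', ', $forms));
    }
    return $lines;
}

echo implode(PHP_EOL, summarize($result)), PHP_EOL;
```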

Currently, the default modifiers are as follows (they are applied to the tokens in order):

[
    new EmailFilter(),
    new PunctuationFilter(),
    new WhitelistFilter($this->getWhitelist()),
    new BlacklistFilter($this->getBlacklist()),
    new StopWordFilter(),
    new NumberFilter(),
    new StemFilter(),
    // run the blacklist again after stemming
    new BlacklistFilter($this->getBlacklist()),
]
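To illustrate why the order matters, here is a rough sketch of an ordered modifier pipeline. It does not use the package's filter classes: the three closures and the `applyModifiers()` function are hypothetical stand-ins, and the "stemmer" is deliberately crude. The assumption is only that each modifier receives a token and either returns a modified token or drops it.

```php
<?php
// Hypothetical stand-ins for filters; each returns the modified token,
// or null to drop the token entirely.
$stripPunctuation = function (string $token): ?string {
    $clean = trim($token, ".,;:!?\"'()");
    return $clean === '' ? null : $clean;
};
$dropStopWords = function (string $token): ?string {
    $stopWords = ['this', 'is', 'a', 'and', 'the'];
    return in_array(strtolower($token), $stopWords, true) ? null : $token;
};
// Deliberately crude: removes one trailing "e". A real stemmer does far more.
$crudeStem = function (string $token): ?string {
    return preg_replace('/e$/', '', $token);
};

// Applies the modifiers in array order and stops as soon as one drops the token.
function applyModifiers(string $token, array $modifiers): ?string
{
    foreach ($modifiers as $modifier) {
        $token = $modifier($token);
        if ($token === null) {
            return null; // dropped; later modifiers never see it
        }
    }
    return $token;
}

$modifiers = [$stripPunctuation, $dropStopWords, $crudeStem];
$keywords = [];
foreach (explode(' ', 'This is a simple sentence.') as $token) {
    $modified = applyModifiers($token, $modifiers);
    if ($modified !== null) {
        $keywords[] = $modified;
    }
}
print_r($keywords);
```

In this sketch, punctuation is stripped before the stop-word check so that a token such as "is," would still be recognized as a stop word, and stemming runs last so that the earlier checks see whole words.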

You can also set your own modifiers:

$keywordExtractor->setModifiers([new PunctuationFilter()]);

A whitelist can be set as follows:

$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence and simple sentence.';
$keywordExtractor->setWhitelist(['simple']);
$result = $keywordExtractor->run($text);

This results in:

Array
(
    [simple] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 3
                                )

                        )

                    [1] => Array
                        (
                            [ngram] => simple
                            [indexes] => Array
                                (
                                    [0] => 6
                                )

                        )

                )

        )

    [sentenc] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 4
                                )

                        )

                    [1] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 7
                                )

                        )

                )

        )

)

A blacklist can be set in the same way as a whitelist:

$keywordExtractor = new KeywordExtractor();
$text = 'This is a simple sentence.';
$keywordExtractor->setBlacklist(['simple']);
$result = $keywordExtractor->run($text);

The result is:

Array
(
    [sentenc] => Array
        (
            [frequency] => 1
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence.
                            [indexes] => Array
                                (
                                    [0] => 4
                                )

                        )

                )

        )

)

To sort by frequency in descending order:

$keywordExtractor->run($text, Sorter::SORT_BY_FREQUENCY, Sorter::SORT_DIR_DESC);
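For comparison, an equivalent ordering can be produced in plain PHP on an already-extracted result. This is only a sketch of the idea, not how the package's `Sorter` is implemented, and the sample frequencies below are made up for illustration.

```php
<?php
// Illustrative data in the same shape as the extractor's result
// (occurrences omitted for brevity; the frequencies are invented).
$result = [
    'simpl'   => ['frequency' => 1],
    'sentenc' => ['frequency' => 3],
    'keyword' => ['frequency' => 2],
];

// uasort keeps the keyword keys; swapping $a and $b in the comparison
// gives descending order.
uasort($result, function (array $a, array $b): int {
    return $b['frequency'] <=> $a['frequency'];
});

print_r(array_keys($result));
```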

To sort by minimum occurrence distance:

$text = 'sentence and sentence';
$result = $keywordExtractor->run($text, Sorter::SORT_BY_MIN_OCCURRENCE_DISTANCE);

The result is:

Array
(
    [sentenc] => Array
        (
            [frequency] => 2
            [occurrences] => Array
                (
                    [0] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 0
                                )

                        )

                    [1] => Array
                        (
                            [ngram] => sentence
                            [indexes] => Array
                                (
                                    [0] => 2
                                )

                        )

                )

            [minOccurrencesDistance] => 1
        )

)