php-science / textrank
TextRank (automatic text summarization) for PHP.
Installs: 1 389 311
Dependents: 3
Suggesters: 0
Security: 0
Stars: 241
Watchers: 13
Forks: 39
Open Issues: 1
Requires
- php: >=7.2
- ext-ctype: *
- ext-mbstring: *
Requires (Dev)
- phpunit/phpunit: 9.*
This package is auto-updated.
Last update: 2024-08-29 17:02:39 UTC
README
TextRank
This package is an implementation of the TextRank algorithm in PHP, released under the MIT licence.
TextRank vs. ChatGPT
GPT models such as ChatGPT are large language models that interpret context and generate new content from the given input, at the cost of substantial computing resources. TextRank is a low-cost extractive algorithm: it selects the most important words and sentences already present in the text instead of generating new ones. TextRank can also be used as a pre-processor for a GPT model, shrinking the input text to save on resource consumption.
TextRank or Automatic summarization
Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. - Wikipedia
This implementation:
- Extracts sentences,
- Removes stopwords,
- Assigns an integer value to each word by finding and counting its matching words,
- Weights the values of the words,
- Normalizes the values to get the scores,
- Sorts by score.
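The steps above can be illustrated with a small, self-contained sketch. This is not the package's actual code, and it is not a full graph-based TextRank: it is a simplified frequency-based ranking that only mirrors the listed steps. The function name `rankSentences`, the sentence-splitting regex, and the sample stopword list are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-science/textrank).
 * Ranks sentences by the summed frequency of their non-stopword words.
 *
 * @param string   $text      Text to rank.
 * @param string[] $stopWords Lower-case words to ignore.
 * @return array<int, array{sentence: string, score: float}> Highest score first.
 */
function rankSentences(string $text, array $stopWords): array
{
    // 1. Extract sentences: split on whitespace that follows ., ! or ?
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($sentences === false || $sentences === []) {
        return [];
    }

    // 2. Tokenize each sentence into lower-case words and remove stopwords.
    $tokens = [];
    foreach ($sentences as $i => $sentence) {
        preg_match_all('/\p{L}+/u', mb_strtolower($sentence), $matches);
        $tokens[$i] = array_values(array_diff($matches[0], $stopWords));
    }

    // 3. Give each word an integer value: its number of occurrences in the text.
    $counts = array_count_values(array_merge(...$tokens));

    // 4. Weight each sentence by the summed values of its words.
    $scores = [];
    foreach ($tokens as $i => $words) {
        $scores[$i] = 0;
        foreach ($words as $word) {
            $scores[$i] += $counts[$word];
        }
    }

    // 5. Normalize the values into the range 0..1.
    $max = (float) max($scores);
    foreach ($scores as $i => $score) {
        $scores[$i] = $max > 0.0 ? $score / $max : 0.0;
    }

    // 6. Sort by score, highest first, keeping the sentence indexes.
    arsort($scores);

    $ranked = [];
    foreach ($scores as $i => $score) {
        $ranked[] = ['sentence' => $sentences[$i], 'score' => $score];
    }

    return $ranked;
}

// Usage with a tiny sample text and an illustrative stopword list.
$ranked = rankSentences(
    'Cats chase mice. Dogs chase cats and mice. The weather is nice.',
    ['the', 'and', 'is']
);
echo $ranked[0]['sentence'], PHP_EOL;
```

With this sample input the script prints `Dogs chase cats and mice.`, because that sentence contains the most frequently repeated words. The package itself exposes this kind of ranking through `TextRankFacade`, as shown in the Examples section.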
Install to use it in your project
cd your-project-folder
composer require php-science/textrank
Install for contributing
cd git-project-folder
docker-compose build
docker-compose up -d
composer install
composer test
Examples
use PhpScience\TextRank\TextRankFacade;
use PhpScience\TextRank\Tool\StopWords\English;

// String contains a long text, see the /res/sample1.txt file.
$text = "Lorem ipsum...";

$api = new TextRankFacade();

// English implementation for stopwords/junk words:
$stopWords = new English();
$api->setStopWords($stopWords);

// Array of the most important keywords:
$result = $api->getOnlyKeyWords($text);

// Array of the sentences from the most important part of the text:
$result = $api->getHighlights($text);

// Array of the most important sentences from the text:
$result = $api->summarizeTextBasic($text);
More examples: