php-science/textrank

TextRank (automatic text summarization) for PHP7 and HHVM.

1.2.1 2021-02-27 07:54 UTC


Last update: 2021-05-27 08:28:28 UTC


README

License: MIT

This library implements the TextRank algorithm (automatic summarization) in PHP 7 strict mode. It can summarize a text, for example an article, into a short paragraph. Before summarizing, it removes the junk words (stopwords) defined in the StopWords namespace. It can be extended with other languages.

TextRank or Automatic summarization

Automatic summarization is the process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a representative subset of the data, which contains the information of the entire set. Summarization technologies are used in a large number of sectors in industry today. - Wikipedia

This implementation works in the following steps:

  • Find the sentences,
  • Remove the stopwords,
  • Create integer values by finding and counting the matching words,
  • Adjust the integer values using the values of related words,
  • Normalize the values to create scores,
  • Order by score.
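The steps above can be sketched in plain PHP as follows. This is a minimal, hypothetical sketch, not the library's implementation: the function name scoreKeywords(), the rule that words sharing a sentence count as "related", and min-max normalization are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified keyword scorer that mirrors the six steps above.
 * It is NOT the library's code: treating words that share a sentence as
 * "related" and using min-max normalization are assumptions.
 *
 * @param string   $text      Input text.
 * @param string[] $stopWords Lower-case words to drop.
 * @return array<string, float> Word => score in [0, 1], highest score first.
 */
function scoreKeywords(string $text, array $stopWords): array
{
    // 1. Find sentences: split after ".", "!" or "?" followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $stop = array_flip($stopWords);
    $wordsPerSentence = [];
    $counts = [];
    foreach ($sentences as $sentence) {
        // 2. Remove stopwords (lower-casing here is ASCII-only).
        preg_match_all('/\p{L}+/u', strtolower($sentence), $m);
        $words = array_values(array_filter($m[0], fn ($w) => !isset($stop[$w])));
        $wordsPerSentence[] = array_unique($words);

        // 3. Integer values: count how often each word occurs in the text.
        foreach ($words as $word) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    if ($counts === []) {
        return [];
    }

    // 4. Add the counts of related words (assumed: words in the same sentence).
    $values = $counts;
    foreach ($wordsPerSentence as $words) {
        foreach ($words as $word) {
            foreach ($words as $other) {
                if ($other !== $word) {
                    $values[$word] += $counts[$other];
                }
            }
        }
    }

    // 5. Normalize the values to scores between 0 and 1 (min-max).
    $min = min($values);
    $range = max($values) - $min;
    $scores = [];
    foreach ($values as $word => $value) {
        $scores[$word] = $range > 0 ? ($value - $min) / (float) $range : 1.0;
    }

    // 6. Order by score, highest first; keys (the words) are preserved.
    arsort($scores);
    return $scores;
}
```

For "Cats chase mice. Dogs chase cats. The bird sings." with the stopword "the", this sketch gives cats and chase a score of 1.0, mice and dogs 0.5, and bird and sings 0.0.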

Install

composer require php-science/textrank

Test

cd project-folder
composer test

or

cd project-folder
phpunit --colors='always' $(pwd)/tests

Examples

use PhpScience\TextRank\TextRankFacade;
use PhpScience\TextRank\Tool\StopWords\English;

// Composer's autoloader (installed by "composer require php-science/textrank").
require __DIR__ . '/vendor/autoload.php';

// A long text to summarize; see the /res/sample1.txt file for a sample.
$text = "Lorem ipsum...";

$api = new TextRankFacade();
// English implementation for stopwords/junk words:
$stopWords = new English();
$api->setStopWords($stopWords);

// Array of the most important keywords:
$keywords = $api->getOnlyKeyWords($text);

// Array of the sentences from the most important part of the text:
$highlights = $api->getHighlights($text);

// Array of the most important sentences from the text:
$summary = $api->summarizeTextBasic($text);
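The facade methods above return sentences as well as keywords. One plausible way to turn word scores into a short summary is to rank sentences by the total score of their words and keep the best few in their original order. The sketch below does that; the helper name topSentences() and the summing rule are assumptions for illustration, not necessarily how getHighlights() or summarizeTextBasic() work internally.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: keeps the $limit sentences whose words have the
 * highest total score and returns them in their original order.
 * Summing word scores is an assumption, not the library's documented rule.
 *
 * @param string               $text       Input text.
 * @param array<string, float> $wordScores Lower-case word => score.
 * @param int                  $limit      Number of sentences to keep.
 * @return string[]
 */
function topSentences(string $text, array $wordScores, int $limit): array
{
    // Same sentence split as a simple "., !, ? followed by whitespace" rule.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    $totals = [];
    foreach ($sentences as $i => $sentence) {
        preg_match_all('/\p{L}+/u', strtolower($sentence), $m);
        $totals[$i] = 0.0;
        foreach ($m[0] as $word) {
            // Words without a score (e.g. stopwords) contribute nothing.
            $totals[$i] += $wordScores[$word] ?? 0.0;
        }
    }

    arsort($totals);                                // best sentences first
    $keep = array_slice($totals, 0, $limit, true);  // keep sentence indexes

    // array_intersect_key() keeps the order of $sentences, i.e. text order.
    return array_values(array_intersect_key($sentences, $keep));
}
```

Returning the chosen sentences in text order rather than score order usually makes a short summary read more naturally.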

More examples:

Authors, Contributors

Name (GitHub user):

  • David Belicza (@DavidBelicza)
  • Riccardo Marton (@riccardomarton)
  • Syndesi (@Syndesi)
  • vincentsch (@vincentsch)
  • Andrew Welch (@khalwat)
  • Andrey Astashov (@mvcaaa)
  • Leo Toneff (@bragle)
  • Willy Arisky (@willyarisky)