org_heigl/textstatistics

Get statistics from a given text

0.2.1 2018-02-16 16:58 UTC


Calculate text statistics, including syllable counts and Flesch-Reading-Ease (English and German), among others.

Why

The one other implementation, davechild/textstatistics, only implements statistics for English texts. It did not work for texts containing, for example, German umlauts. So I decided to reimplement some of the algorithms, using work I had already done for a hyphenator.

That is why some results, for example the syllable count, differ from that implementation.

Installation

TextStatistics is best installed using Composer:

$ composer require org_heigl/textstatistics

Usage

The different calculators all implement a common CalculatorInterface and therefore all provide a calculate method that expects a Text object containing the text to analyze.
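To illustrate the shared-interface idea, here is a minimal, self-contained sketch. The names SketchText, SketchCalculator, SketchWordCounter and SketchCharacterCounter are hypothetical stand-ins invented for this example. They are not the library's classes, and the counting logic is an assumption, not the library's algorithm.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; not part of org_heigl/textstatistics.

/** Wraps the raw string so every calculator receives the same input type. */
final class SketchText
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    public function getContent(): string
    {
        return $this->content;
    }
}

/** The single method every calculator in this sketch has to provide. */
interface SketchCalculator
{
    public function calculate(SketchText $text);
}

final class SketchWordCounter implements SketchCalculator
{
    public function calculate(SketchText $text)
    {
        // \p{L}+ with the u modifier matches runs of Unicode letters,
        // so a word containing umlauts is counted as one word.
        return preg_match_all('/\p{L}+/u', $text->getContent());
    }
}

final class SketchCharacterCounter implements SketchCalculator
{
    public function calculate(SketchText $text)
    {
        // Counts Unicode code points (whitespace and newlines included),
        // not bytes, assuming the content is valid UTF-8.
        return preg_match_all('/./su', $text->getContent());
    }
}

$text = new SketchText('Grüße aus Köln');
$calculators = [
    'words'      => new SketchWordCounter(),
    'characters' => new SketchCharacterCounter(),
];

// Because all calculators share one interface, they can be used interchangeably.
foreach ($calculators as $name => $calculator) {
    printf("%s: %d\n", $name, $calculator->calculate($text));
}
```

Since every calculator takes the same kind of argument, calling code never needs to know which statistic it is computing.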

Currently these statistics are available:

  • Average Sentence Length
  • Average Syllables per word
  • Character-Count (including whitespace)
  • Character-Count (excluding whitespace)
  • Flesch-Kincaid Grade Level
  • Flesch-Reading-Ease for English texts
  • Flesch-Reading-Ease for German texts
  • Flesch-Reading-Ease School-Grade measurement
  • Sentence-Count
  • Max Syllables in Sentence
  • Max Words in Sentence
  • Syllable-Count
  • Wiener Sachtext-Formel 1, 2, 3 and 4
  • Word-Count
  • Max Syllables in Word
  • Number of words with minimum N characters
  • Percentage of words with minimum N characters
  • Number of words with minimum N syllables
  • Percentage of words with minimum N syllables
  • Number of words with only N syllables
  • Percentage of words with only N syllables
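As background for the readability scores in the list above, the commonly published formulas can be written as plain functions of the average sentence length (words per sentence) and the average syllables per word. This is a sketch of the standard formulas (Flesch for English, Amstad's adaptation for German, and Flesch-Kincaid). The function names are hypothetical, and this README does not state whether the library uses exactly these coefficients.

```php
<?php
declare(strict_types=1);

// Illustrative helpers only; these functions are not part of org_heigl/textstatistics.

/** Flesch Reading Ease with the standard English coefficients. */
function sketchFleschReadingEaseEnglish(float $avgSentenceLength, float $avgSyllablesPerWord): float
{
    return 206.835 - 1.015 * $avgSentenceLength - 84.6 * $avgSyllablesPerWord;
}

/** Flesch Reading Ease with Amstad's coefficients for German. */
function sketchFleschReadingEaseGerman(float $avgSentenceLength, float $avgSyllablesPerWord): float
{
    return 180.0 - $avgSentenceLength - 58.5 * $avgSyllablesPerWord;
}

/** Flesch-Kincaid Grade Level (approximate US school grade). */
function sketchFleschKincaidGradeLevel(float $avgSentenceLength, float $avgSyllablesPerWord): float
{
    return 0.39 * $avgSentenceLength + 11.8 * $avgSyllablesPerWord - 15.59;
}

// Example input: 10 words per sentence, 1.5 syllables per word.
printf("English reading ease: %.1f\n", sketchFleschReadingEaseEnglish(10.0, 1.5));
printf("German reading ease:  %.1f\n", sketchFleschReadingEaseGerman(10.0, 1.5));
printf("Grade level:          %.1f\n", sketchFleschKincaidGradeLevel(10.0, 1.5));
```

The German variant penalizes syllables per word less heavily, which reflects that German words are longer on average. This is also why a score computed with the English coefficients is not meaningful for German text.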

There is a factory method for each statistic, so once the Text object exists, retrieving a statistic takes a single line of code:

$text = new \Org_Heigl\TextStatistics\Text($theText);
$wordCount = \Org_Heigl\TextStatistics\Service\WordCounterFactory::getCalculator()->calculate($text);
$fleschReadingEase = \Org_Heigl\TextStatistics\Service\FleschReadingEaseCalculatorFactory::getCalculator()->calculate($text);

You can also add multiple Calculators to the TextStatisticsGenerator and retrieve multiple Statistics in one go like this:

$text = new \Org_Heigl\TextStatistics\Text($theText);

$statGenerator = new \Org_Heigl\TextStatistics\TextStatisticsGenerator();
$statGenerator->add('wordCount', \Org_Heigl\TextStatistics\Service\WordCounterFactory::getCalculator());
$statGenerator->add('flesch', \Org_Heigl\TextStatistics\Service\FleschReadingEaseCalculatorFactory::getCalculator());

print_r($statGenerator->calculate($text));

// array(
//    'wordCount' => xx,
//    'flesch' => yy,
// )