Text syllable splitting and hyphenation using Frank M. Liang's TeX algorithm.
Copyright © 2011-2019 Martijn van der Lee. MIT Open Source license applies.
PHP Syllable splitting and hyphenation. Or rather... PHP Syl-la-ble split-ting and hy-phen-ation.
Based on the work by Frank M. Liang (http://www.tug.org/docs/liang/) and the many volunteers in the TeX community.
Many languages are supported, e.g. English (US/UK), Spanish, German, French, Dutch, Italian, Romanian and Russian; 76 languages in total.
Language sources: http://tug.org/tex-hyphen/#languages
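The TeX hyphenation files linked above store Liang's patterns: short letter sequences with digits between the letters. For every gap in a word, the highest digit from all matching patterns wins. An odd value allows a break there and an even value forbids it. The sketch below illustrates that idea in plain PHP. It is not phpSyllable's implementation, it only handles ASCII, and the nine patterns are a hand-picked subset chosen so that "hyphenation" splits as hy-phen-ation. The functions `liang_parse_patterns()` and `liang_split_word()` are hypothetical.

```php
<?php
// Toy illustration of Liang's pattern-based hyphenation (ASCII only).
// The pattern list is a hand-picked subset, not a real TeX language file.

// Turn patterns such as "hen5at" into a lookup table:
// "henat" => [0, 0, 0, 5, 0, 0], one value per gap (including both ends).
function liang_parse_patterns(array $patterns)
{
    $table = [];
    foreach ($patterns as $pattern) {
        $letters = '';
        $values = [0];
        foreach (str_split($pattern) as $ch) {
            if (ctype_digit($ch)) {
                // A digit sets the value of the gap before the next letter.
                $values[count($values) - 1] = (int) $ch;
            } else {
                $letters .= $ch;
                $values[] = 0;
            }
        }
        $table[$letters] = $values;
    }
    return $table;
}

// Split a word at every gap whose highest pattern value is odd,
// keeping at least $leftMin and $rightMin letters at either end.
function liang_split_word($word, array $table, $leftMin = 2, $rightMin = 2)
{
    // Dots mark the word boundaries, as in TeX patterns such as ".hy".
    $text = '.' . strtolower($word) . '.';
    $len = strlen($text);
    // $points[$i] is the highest value seen for the gap before $text[$i].
    $points = array_fill(0, $len + 1, 0);
    for ($start = 0; $start < $len; $start++) {
        for ($length = 1; $start + $length <= $len; $length++) {
            $sub = substr($text, $start, $length);
            if (!isset($table[$sub])) {
                continue;
            }
            foreach ($table[$sub] as $offset => $value) {
                $points[$start + $offset] = max($points[$start + $offset], $value);
            }
        }
    }
    // The gap before $word[$i] is the gap before $text[$i + 1].
    $parts = [];
    $current = '';
    $wordLength = strlen($word);
    for ($i = 0; $i < $wordLength; $i++) {
        if ($i >= $leftMin && $i <= $wordLength - $rightMin && $points[$i + 1] % 2 === 1) {
            $parts[] = $current;
            $current = '';
        }
        $current .= $word[$i];
    }
    $parts[] = $current;
    return $parts;
}

$table = liang_parse_patterns(['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n']);
echo implode('-', liang_split_word('hyphenation', $table)), PHP_EOL; // hy-phen-ation
```

Looking up every substring in a hash table keeps the sketch short. A real implementation would at least stop the inner loop at the length of the longest pattern.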
Supports PHP 5.6 and up, so you can use it on older servers.
Just include phpSyllable in your project, point the autoloader at the classes directory and instantiate a Syllable object.
$syllable = new Syllable('en-us');
echo $syllable->hyphenateText('Provide a plethora of paragraphs');
The following is an incomplete list containing only the most common methods. For complete documentation of all classes, see the generated PHPDoc.
Create a new Syllable instance with default settings.
Set the directory where compiled language files may be stored. Defaults to the cache subdirectory of the current directory.
Set the directory where language source files can be found. Defaults to the languages subdirectory of the current directory.
Specify the character encoding to use, or disable character encoding handling completely by specifying null as the encoding. The default encoding will work in most situations.
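Encoding matters because the minimum word length is measured in characters, not bytes. In UTF-8 a character such as "Ü" takes two bytes, so a plain byte count overstates the length of a word. A small illustration, assuming UTF-8 input; `utf8_length()` is a hypothetical helper that, for valid UTF-8, gives the same result as `mb_strlen($word, 'UTF-8')`.

```php
<?php
// Count characters (code points) rather than bytes, assuming UTF-8 input.
// The /u modifier makes "." match one UTF-8 character; /s includes newlines.
function utf8_length($word)
{
    return preg_match_all('/./su', $word);
}

$word = 'Übung'; // German; "Ü" takes two bytes in UTF-8
echo strlen($word), PHP_EOL;      // 6 bytes
echo utf8_length($word), PHP_EOL; // 5 characters
```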
Set the language whose rules will be used for hyphenation.
Set the hyphen text or object to use as a hyphen marker.
Get the hyphen object used as a hyphen marker.
Set the minimum length required for a word to be hyphenated. Words with fewer characters than this length will not be hyphenated.
Get the minimum length required for a word to be hyphenated.
Split a single word at the points where it would be hyphenated.
Split a text at the points where its words would be hyphenated.
Hyphenate a single word.
Hyphenate all words in the plain text.
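Conceptually, hyphenating a word is splitting it and then joining the parts with the hyphen marker, while leaving words below the minimum length alone. The sketch below shows that relationship with a hypothetical helper, `join_with_hyphen()`, which is not part of phpSyllable; the input parts are written by hand rather than produced by the library.

```php
<?php
// Hypothetical helper, not part of phpSyllable: joins the parts of a split
// word with a hyphen marker, but leaves words shorter than $minWordLength
// (counted in characters, assuming UTF-8) untouched.
function join_with_hyphen(array $parts, $hyphen, $minWordLength = 0)
{
    $word = implode('', $parts);
    if (preg_match_all('/./su', $word) < $minWordLength) {
        return $word;
    }
    return implode($hyphen, $parts);
}

// Parts written by hand for illustration.
$parts = ['hy', 'phen', 'ation'];
echo join_with_hyphen($parts, '-'), PHP_EOL;     // hy-phen-ation
echo join_with_hyphen($parts, '&shy;'), PHP_EOL; // hy&shy;phen&shy;ation
echo join_with_hyphen($parts, '-', 12), PHP_EOL; // hyphenation (11 characters, below 12)
```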
Hyphenate all readable text in the HTML, excluding HTML tags and attributes.
Count the number of syllables in the text and return a map with syllable count as key and number of words for that syllable count as the value.
Count the number of words in the text.
Count the number of syllables in the text.
Count the number of polysyllables (words with 3 or more syllables) in the text.
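All of these counts can be derived from per-word syllable splits. The following sketch assumes the words have already been split (here by hand, not by phpSyllable) and uses a hypothetical `syllable_statistics()` function to build the histogram map described above, with syllable count as key and number of words as value, together with the word, syllable and polysyllable counts.

```php
<?php
// Hypothetical sketch: compute text statistics from words that have
// already been split into syllables (one array of parts per word).
function syllable_statistics(array $splitWords)
{
    $histogram = [];
    $syllables = 0;
    $polysyllables = 0;
    foreach ($splitWords as $parts) {
        $count = count($parts);
        // Map: syllable count => number of words with that many syllables.
        $histogram[$count] = isset($histogram[$count]) ? $histogram[$count] + 1 : 1;
        $syllables += $count;
        // Polysyllables are words with 3 or more syllables.
        if ($count >= 3) {
            $polysyllables++;
        }
    }
    ksort($histogram);
    return [
        'histogram' => $histogram,
        'words' => count($splitWords),
        'syllables' => $syllables,
        'polysyllables' => $polysyllables,
    ];
}

// Splits written by hand for illustration.
$stats = syllable_statistics([
    ['pro', 'vide'],
    ['a'],
    ['pleth', 'o', 'ra'],
    ['of'],
    ['par', 'a', 'graphs'],
]);
// histogram [1 => 2, 2 => 1, 3 => 2], words 5, syllables 10, polysyllables 2
print_r($stats);
```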
Exclude all HTML elements from hyphenation, allowing explicit whitelisting.
Exclude from hyphenation all HTML content within the given elements.
Exclude from hyphenation all HTML content within elements with the given attributes. If a value is specified, only those elements with attributes with that specific value are excluded.
Exclude from hyphenation all HTML content within elements matching the specified XPath queries.
Hyphenate all HTML content within the given elements, ignoring any rules which might exclude them from hyphenation.
Hyphenate all HTML content within elements with the given attributes. If a value is specified, only those elements with attributes with that specific value are included, ignoring any rules which might exclude them from hyphenation.
Hyphenate all HTML content within elements matching the specified XPath queries, ignoring any rules which might exclude them from hyphenation.
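One way to picture element and attribute exclusion is as an XPath filter over text nodes. The sketch below uses PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`) to collect the text nodes a hyphenator would touch, skipping anything inside excluded elements or inside elements carrying an excluded attribute. The function `hyphenatable_texts()` is hypothetical and is not necessarily how phpSyllable implements these rules.

```php
<?php
// Hypothetical sketch: collect the text nodes that an HTML hyphenator would
// process, skipping text inside excluded elements or inside elements that carry
// an excluded attribute. Element and attribute names are inserted into the XPath
// query unescaped, so they must come from trusted configuration.
function hyphenatable_texts($html, array $excludedElements, array $excludedAttributes = [])
{
    $doc = new DOMDocument();
    // Keep libxml quiet about markup it does not recognize.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $conditions = [];
    foreach ($excludedElements as $name) {
        $conditions[] = "not(ancestor::$name)";
    }
    foreach ($excludedAttributes as $attribute) {
        $conditions[] = "not(ancestor::*[@$attribute])";
    }
    $query = '//body//text()';
    if ($conditions) {
        $query .= '[' . implode(' and ', $conditions) . ']';
    }

    $texts = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($query) as $node) {
        // Whitespace-only nodes have nothing to hyphenate.
        if (trim($node->nodeValue) !== '') {
            $texts[] = $node->nodeValue;
        }
    }
    return $texts;
}

$html = '<p>One <code>two</code> <span translate="no">three</span> four</p>';
print_r(hyphenatable_texts($html, ['code'], ['translate']));
```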
See the included demo.php file for a working example.
// Set up the autoloader (if needed)
require_once dirname(__FILE__) . '/classes/autoloader.php';
// Create a new instance for the language
$syllable = new Syllable('en-us');
// Set the directory where the .tex files are stored
$syllable->getSource()->setPath(__DIR__ . '/languages');
// Set the directory where Syllable can store cache files
$syllable->getCache()->setPath(__DIR__ . '/cache');
// Set the hyphen style. In this case, the &shy; HTML entity
// for HTML (falls back to '-' for text)
$syllable->setHyphen(new Syllable_Hyphen_Soft);
// Set the threshold (sensitivity); deprecated, kept for compatibility
$syllable->setTreshold(Syllable::TRESHOLD_MOST);
// Output hyphenated text
echo $syllable->hyphenateText('Provide your own paragraphs...');
- Fixed PHP 7.4 compatibility (#37) by @Dargmuesli.
- Fixed bug reverted in refactoring (continue 3) by @Dargmuesli.
- Fixed bug reverted in refactoring (continue 2).
- Refactored for modern PHP and added support for the current PHP version.
- getMinWordLength() to limit hyphenation to words with at least the specified number of characters.
- Fixes for composer.
- Composer autoloader added
- Improved documentation
- Updated Spanish language files.
- Initial PHPDoc.
- More fixes for apostrophes in splitting.
- Fix for French language handling
- Refactored .tex loading into the source class.
- Massive cache performance increase (excessive writes).
- Fix slow initial cache writing; too many writes (only one was needed).
- Removed min_hyphenation; mb_strlen takes more time than a hashmap lookup.
- Refactored cache interface.
- Improved unit tests.
- Deprecated the threshold feature, which was based on a misinterpretation of the algorithm. Methods, constants and the constructor signature are unchanged, although you can now omit the threshold if you want (or leave it in; it is detected as a "fake" threshold).