nihongodera / limelight
A php Japanese language text analyzer and parser.
Type:project
Requires
- php: >=5.6
- ext-mecab: *
Requires (Dev)
- phpunit/phpunit: ^6.4
This package is auto-updated.
Last update: 2024-12-29 05:32:35 UTC
README
A php Japanese language analyzer and parser.
- Split Japanese text into individual, full words
- Find parts of speech for words
- Find dictionary entries (lemmas) for conjugated words
- Get readings and pronunciations for words
- Build furigana for words
- Convert Japanese to romaji (English lettering)
Quick Guide
- Version Notes
- Install Limelight
- Parse Text
- Get Results
- Full Documentation
- Sources, Contributions, and Contributing
Version Notes
- April 25, 2016: The Limelight API changed in Version 1.6.0. The new API uses collection methods to give developers better control of Limelight parse results. Please see the wiki for the updated documentation.
- April 11, 2016: php-mecab, the MeCab bindings Limelight uses, were updated to version 0.6.0 in Dec. 2015 for php 7 support. The pre-0.6.0 bindings no longer work with the master branch of Limelight. If you are using an older version of php-mecab, please update your bindings or use the php-mecab_pre_0.6.0 version.
Install Limelight
Using Docker
From the project root, build the image:
docker build -f docker/Dockerfile -t limelight .
Once it is built, run the container:
docker run --name limelight -v /host/path/to/limelight:/usr/limelight -d --rm limelight
Access the project in the container:
docker exec -it limelight bash
Install composer dependencies from within the container:
composer install
Without Docker
Requirements
- php >= 5.6
Dependencies
Before installing Limelight, you must install both mecab and the php extension php-mecab on your system.
Linux Ubuntu Users
Use the install script included in this repository. The script only works with php7. Download the script:
curl -O https://raw.githubusercontent.com/nihongodera/limelight/master/install_mecab_php-mecab.sh
Make the file executable:
chmod +x install_mecab_php-mecab.sh
Execute the script:
./install_mecab_php-mecab.sh
You may need to restart your server to complete the process.
For information about what the script does, see here.
Other Systems
Please see this page to learn more about installing on your system.
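Before moving on, it can help to confirm that PHP actually sees the extension. The following is a minimal sketch, not part of Limelight: the helper name mecabStatus() is hypothetical, and it assumes the extension registers itself under the name "mecab", as the ext-mecab requirement suggests.

```php
<?php
// Report whether the php-mecab extension is available to this PHP runtime.
// Assumption: the extension is registered under the name "mecab",
// matching the ext-mecab requirement in composer.json.
// mecabStatus() is a hypothetical helper, not a Limelight function.
function mecabStatus()
{
    if (extension_loaded('mecab')) {
        // phpversion() returns the extension's version string when it is loaded.
        return 'mecab extension loaded (version ' . phpversion('mecab') . ')';
    }

    return 'mecab extension NOT loaded; check php.ini and restart your server';
}

echo mecabStatus() . "\n";
```

Run the check with the same PHP runtime that will run Limelight, since the command line and the web server can load different php.ini files.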
Install Limelight
Install Limelight through composer.
composer require nihongodera/limelight
Parse Text
Make a new instance of Limelight\Limelight. Limelight takes no arguments.
$limelight = new Limelight();
Use the parse() method on the Limelight object to parse Japanese text.
$results = $limelight->parse('庭でライムを育てています。');
The returned object is an instance of Limelight\Classes\LimelightResults.
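Put together, the two calls above might sit in a standalone script like the sketch below. It assumes Limelight was installed with Composer and that the script lives in the project root next to vendor/; the autoload path and the prerequisite check are illustrative assumptions, not something Limelight requires. The class is referenced by its fully qualified name, Limelight\Limelight, so no use statement is needed.

```php
<?php
// Sketch of a complete script. Assumes a Composer install with this file
// in the project root, so the autoloader is at vendor/autoload.php.
$autoload = __DIR__ . '/vendor/autoload.php'; // assumed Composer default path

if (!extension_loaded('mecab') || !is_file($autoload)) {
    // Fail softly when MeCab or the Composer autoloader is unavailable.
    echo "Limelight prerequisites are missing; see the install steps above.\n";
} else {
    require $autoload;

    $limelight = new \Limelight\Limelight();
    $results = $limelight->parse('庭でライムを育てています。');

    // Per the text above, this should name Limelight\Classes\LimelightResults.
    echo get_class($results) . "\n";
}
```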
Get Results
Get results for the entire text using methods available on LimelightResults.
$results = $limelight->parse('庭でライムを育てています。');

echo 'Words: ' . $results->string('word') . "\n";
echo 'Readings: ' . $results->string('reading') . "\n";
echo 'Pronunciations: ' . $results->string('pronunciation') . "\n";
echo 'Lemmas: ' . $results->string('lemma') . "\n";
echo 'Parts of speech: ' . $results->string('partOfSpeech') . "\n";
echo 'Hiragana: ' . $results->toHiragana()->string('word') . "\n";
echo 'Katakana: ' . $results->toKatakana()->string('word') . "\n";
echo 'Romaji: ' . $results->string('romaji', ' ') . "\n";
echo 'Furigana: ' . $results->string('furigana') . "\n";
Output:
Words: 庭でライムを育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun postposition noun postposition verb symbol
Hiragana: にわでらいむをそだてています。
Katakana: ニワデライムヲソダテテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭(にわ)でライムを育(そだ)てています。
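When several of these fields are needed at once, the repeated string() calls can be gathered into a small helper. The sketch below is illustrative only: fieldReport() is a hypothetical name, not part of Limelight, and it assumes nothing about the results object beyond the string($field) and string($field, $glue) calls shown above.

```php
<?php
// Collect several string() views of a parse result into one array.
// fieldReport() is a hypothetical helper, not part of Limelight.
// $results is assumed to expose string($field) and string($field, $glue),
// as in the calls shown above.
function fieldReport($results, array $fields)
{
    $report = [];
    foreach ($fields as $field) {
        // Romaji is joined with spaces, mirroring the example above.
        $report[$field] = $field === 'romaji'
            ? $results->string($field, ' ')
            : $results->string($field);
    }

    return $report;
}

// Usage, with $results from $limelight->parse(...):
// print_r(fieldReport($results, ['word', 'lemma', 'romaji']));
```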
Alter the collection of words however you like using the library of collection methods.
Get individual words off the LimelightResults object by using one of several applicable collection methods. Use methods available on the returned LimelightWord object.
$results = $limelight->parse('庭でライムを育てています。');

$word1 = $results->pull(2);
$word2 = $results->where('word', '庭');

echo $word1->string('romaji') . "\n";
echo $word2->string('furigana') . "\n";
Output:
raimu
庭(にわ)
Methods on the LimelightResults object and the LimelightWord object follow the same conventions, but LimelightResults methods are plural (words()) while LimelightWord methods are singular (word()).
Alternatively, loop through all the words on the LimelightResults object.
$results = $limelight->parse('庭でライムを育てています。');

foreach ($results as $word) {
    echo $word->word() . ' is a ' . $word->partOfSpeech() . ' read like ' . $word->reading() . "\n";
}
Output:
庭 is a noun read like ニワ
で is a postposition read like デ
ライム is a noun read like ライム
を is a postposition read like ヲ
育てています is a verb read like ソダテテイマス
。 is a symbol read like 。
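The same loop can produce structured data instead of echoed text, for example to serialize as JSON. This is a minimal sketch: wordRows() is a hypothetical helper, not part of Limelight, and it calls only word(), partOfSpeech(), and reading(), the methods used in the loop above.

```php
<?php
// Turn the per-word loop into structured rows (e.g. for JSON output).
// wordRows() is a hypothetical helper, not part of Limelight. It calls only
// word(), partOfSpeech(), and reading(), as used in the loop above.
function wordRows($results)
{
    $rows = [];
    foreach ($results as $word) {
        $rows[] = [
            'word' => $word->word(),
            'partOfSpeech' => $word->partOfSpeech(),
            'reading' => $word->reading(),
        ];
    }

    return $rows;
}

// Usage, with $results from $limelight->parse(...):
// echo json_encode(wordRows($results), JSON_UNESCAPED_UNICODE) . "\n";
```

JSON_UNESCAPED_UNICODE keeps the Japanese text readable in the encoded output instead of escaping it as \u sequences.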
Full Documentation
Full documentation for Limelight can be found on the Limelight Wiki page.
Sources, Contributions, and Contributing
The Japanese parsing logic used in Limelight was adapted from Kimtaro's excellent Ruby program Ve. A big thank you to him and all the others who contributed on that project.
Limelight relies heavily on both MeCab and php-mecab.
Collection methods and methods in the Arr class were derived from Laravel's collection methods.
Contributors are more than welcome.