angkor / khmercut
Khmer Tokenization
Requires
- php: ^8.3
- illuminate/contracts: ^11.0
- illuminate/process: ^11.0
- laravel/pint: ^1.17
Requires (Dev)
- larastan/larastan: ^2.9
- mockery/mockery: ^1.6
- orchestra/testbench: ^9.4
- pestphp/pest: ^2.35
- pestphp/pest-plugin-arch: ^2.7
- pestphp/pest-plugin-laravel: ^2.4
- pestphp/pest-plugin-type-coverage: ^2.8
- phpstan/phpstan-deprecation-rules: ^1.1.4
- phpstan/phpstan-phpunit: ^1.3.15
README
Khmercut is a wrapper package for the PHP Laravel framework, built on top of the khmercut Rust tool created by seanghay (seanghay/khmercut). It lets developers use the khmercut binary's Khmer word segmentation from within a Laravel application.
Installation
You can install the package via composer:
composer require angkor/khmercut
Download
Download the prebuilt khmercut binary for your platform from the Releases link, then move it to a location of your choice (for example, /usr/local/bin).
Usage
Publish the configuration file, which holds the binary path setting:
php artisan vendor:publish --provider="Angkor\Khmercut\KhmercutServiceProvider" --tag="config"
Set the following variable in your .env file so that it points to the downloaded binary:
TOKENIZER_BINARY_PATH=/usr/local/bin/khmercut
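If tokenization fails, a common cause is that the configured path does not point to an executable file. The sketch below is a hypothetical helper (khmercutBinaryIsUsable is not part of angkor/khmercut) that checks a path with PHP's built-in is_file() and is_executable(); it assumes you pass it the same absolute path you put in TOKENIZER_BINARY_PATH.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of angkor/khmercut): returns true only when
 * $path names an existing regular file that has an execute permission bit.
 */
function khmercutBinaryIsUsable(string $path): bool
{
    // Drop cached stat() results so a recent chmod or move is noticed.
    clearstatcache(true, $path);

    // is_file() rejects directories and missing paths;
    // is_executable() checks that the file may be executed.
    return is_file($path) && is_executable($path);
}

// Example (assumed path, matching the .env value above):
// var_dump(khmercutBinaryIsUsable('/usr/local/bin/khmercut'));
```

Remember to make the downloaded file executable (for example with chmod +x) after moving it into place.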
use Angkor\Khmercut\Tokenizer;

// Default separator: ZERO WIDTH SPACE (U+200B)
Tokenizer::make('Pretty girl សួស្តីស្រីស្អាត Hello World សួស្តីពិភពលោក');
// output: "Pretty girl សួស្តី\u{200B}ស្រី\u{200B}ស្អាត Hello World សួស្តី\u{200B}ពិភពលោក"

// Custom separator passed as the second argument
Tokenizer::make('Pretty girl សួស្តីស្រីស្អាត Hello World សួស្តីពិភពលោក', '|');
// output: "Pretty girl សួស្តី|ស្រី|ស្អាត Hello World សួស្តី|ពិភពលោក"
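Tokenizer inserts the separator (ZERO WIDTH SPACE, U+200B, by default) only between Khmer words; non-Khmer text such as "Pretty girl" and "Hello World" is returned unchanged.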
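Because Tokenizer::make() returns a single string, you may want to turn it back into a list of words. The sketch below is a hypothetical helper (splitKhmerTokens is not part of the package). It assumes the separator is the default ZERO WIDTH SPACE, or whatever non-empty separator you passed as the second argument, and it also splits on ordinary whitespace because non-Khmer text is not given separators.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of angkor/khmercut): split a string returned
 * by Tokenizer::make() into individual tokens.
 *
 * @return list<string>
 */
function splitKhmerTokens(string $tokenized, string $separator = "\u{200B}"): array
{
    $tokens = [];

    // Non-Khmer text (e.g. "Hello World") carries no separators, so first
    // split on ordinary whitespace; the /u flag makes the pattern UTF-8 aware.
    $chunks = preg_split('/\s+/u', $tokenized, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($chunks as $chunk) {
        // Then split each chunk on the separator inserted between Khmer words.
        foreach (explode($separator, $chunk) as $token) {
            if ($token !== '') {
                $tokens[] = $token;
            }
        }
    }

    return $tokens;
}

// The first example's output (default separator):
$words = splitKhmerTokens("Pretty girl សួស្តី\u{200B}ស្រី\u{200B}ស្អាត Hello World សួស្តី\u{200B}ពិភពលោក");
// ['Pretty', 'girl', 'សួស្តី', 'ស្រី', 'ស្អាត', 'Hello', 'World', 'សួស្តី', 'ពិភពលោក']
```

For output produced with a custom separator, pass the same separator, e.g. splitKhmerTokens($output, '|').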
Testing
composer test
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security
If you discover any security-related issues, please email semsphy@gmail.com instead of using the issue tracker.
Credits
License
The MIT License (MIT). Please see License File for more information.