angkor/khmercut

Khmer Tokenization

0.1.4 2024-08-30 07:49 UTC

Last update: 2024-08-30 07:50:45 UTC


Khmercut is a Laravel (PHP) wrapper around the khmercut Rust package (seanghay/khmercut). It lets developers use that package's Khmer tokenization from within a Laravel application.

Installation

You can install the package via composer:

composer require angkor/khmercut

Download

Download the prebuilt binary for your platform from the Release link, then move it to wherever you want to keep it.

Usage

Publish the configuration file so you can set the binary path:

php artisan vendor:publish --provider="Angkor\Khmercut\KhmercutServiceProvider" --tag="config"

Set up the .env variable:

TOKENIZER_BINARY_PATH=/usr/local/bin/khmercut
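
The path set in .env has to point at the downloaded binary, and that file has to be executable. As a quick sanity check, a small helper along these lines can be run before calling the tokenizer. This is only a sketch: checkTokenizerBinary is a hypothetical function, not part of angkor/khmercut, and the path passed to it is just the example value from above.

```php
<?php
// Hypothetical helper (not part of angkor/khmercut): reports why a
// configured binary path would be unusable, or 'ok' if it looks fine.
function checkTokenizerBinary(string $path): string
{
    if ($path === '') {
        return 'path is empty';
    }
    if (!is_file($path)) {
        return "no file at {$path}";
    }
    if (!is_executable($path)) {
        return "file at {$path} is not executable";
    }
    return 'ok';
}

// Example path only; pass whatever value you configured.
echo checkTokenizerBinary('/usr/local/bin/khmercut'), PHP_EOL;
```

If the check reports that the file is not executable, setting the executable bit on the binary (for example with chmod +x) is usually enough.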

Then call the tokenizer:

use Angkor\Khmercut\Tokenizer;

Tokenizer::make('Pretty girl សួស្តីស្រីស្អាត Hello World សួស្តីពិភពលោក');

//output: "Pretty girl សួស្តី\u{200B}ស្រី\u{200B}ស្អាត Hello World សួស្តី\u{200B}ពិភពលោក";

Tokenizer::make('Pretty girl សួស្តីស្រីស្អាត Hello World សួស្តីពិភពលោក', '|');

//output: "Pretty girl សួស្តី|ស្រី|ស្អាត Hello World សួស្តី|ពិភពលោក";

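The tokenizer inserts the separator only between Khmer words; non-Khmer text is left unchanged. The separator is the ZERO WIDTH SPACE (U+200B) by default, or the string passed as the second argument.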
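
Because the result is a single string, a common next step is to turn it into an array of words. The sketch below does this in plain PHP by treating both the separator and ordinary spaces as word boundaries. splitTokens is a hypothetical helper, not part of the package, and the input is the output string from the example above rather than a live call to Tokenizer::make.

```php
<?php
// Hypothetical helper (not part of angkor/khmercut): splits a string
// produced by the tokenizer into an array of words.
function splitTokens(string $segmented, string $separator = "\u{200B}"): array
{
    // Treat the separator like a space, then split on spaces.
    $normalized = str_replace($separator, ' ', $segmented);
    $parts = explode(' ', $normalized);

    // Drop empty strings left by consecutive boundaries and reindex.
    return array_values(array_filter($parts, fn ($part) => $part !== ''));
}

// Output string copied from the usage example above.
$segmented = "Pretty girl សួស្តី\u{200B}ស្រី\u{200B}ស្អាត Hello World សួស្តី\u{200B}ពិភពលោក";
$tokens = splitTokens($segmented);

echo count($tokens), PHP_EOL; // prints 9
```

If a custom separator such as '|' was passed to the tokenizer, pass the same string as the second argument to the helper.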

Testing

composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security-related issues, please email semsphy@gmail.com instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.