textualization / sentencepiece
Google SentencePiece bindings using FFI and a C adapter.
Fund package maintenance!
Ko-Fi
Requires (Dev)
- phpunit/phpunit: ^9.5.8
README
This is a minimal wrapper on top of Google SentencePiece to enable executing the XLMRobertaTokenizer encode method.
It needs the dynamic library for SentencePiece built with aditional C wrapper functions, see the fork at [https://github.com/textualization/sentencepiece/].
A binary for the library can be downloaded by doing:
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"
but depending on platform and GLIBC you might need to compile it yourself and copy to vendor/textualization/sentencepiece/lib
(create the folder if it doesn't exist). See src/Vendor.php
for details.
Running the tests
To run the tests you'll need to install the library per the instructions above.
To fully test it, download this file sentencepiece.bpe.model and place it in tests/
.