textualization / ropherta-tokenizer
GPT3Tokenizer (BPE) with the RoBERTa-base vocabulary.
Requires
- gioni06/gpt3-tokenizer: v1.2.0
- textualization/sentencepiece: v0.0.3
Requires (Dev)
- phpunit/phpunit: ^9.5.8
README
This is a thin wrapper around GPT3Tokenizer that uses the HuggingFace RoBERTa vocabulary and merges files.
See the GPT3Tokenizer documentation for example usage (or the generated test case under tests/).
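The merges file is what drives BPE. It lists symbol pairs in priority order, and the tokenizer repeatedly joins the highest-priority adjacent pair inside a word. The PHP sketch below is a toy illustration of that loop under stated assumptions. It is not the API of GPT3Tokenizer or of this package: the function name bpe() is hypothetical, the four merges are invented rather than taken from RoBERTa's merges file, and real GPT-2/RoBERTa tokenization also pre-splits text and maps bytes to printable symbols before this step.

```php
<?php
// Toy byte-pair-encoding (BPE) merge loop, for illustration only.
// ASSUMPTION: bpe() is a hypothetical helper, and the merge list below is
// invented; it is not RoBERTa's merges file or this package's API.

/**
 * Apply ranked BPE merges to a single word.
 *
 * @param string            $word  word to split (treated as single-byte chars)
 * @param array<string,int> $ranks map "left right" => rank (lower merges first)
 * @return string[]                resulting subword symbols
 */
function bpe(string $word, array $ranks): array
{
    // Start from individual characters.
    $symbols = str_split($word);

    while (count($symbols) > 1) {
        // Find the adjacent pair with the lowest (best) rank.
        $bestRank = PHP_INT_MAX;
        $bestPair = null;
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $key = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$key]) && $ranks[$key] < $bestRank) {
                $bestRank = $ranks[$key];
                $bestPair = [$symbols[$i], $symbols[$i + 1]];
            }
        }

        // No known pair left to merge: the word is fully encoded.
        if ($bestPair === null) {
            break;
        }

        // Merge every non-overlapping occurrence of the best pair, left to right.
        [$first, $second] = $bestPair;
        $merged = [];
        $n = count($symbols);
        $i = 0;
        while ($i < $n) {
            if ($i < $n - 1 && $symbols[$i] === $first && $symbols[$i + 1] === $second) {
                $merged[] = $first . $second;
                $i += 2;
            } else {
                $merged[] = $symbols[$i];
                $i += 1;
            }
        }
        $symbols = $merged;
    }

    return $symbols;
}

// Hypothetical merge list, highest priority first.
$merges = ['l o', 'lo w', 'e r', 'low er'];
$ranks = array_flip($merges); // "l o" => 0, "lo w" => 1, "e r" => 2, "low er" => 3

print_r(bpe('lower', $ranks));  // a single symbol: "lower"
print_r(bpe('lowest', $ranks)); // "low", "e", "s", "t"
```

Merging every occurrence of the best pair in one step, rather than only the first, mirrors the loop in the original GPT-2 encoder; words whose pairs are missing from the merges file simply stay split into smaller pieces.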
XLM Tokenizer
To use the multilingual version, initialize the SentencePiece dependency and download the additional model file:
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\Ropherta\Tokenizer\Vendor::check();"
Sponsors
We thank our sponsor: