textualization/ropherta-tokenizer

GPT3Tokenizer (BPE) with the RoBERTa-base vocabulary.

Package info

github.com/Textualization/RophertaTokenizer

pkg:composer/textualization/ropherta-tokenizer

v0.0.7 2024-02-21 00:01 UTC

This package is auto-updated.

Last update: 2026-02-21 04:17:00 UTC


README

This package is a thin wrapper around GPT3Tokenizer that uses the HuggingFace RoBERTa vocabulary and merge files.

See the GPT3Tokenizer documentation for example usage (or the generated test case under tests/).

XLM Tokenizer

To use the multilingual version, the SentencePiece dependency must be initialized and an additional model file must be downloaded:

composer exec -- php -r "require 'vendor/autoload.php'; Textualization\SentencePiece\Vendor::check();"
composer exec -- php -r "require 'vendor/autoload.php'; Textualization\Ropherta\Tokenizer\Vendor::check();"
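The two commands above can also be wrapped in a small shell helper so that the second check only runs if the first one succeeds. This is a minimal sketch, not part of the package: the function names `run_vendor_check` and `setup_xlm` are hypothetical, and it assumes the commands are run from the project root, where `vendor/autoload.php` exists after the package has been installed with Composer.

```shell
# Hypothetical helper (not shipped with the package): runs one
# Vendor::check() call through Composer.
# $1 is the fully qualified PHP class name, passed in single quotes
# so the namespace backslashes reach PHP unchanged.
run_vendor_check() {
    composer exec -- php -r "require 'vendor/autoload.php'; $1::check();"
}

# Hypothetical helper: performs both setup steps from the README, in order.
# The && stops the sequence if the SentencePiece check fails.
setup_xlm() {
    run_vendor_check 'Textualization\SentencePiece\Vendor' &&
    run_vendor_check 'Textualization\Ropherta\Tokenizer\Vendor'
}
```

After sourcing the snippet, calling `setup_xlm` from the project root runs the same two commands as above.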

Sponsors

We thank our sponsor: