semji / gpt-3-tokenizer-php
PHP Text Tokenizer for GPT models
Installs: 75 104
Dependents: 0
Suggesters: 0
Security: 0
Stars: 4
Watchers: 4
Forks: 4
Open Issues: 1
Requires
- php: ^8.1
- ext-mbstring: *
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.14
- phpstan/phpstan: ^1.9
- phpunit/phpunit: ^9.5
- rector/rector: ^0.15.12
- symfony/var-dumper: ^6.2
This package is auto-updated.
Last update: 2025-04-27 03:45:19 UTC
README
PHP Text Tokenizer for GPT models
About
A PHP toolkit that tokenizes text the same way the GPT family of models processes it.
Forked from https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP to fit our usage, fix bugs and add unit testing.
Usage
The mbstring PHP extension is required for this tool to work correctly when the tokenized text contains non-ASCII characters (see the mbstring installation documentation for details). PHP 8.1 or later is also required.
use Semji\GPT3Tokenizer\Encoder;

$prompt = "Many words map";
$encoder = new Encoder();
$encoder->encode($prompt);
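The two requirements stated above (PHP ^8.1 and the mbstring extension) can be verified before the encoder is used. The following is only a minimal sketch: the helper name `tokenizerRequirementProblems` is hypothetical and is not part of this package. It relies solely on PHP's built-in `version_compare` and `extension_loaded` functions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (NOT part of semji/gpt-3-tokenizer-php):
 * lists the requirements from the package metadata that are not met.
 *
 * @return string[] Human-readable problems; empty when nothing is missing.
 */
function tokenizerRequirementProblems(string $phpVersion, bool $hasMbstring): array
{
    $problems = [];

    // The Composer constraint ^8.1 means >= 8.1.0 and < 9.0.0.
    if (version_compare($phpVersion, '8.1.0', '<') || version_compare($phpVersion, '9.0.0', '>=')) {
        $problems[] = "PHP {$phpVersion} does not satisfy ^8.1";
    }

    // mbstring is declared as a required extension (ext-mbstring: *).
    if (!$hasMbstring) {
        $problems[] = 'the mbstring extension is not loaded';
    }

    return $problems;
}

// Check the running interpreter.
$problems = tokenizerRequirementProblems(PHP_VERSION, extension_loaded('mbstring'));

if ($problems === []) {
    echo 'Environment meets the tokenizer requirements.' . PHP_EOL;
} else {
    echo 'Missing requirements: ' . implode('; ', $problems) . PHP_EOL;
}
```

Composer already enforces both constraints at install time, so a check like this is mainly useful when code is deployed to a different runtime than the one it was installed on.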