semji/gpt-3-tokenizer-php

PHP Text Tokenizer for GPT models

Installs: 111 105

Dependents: 0

Suggesters: 0

Security: 0

Stars: 4

Watchers: 4

Forks: 4

Open Issues: 1

pkg:composer/semji/gpt-3-tokenizer-php

v2.0.0 2023-02-07 09:15 UTC

This package is auto-updated.

Last update: 2025-10-27 04:46:32 UTC


README

PHP Text Tokenizer for GPT models

About

A PHP toolkit to tokenize text like GPT family of models process it.

Forked from https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP to fit our usage, fix bugs and add unit testing.

Usage

The mbstring PHP extension is needed for this tool to work correctly (in case non-ASCII characters are present in the tokenized text): details here on how to install mbstring PHP 8.1 is needed too;

use Semji\GPT3Tokenizer\Encoder;
$prompt = "Many words map";
$encoder = new Encoder();
$encoder->encode($prompt);