semji/gpt-3-tokenizer-php

PHP Text Tokenizer for GPT models

v2.0.0 2023-02-07 09:15 UTC

This package is auto-updated.

Last update: 2024-04-21 14:43:29 UTC



About

A PHP toolkit that tokenizes text the same way the GPT family of models does.

Forked from https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP to fit our needs, fix bugs, and add unit tests.

Usage

This tool requires PHP 8.1 and the mbstring PHP extension. mbstring is needed for the tokenizer to work correctly when the text contains non-ASCII characters; see the PHP documentation for how to install it.
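The two requirements above can be checked before the tokenizer is used. The following is a minimal sketch, not part of the package: the function name `tokenizerEnvironmentProblems` is hypothetical, and it assumes PHP 8.1 is a minimum version rather than an exact one.

```php
<?php
// Hypothetical pre-flight check; this helper is NOT provided by the package.
function tokenizerEnvironmentProblems(): array
{
    $problems = [];

    // PHP_VERSION_ID is 80100 for PHP 8.1.0.
    // Assumption: 8.1 is treated as the minimum supported version.
    if (PHP_VERSION_ID < 80100) {
        $problems[] = 'PHP 8.1 is required, found ' . PHP_VERSION;
    }

    // mbstring is needed when the tokenized text contains non-ASCII characters.
    if (!extension_loaded('mbstring')) {
        $problems[] = 'The mbstring extension is not loaded';
    }

    return $problems;
}

$problems = tokenizerEnvironmentProblems();
if ($problems === []) {
    echo 'Environment looks OK', PHP_EOL;
} else {
    // Print one problem per line.
    echo implode(PHP_EOL, $problems), PHP_EOL;
}
```

Running the script from the command line with the same PHP binary that will run the tokenizer shows which requirement, if any, is missing.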

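<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use Semji\GPT3Tokenizer\Encoder;

$prompt = "Many words map";
$encoder = new Encoder();
$tokens = $encoder->encode($prompt); // keep the result instead of discarding it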