danny50610/bpe-tokeniser

PHP port of openai/tiktoken (most of its functionality)

Version 0.1.0 (2023-08-21 12:37 UTC)


Last update: 2024-04-25 16:42:10 UTC


README



Installation

composer require danny50610/bpe-tokeniser

Example

GPT-4 / GPT-3.5-Turbo (cl100k_base)

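<?php

use Danny50610\BpeTokeniser\EncodingFactory;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// Load the cl100k_base encoding by name
$enc = EncodingFactory::createByEncodingName('cl100k_base');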

var_dump($enc->encode("hello world"));
/**
 * output: 
 * array(2) {
 *  [0]=>
 *  int(15339)
 *  [1]=>
 *  int(1917)
 * }
 */

var_dump($enc->decode($enc->encode("hello world")));
// output: string(11) "hello world"
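
Alternatively, look up the encoding from a model name:

<?php

use Danny50610\BpeTokeniser\EncodingFactory;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// gpt-3.5-turbo uses the cl100k_base encoding
$enc = EncodingFactory::createByModelName('gpt-3.5-turbo');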

var_dump($enc->decode($enc->encode("hello world")));
// output: string(11) "hello world"

For available encodings, see src/EncodingFactory.php
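For background on where the token ids returned by encode() come from: tiktoken-style encoders split text into pieces and then apply byte-pair merges to each piece. In each round they merge the adjacent pair whose concatenation has the lowest rank, and the rank of each final byte sequence is its token id. The pure-PHP sketch below illustrates only that merge step on a made-up rank table. The function `toyBytePairEncode` and the `$ranks` table are hypothetical and are not part of this package's API. The real encodings also pre-split text with a regular expression and use far larger rank tables, both of which are omitted here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of this package): greedy byte-pair merge
 * in the style of tiktoken. Repeatedly merges the adjacent pair whose
 * concatenation has the lowest rank, then maps each part to its rank.
 *
 * @param string             $piece non-empty input piece
 * @param array<string, int> $ranks byte sequence => rank; must contain every single byte of $piece
 * @return int[]
 */
function toyBytePairEncode(string $piece, array $ranks): array
{
    // Start with one part per byte.
    $parts = str_split($piece);

    while (count($parts) > 1) {
        $bestRank = null;
        $bestIndex = -1;

        // Find the adjacent pair with the lowest rank (leftmost wins ties).
        for ($i = 0; $i < count($parts) - 1; $i++) {
            $pair = $parts[$i] . $parts[$i + 1];
            if (isset($ranks[$pair]) && ($bestRank === null || $ranks[$pair] < $bestRank)) {
                $bestRank = $ranks[$pair];
                $bestIndex = $i;
            }
        }

        // No mergeable pair left: stop.
        if ($bestRank === null) {
            break;
        }

        // Replace the two parts with their concatenation.
        $merged = $parts[$bestIndex] . $parts[$bestIndex + 1];
        array_splice($parts, $bestIndex, 2, [$merged]);
    }

    // Each surviving part is a known byte sequence; its rank is the token id.
    return array_map(static fn (string $part): int => $ranks[$part], $parts);
}

// Made-up rank table: single bytes first, then merges in priority order.
$ranks = ['a' => 0, 'b' => 1, 'c' => 2, 'ab' => 3, 'abc' => 4];

var_dump(toyBytePairEncode('abc', $ranks)); // "ab" merges first, then "abc": returns [4]
var_dump(toyBytePairEncode('cab', $ranks)); // only "ab" merges: returns [2, 3]
```

Always taking the lowest-ranked pair means merges learned earlier take priority. The sketch rescans every pair each round for clarity rather than speed.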