sundance-solutions / larachain-token-count
Quick helper to count tokens
Requires
- php: ^8.1
- illuminate/contracts: ^10.0
- spatie/laravel-package-tools: ^1.14.0
Requires (Dev)
- laravel/pint: ^1.0
- nunomaduro/collision: ^7.9
- nunomaduro/larastan: ^2.0.1
- orchestra/testbench: ^8.0
- pestphp/pest: ^2.0
- pestphp/pest-plugin-arch: ^2.0
- pestphp/pest-plugin-laravel: ^2.0
- phpstan/extension-installer: ^1.1
- phpstan/phpstan-deprecation-rules: ^1.0
- phpstan/phpstan-phpunit: ^1.0
This package is auto-updated.
Last update: 2024-11-02 01:22:46 UTC
README
Use https://github.com/yethee/tiktoken-php instead 👉
Everything below is superseded by the package above ☝️
GPT-3 Approximate Token Counter in PHP
This repository contains a PHP function that approximates the token count of a text string, following the tokenization rules used by OpenAI's GPT-3.
GPT-3, a language model developed by OpenAI, reads text in chunks called tokens. A token can be as short as one character or as long as one word (e.g., 'a', 'apple'), and long or rare words may be split into several tokens. In scripts such as Chinese or Japanese, a single character can span multiple tokens. Punctuation usually forms its own token, while in GPT-3's actual byte-pair encoding a space is typically merged into the token of the word that follows it.
The function provided here offers an approximation of how GPT-3 might tokenize a given string, counting words, spaces, and punctuation as separate tokens. This allows you to estimate the number of tokens in a text string without making an API call, which can be useful for monitoring usage or avoiding unnecessary costs.
Please note that this is a simplified approximation, and GPT-3's real tokenization may differ. In particular, words that are very long or contain special characters may be split into multiple tokens. The method may also be inaccurate for languages other than English, especially those written in non-Latin scripts.
This description reflects information current to September 2021, when the author knew of no public method for counting tokens exactly the way GPT-3 does. The function is therefore an estimate, not a guaranteed accurate count; for exact counts, use the tiktoken-php library linked at the top of this README.
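The counting approach described above (words, spaces, and punctuation each counted as a separate token) can be sketched roughly as follows. This is a minimal illustration, not the package's actual source: the function name `approximateTokenCount` and the regular expression are assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the package's real implementation): approximates a
 * token count by treating each word, each whitespace character, and each
 * punctuation or symbol character as one token.
 */
function approximateTokenCount(string $text): int
{
    // Alternatives, tried in order at each position:
    //   1. a run of letters, digits, or underscores (one "word")
    //   2. a single whitespace character
    //   3. any other single character (punctuation, symbols)
    $matched = preg_match_all('/[\p{L}\p{N}_]+|\s|[^\p{L}\p{N}_\s]/u', $text);

    // preg_match_all() returns false on failure, e.g. invalid UTF-8 input
    // when the /u modifier is used; this sketch reports 0 in that case.
    return $matched === false ? 0 : $matched;
}

echo approximateTokenCount('Hello, world!'), PHP_EOL;
```

For `'Hello, world!'` the pattern matches `Hello`, `,`, the space, `world`, and `!`. A real byte-pair tokenizer would usually fold the space into the following word, so estimates from a sketch like this tend to run high on ordinary English prose.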
Installation
You can install the package via composer:
composer require sundance-solutions/larachain-token-count
Usage
use SundanceSolutions\LarachainTokenCount\Facades\LarachainTokenCount;

$text = "Your document text...";
$results = LarachainTokenCount::count($text);

expect($results)->toEqual(8);
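One way to use such a count for the cost-monitoring purpose mentioned earlier is to check a text against a token budget before sending it to the API. The sketch below is an assumption-laden illustration: `fitsTokenBudget`, the budget of 4000, and the stand-in word counter are all hypothetical, and the counter is passed in as a callable so the example runs without Laravel.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical guard: returns true when the estimated token count of $text
 * does not exceed $maxTokens.
 *
 * @param callable(string): int $counter Any function that estimates tokens.
 */
function fitsTokenBudget(string $text, int $maxTokens, callable $counter): bool
{
    return $counter($text) <= $maxTokens;
}

// Stand-in counter so the sketch runs on its own. In a Laravel application you
// could instead pass: fn (string $t) => LarachainTokenCount::count($t)
// (assuming count() returns an integer, as the usage example suggests).
$wordCounter = fn (string $text): int => str_word_count($text);

// 4000 is an arbitrary example budget, not a limit taken from this package.
var_dump(fitsTokenBudget('Your document text...', 4000, $wordCounter));
```

Because the estimate is approximate, leaving some headroom below the model's real limit is a safer choice than budgeting right up to it.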
Testing
composer test
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Credits
License
The MIT License (MIT). Please see License File for more information.