helgesverre / toon
Token-Oriented Object Notation - A compact data format for reducing token consumption when sending structured data to LLMs
Installs: 2 792
Dependents: 2
Suggesters: 0
Security: 0
Stars: 93
Watchers: 1
Forks: 6
Open Issues: 0
pkg:composer/helgesverre/toon
Requires
- php: ^8.1
Requires (Dev)
- laravel/pint: ^1.10
- phpbench/phpbench: ^1.4
- phpstan/phpstan: ^1.10
- phpunit/phpunit: ^10.0
README
A PHP port of toon-format/toon - a compact data format designed to reduce token consumption when sending structured data to Large Language Models.
Contents
- Quick Start · Basic Usage · Decoding · Configuration
- Tutorials · Version Compatibility · Development
What is TOON?
TOON is a compact, human-readable format for structured data optimized for LLM contexts. For format details and efficiency analysis, see the TOON Specification.
Installation
Install via Composer:
composer require helgesverre/toon
Requirements
- PHP 8.1 or higher
Quick Start
use HelgeSverre\Toon\Toon; // Encode data echo Toon::encode(['user' => 'Alice', 'score' => 95]); // user: Alice // score: 95 // Decode back to PHP $data = Toon::decode("user: Alice\nscore: 95"); // ['user' => 'Alice', 'score' => 95]
Try it online at ArrayAlchemy.
Basic Usage
use HelgeSverre\Toon\Toon; // Simple values echo Toon::encode('hello'); // hello echo Toon::encode(42); // 42 echo Toon::encode(true); // true echo Toon::encode(null); // null // Arrays echo Toon::encode(['a', 'b', 'c']); // [3]: a,b,c // Objects echo Toon::encode([ 'id' => 123, 'name' => 'Ada', 'active' => true ]); // id: 123 // name: Ada // active: true
Decoding TOON
TOON supports bidirectional conversion - you can decode TOON strings back to PHP arrays:
use HelgeSverre\Toon\Toon; // Decode simple values $result = Toon::decode('42'); // 42 $result = Toon::decode('hello'); // "hello" $result = Toon::decode('true'); // true // Decode arrays $result = Toon::decode('[3]: a,b,c'); // ['a', 'b', 'c'] // Decode objects (returned as associative arrays) $toon = <<<TOON id: 123 name: Ada active: true TOON; $result = Toon::decode($toon); // ['id' => 123, 'name' => 'Ada', 'active' => true] // Decode nested structures $toon = <<<TOON user: id: 123 email: ada@example.com metadata: active: true score: 9.5 TOON; $result = Toon::decode($toon); // ['user' => ['id' => 123, 'email' => 'ada@example.com', 'metadata' => ['active' => true, 'score' => 9.5]]]
Note: TOON objects are decoded as PHP associative arrays, not objects.
Tabular Format
TOON's most efficient format is for uniform object arrays:
echo Toon::encode([ 'users' => [ ['id' => 1, 'name' => 'Alice', 'role' => 'admin'], ['id' => 2, 'name' => 'Bob', 'role' => 'user'], ] ]);
Output:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
Field names are declared once in the header, then each row contains only values. This is where TOON achieves the largest token savings compared to JSON.
See docs/EXAMPLES.md for more encoding examples.
Configuration Options
Customize encoding behavior with EncodeOptions:
use HelgeSverre\Toon\EncodeOptions; // Custom indentation (default: 2) $options = new EncodeOptions(indent: 4); echo Toon::encode(['a' => ['b' => 'c']], $options); // a: // b: c // Tab delimiter instead of comma (default: ',') $options = new EncodeOptions(delimiter: "\t"); echo Toon::encode(['tags' => ['a', 'b', 'c']], $options); // tags[3\t]: a b c // Pipe delimiter $options = new EncodeOptions(delimiter: '|'); echo Toon::encode(['tags' => ['a', 'b', 'c']], $options); // tags[3|]: a|b|c
Special Value Handling
String Quoting
TOON only quotes strings when necessary:
echo Toon::encode('hello'); // hello (no quotes) echo Toon::encode('true'); // "true" (quoted - looks like boolean) echo Toon::encode('42'); // "42" (quoted - looks like number) echo Toon::encode('a:b'); // "a:b" (quoted - contains colon) echo Toon::encode(''); // "" (quoted - empty string) echo Toon::encode("line1\nline2"); // "line1\nline2" (quoted - control chars)
DateTime Objects
DateTime objects are automatically converted to ISO 8601 format:
$date = new DateTime('2025-01-01T00:00:00+00:00'); echo Toon::encode($date); // "2025-01-01T00:00:00+00:00"
PHP Enums
PHP enums are automatically normalized - BackedEnum values are extracted, UnitEnum names are used:
enum Status: string { case ACTIVE = 'active'; case INACTIVE = 'inactive'; } enum Priority: int { case LOW = 1; case HIGH = 10; } enum Color { case RED; case GREEN; case BLUE; } // BackedEnum with string value echo Toon::encode(Status::ACTIVE); // active // BackedEnum with int value echo Toon::encode(Priority::HIGH); // 10 // UnitEnum (no backing value) echo Toon::encode(Color::BLUE); // BLUE // Array of enum cases echo Toon::encode(Priority::cases()); // [2]: 1,10
Special Numeric Values
Non-finite numbers are converted to null:
echo Toon::encode(INF); // null echo Toon::encode(-INF); // null echo Toon::encode(NAN); // null
Helper Functions
TOON provides global helper functions for convenience:
// Basic encoding $toon = toon($data); // Decoding $data = toon_decode($toonString); // Lenient decoding (forgiving parsing) $data = toon_decode_lenient($toonString); // Compact (minimal indentation) $compact = toon_compact($data); // Readable (generous indentation) $readable = toon_readable($data); // Tabular (tab-delimited) $tabular = toon_tabular($data); // Compare with JSON $stats = toon_compare($data); // Returns: ['toon' => 450, 'json' => 800, 'savings' => 350, 'savings_percent' => '43.8%'] // Get size estimate $size = toon_size($data); // Estimate token count (4 chars/token heuristic) $tokens = toon_estimate_tokens($data);
Tutorials
Step-by-step guides for integrating TOON with LLM providers:
Getting Started
- Getting Started with TOON (10-15 min) Learn the basics: installation, encoding, configuration, and your first LLM integration.
Framework Integrations
-
OpenAI PHP Client Integration (15-20 min) Integrate TOON with OpenAI's official PHP client. Covers messages, function calling, and streaming.
-
Laravel + Prism AI Application (20-30 min) Build a complete Laravel AI chatbot using TOON and Prism for multi-provider support.
-
Anthropic/Claude Integration (20-25 min) Leverage Claude's 200K context window with TOON optimization. Process large datasets efficiently.
Advanced Topics
-
Token Optimization Strategies (20-25 min) Deep dive into token economics, RAG optimization, and cost reduction strategies.
-
Building a RAG System with TOON and Ollama (30-40 min) Create a production-ready RAG pipeline with TOON, Ollama embeddings, and vector similarity search.
See the tutorials/ directory for all tutorials and learning paths.
Version Compatibility
This library tracks the TOON Specification. Major versions align with spec versions.
| Library | Spec | Key Changes |
|---|---|---|
| v3.0.0 | v3.0 | List-item objects with tabular first field use depth +2 for rows |
| v2.0.0 | v2.0 | Removed [#N] length marker; decoder rejects legacy format |
| v1.4.0 | v1.3 | Full decoder, strict mode |
| v1.3.0 | v1.3 | PHP enum support |
| v1.2.0 | v1.3 | Empty array fix |
| v1.1.0 | v1.3 | Benchmarks, justfile |
| v1.0.0 | v1.3 | Initial release |
For format details and token efficiency analysis, see the TOON Specification.
Format Rules
Objects
- Key-value pairs with colons
- Indentation-based nesting (2 spaces by default)
- Empty objects shown as
key:
Arrays
- Primitives: Inline format with length
tags[3]: a,b,c - Uniform objects: Tabular format with headers
items[2]{sku,qty}: A1,2 - Mixed/non-uniform: List format with hyphens
Indentation
- 2 spaces per level (configurable)
- No trailing spaces
- No final newline
PHP-Specific Limitations
Numeric Key Handling
PHP automatically converts numeric string keys to integers in arrays:
// PHP automatically converts numeric keys $data = ['123' => 'value']; // Key becomes integer 123 echo Toon::encode($data); // "123": value (quoted as string)
The library handles this by quoting numeric keys when encoding.
Use Cases
TOON is ideal for:
- Sending structured data in LLM prompts
- Reducing token costs in API calls to language models
- Improving context window utilization
- Making data more human-readable in AI conversations
Note: TOON is optimized for LLM contexts and is not intended as a replacement for JSON in APIs or data storage.
Differences from JSON
TOON is not a strict superset or subset of JSON. Key differences:
- Bidirectional encoding and decoding (objects decode as associative arrays)
- Optimized for readability and token efficiency in LLM contexts
- Uses whitespace-significant formatting (indentation-based nesting)
- Includes metadata like array lengths and field headers for better LLM comprehension
Credits
- Original TypeScript implementation: toon-format/toon
- Specification: toon-format/spec
- PHP port: HelgeSverre
License
Development
Testing
composer test # Run tests composer test:coverage # Generate coverage report composer analyse # Static analysis
Benchmarks
cd benchmarks && composer install && composer run benchmark
See benchmarks/README.md for details.