Token-Oriented Object Notation - A compact data format for reducing token consumption when sending structured data to LLMs

pkg:composer/helgesverre/toon

v3.0.0 2025-11-26 14:00 UTC

This package is auto-updated.

Last update: 2025-11-28 13:48:39 UTC


README

A PHP port of toon-format/toon - a compact data format designed to reduce token consumption when sending structured data to Large Language Models.

What is TOON?

TOON is a compact, human-readable format for structured data optimized for LLM contexts. For format details and efficiency analysis, see the TOON Specification.

Installation

Install via Composer:

composer require helgesverre/toon

Requirements

  • PHP 8.1 or higher

Quick Start

use HelgeSverre\Toon\Toon;

// Encode data
echo Toon::encode(['user' => 'Alice', 'score' => 95]);
// user: Alice
// score: 95

// Decode back to PHP
$data = Toon::decode("user: Alice\nscore: 95");
// ['user' => 'Alice', 'score' => 95]

Try it online at ArrayAlchemy.

Basic Usage

use HelgeSverre\Toon\Toon;

// Simple values
echo Toon::encode('hello');        // hello
echo Toon::encode(42);             // 42
echo Toon::encode(true);           // true
echo Toon::encode(null);           // null

// Arrays
echo Toon::encode(['a', 'b', 'c']);
// [3]: a,b,c

// Objects
echo Toon::encode([
    'id' => 123,
    'name' => 'Ada',
    'active' => true
]);
// id: 123
// name: Ada
// active: true

Decoding TOON

TOON supports bidirectional conversion: you can decode TOON strings back into PHP values (scalars or arrays):

use HelgeSverre\Toon\Toon;

// Decode simple values
$result = Toon::decode('42');           // 42
$result = Toon::decode('hello');        // "hello"
$result = Toon::decode('true');         // true

// Decode arrays
$result = Toon::decode('[3]: a,b,c');
// ['a', 'b', 'c']

// Decode objects (returned as associative arrays)
$toon = <<<TOON
id: 123
name: Ada
active: true
TOON;

$result = Toon::decode($toon);
// ['id' => 123, 'name' => 'Ada', 'active' => true]

// Decode nested structures
$toon = <<<TOON
user:
  id: 123
  email: ada@example.com
  metadata:
    active: true
    score: 9.5
TOON;

$result = Toon::decode($toon);
// ['user' => ['id' => 123, 'email' => 'ada@example.com', 'metadata' => ['active' => true, 'score' => 9.5]]]

Note: TOON objects are decoded as PHP associative arrays, not objects.
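To illustrate how plain-text values map back to PHP types, here is a minimal sketch that decodes only flat "key: value" documents. The functions decode_flat() and infer_scalar() are hypothetical and not part of this package. The real decoder also handles nesting, arrays, escape sequences and strict mode.

```php
<?php
// Hypothetical sketch, not helgesverre/toon's decoder: it handles only flat
// "key: value" lines (no nesting, arrays, quoted keys or escape sequences).

function infer_scalar(string $raw): mixed
{
    // Literal keywords first.
    if ($raw === 'true') {
        return true;
    }
    if ($raw === 'false') {
        return false;
    }
    if ($raw === 'null') {
        return null;
    }
    // Quoted values are always strings; strip the surrounding quotes.
    if (strlen($raw) >= 2 && $raw[0] === '"' && str_ends_with($raw, '"')) {
        return substr($raw, 1, -1);
    }
    // Plain integers and decimals (no exponents, no leading zeros).
    if (preg_match('/^-?(0|[1-9]\d*)$/', $raw) === 1) {
        return (int) $raw;
    }
    if (preg_match('/^-?(0|[1-9]\d*)\.\d+$/', $raw) === 1) {
        return (float) $raw;
    }
    return $raw; // anything else is an unquoted string
}

function decode_flat(string $toon): array
{
    $result = [];
    foreach (explode("\n", $toon) as $number => $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        if (!str_contains($line, ':')) {
            throw new UnexpectedValueException('Missing colon on line ' . ($number + 1));
        }
        // Split on the first colon only, so quoted values may contain colons.
        [$key, $value] = explode(':', $line, 2);
        $result[trim($key)] = infer_scalar(trim($value));
    }
    return $result;
}

$row = decode_flat("id: 123\nname: Ada\nactive: true");
// $row === ['id' => 123, 'name' => 'Ada', 'active' => true]
```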

Tabular Format

TOON is most efficient for arrays of uniform objects, where every element has the same keys and only primitive values. These are encoded in tabular form:

echo Toon::encode([
    'users' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
        ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ]
]);

Output:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Field names are declared once in the header, then each row contains only values. This is where TOON achieves the largest token savings compared to JSON.
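The header-plus-rows layout can be sketched in a few lines of plain PHP. This is a simplified illustration, not the library's encoder: encode_table(), is_uniform_table() and format_cell() are hypothetical names, string values are assumed never to need quoting, and rows must list their keys in the same order.

```php
<?php
// Hypothetical sketch of TOON's tabular form (not the library's encoder).
// Assumes string values never need quoting.

function is_uniform_table(array $rows): bool
{
    if ($rows === [] || !array_is_list($rows)) {
        return false;
    }
    $first = $rows[0];
    if (!is_array($first) || $first === [] || array_is_list($first)) {
        return false; // rows must be non-empty associative arrays
    }
    $keys = array_keys($first);
    foreach ($rows as $row) {
        // Same keys in the same order (a simplification made by this sketch).
        if (!is_array($row) || array_keys($row) !== $keys) {
            return false;
        }
        foreach ($row as $value) {
            if (!is_scalar($value) && $value !== null) {
                return false; // nested values cannot go into a table cell
            }
        }
    }
    return true;
}

function format_cell(mixed $value): string
{
    return match (true) {
        $value === true => 'true',
        $value === false => 'false',
        $value === null => 'null',
        default => (string) $value,
    };
}

function encode_table(string $key, array $rows, int $indent = 2): string
{
    if (!is_uniform_table($rows)) {
        throw new InvalidArgumentException("'$key' is not a uniform table");
    }
    // Header: key, row count, then the field names declared once.
    $fields = array_keys($rows[0]);
    $lines = [sprintf('%s[%d]{%s}:', $key, count($rows), implode(',', $fields))];
    $pad = str_repeat(' ', $indent);
    foreach ($rows as $row) {
        $lines[] = $pad . implode(',', array_map('format_cell', $row));
    }
    return implode("\n", $lines); // no final newline
}

echo encode_table('users', [
    ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
    ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
]);
```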

See docs/EXAMPLES.md for more encoding examples.

Configuration Options

Customize encoding behavior with EncodeOptions:

use HelgeSverre\Toon\EncodeOptions;
use HelgeSverre\Toon\Toon;

// Custom indentation (default: 2)
$options = new EncodeOptions(indent: 4);
echo Toon::encode(['a' => ['b' => 'c']], $options);
// a:
//     b: c

// Tab delimiter instead of comma (default: ',')
$options = new EncodeOptions(delimiter: "\t");
echo Toon::encode(['tags' => ['a', 'b', 'c']], $options);
// tags[3\t]: a\tb\tc   (each \t stands for a literal tab character)

// Pipe delimiter
$options = new EncodeOptions(delimiter: '|');
echo Toon::encode(['tags' => ['a', 'b', 'c']], $options);
// tags[3|]: a|b|c
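Judging from the outputs above, the delimiter is declared inside the length brackets whenever it is not the default comma. A hypothetical helper, encode_inline(), that reproduces those three outputs for a keyed primitive array (no quoting, no nesting) could look like this:

```php
<?php
// Hypothetical helper mirroring the outputs above (not a library function):
// a primitive array is written inline, and any non-comma delimiter is
// repeated inside the length brackets so a decoder knows how to split values.

function encode_inline(string $key, array $values, string $delimiter = ','): string
{
    if (!in_array($delimiter, [',', "\t", '|'], true)) {
        throw new InvalidArgumentException('Unsupported delimiter');
    }
    $marker = $delimiter === ',' ? '' : $delimiter;
    return sprintf('%s[%d%s]: %s', $key, count($values), $marker, implode($delimiter, $values));
}

echo encode_inline('tags', ['a', 'b', 'c']), "\n";        // tags[3]: a,b,c
echo encode_inline('tags', ['a', 'b', 'c'], '|'), "\n";   // tags[3|]: a|b|c
echo encode_inline('tags', ['a', 'b', 'c'], "\t"), "\n";  // tab-delimited variant
```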

Special Value Handling

String Quoting

TOON only quotes strings when necessary:

echo Toon::encode('hello');           // hello (no quotes)
echo Toon::encode('true');            // "true" (quoted - looks like boolean)
echo Toon::encode('42');              // "42" (quoted - looks like number)
echo Toon::encode('a:b');             // "a:b" (quoted - contains colon)
echo Toon::encode('');                // "" (quoted - empty string)
echo Toon::encode("line1\nline2");    // "line1\nline2" (quoted - control chars)
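The rules above can be condensed into a predicate. This is a simplified, hypothetical sketch (needs_quotes() is not a library function) covering the listed cases plus two that the TOON Specification also requires: the literal null and strings containing the active delimiter. The specification defines further rules that are not shown here.

```php
<?php
// Hypothetical, simplified quoting predicate; not the library's implementation.

function needs_quotes(string $value, string $delimiter = ','): bool
{
    if ($value === '') {
        return true; // empty string
    }
    if (in_array($value, ['true', 'false', 'null'], true)) {
        return true; // would decode as a boolean or null
    }
    if (is_numeric($value)) {
        return true; // would decode as a number
    }
    if (str_contains($value, ':') || str_contains($value, $delimiter)) {
        return true; // structural characters
    }
    // Control characters such as \n, \r and \t.
    return preg_match('/[\x00-\x1F]/', $value) === 1;
}

foreach (['hello', 'true', '42', 'a:b', '', "line1\nline2"] as $s) {
    printf("%s => %s\n", json_encode($s), needs_quotes($s) ? 'quoted' : 'bare');
}
```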

DateTime Objects

DateTime objects are automatically converted to ISO 8601 strings, which are quoted in the output because they contain colons:

$date = new DateTime('2025-01-01T00:00:00+00:00');
echo Toon::encode($date);
// "2025-01-01T00:00:00+00:00"
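In plain PHP, the same string comes from the DateTimeInterface::ATOM format. Whether the library uses exactly this constant is an assumption; the sketch below (normalize_date() is a hypothetical name) only reproduces the output shown.

```php
<?php
// Hypothetical helper: DateTimeInterface::ATOM is 'Y-m-d\TH:i:sP',
// an ISO 8601 layout with a numeric UTC offset.

function normalize_date(DateTimeInterface $date): string
{
    return $date->format(DateTimeInterface::ATOM);
}

$date = new DateTimeImmutable('2025-01-01T00:00:00+00:00');
echo normalize_date($date), "\n"; // 2025-01-01T00:00:00+00:00
```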

PHP Enums

PHP enums are normalized automatically: a BackedEnum is encoded as its backing value, and a pure UnitEnum as its case name:

enum Status: string {
    case ACTIVE = 'active';
    case INACTIVE = 'inactive';
}

enum Priority: int {
    case LOW = 1;
    case HIGH = 10;
}

enum Color {
    case RED;
    case GREEN;
    case BLUE;
}

// BackedEnum with string value
echo Toon::encode(Status::ACTIVE);
// active

// BackedEnum with int value
echo Toon::encode(Priority::HIGH);
// 10

// UnitEnum (no backing value)
echo Toon::encode(Color::BLUE);
// BLUE

// Array of enum cases
echo Toon::encode(Priority::cases());
// [2]: 1,10
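The normalization rule fits in one expression, because every enum implements UnitEnum and backed enums also implement BackedEnum. A minimal sketch with illustrative enum names and a hypothetical normalize_enum() helper, not the library's code:

```php
<?php
// Hypothetical sketch of the enum rule above; enum names are illustrative.

enum Mode: string
{
    case Draft = 'draft';
}

enum Weight: int
{
    case Heavy = 10;
}

enum Shape
{
    case Circle;
}

// Backed enums yield ->value; pure enums fall back to ->name.
function normalize_enum(UnitEnum $case): int|string
{
    return $case instanceof BackedEnum ? $case->value : $case->name;
}

echo normalize_enum(Mode::Draft), "\n";   // draft
echo normalize_enum(Weight::Heavy), "\n"; // 10
echo normalize_enum(Shape::Circle), "\n"; // Circle
```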

Special Numeric Values

Non-finite numbers are converted to null:

echo Toon::encode(INF);     // null
echo Toon::encode(-INF);    // null
echo Toon::encode(NAN);     // null
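A one-line sketch of this rule, using a hypothetical normalize_number() helper: is_finite() returns false for INF, -INF and NAN, while integers are always finite and pass through unchanged.

```php
<?php
// Hypothetical helper mirroring the rule above (not the library's code).

function normalize_number(int|float $n): int|float|null
{
    // Only floats can be non-finite; everything else is returned as-is.
    return is_float($n) && !is_finite($n) ? null : $n;
}

var_dump(normalize_number(INF));  // NULL
var_dump(normalize_number(NAN));  // NULL
var_dump(normalize_number(1.5));  // float(1.5)
```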

Helper Functions

TOON provides global helper functions for convenience:

// Basic encoding
$toon = toon($data);

// Decoding
$data = toon_decode($toonString);

// Lenient decoding (forgiving parsing)
$data = toon_decode_lenient($toonString);

// Compact (minimal indentation)
$compact = toon_compact($data);

// Readable (generous indentation)
$readable = toon_readable($data);

// Tabular (tab-delimited)
$tabular = toon_tabular($data);

// Compare with JSON
$stats = toon_compare($data);
// Returns, for example: ['toon' => 450, 'json' => 800, 'savings' => 350, 'savings_percent' => '43.8%']

// Get size estimate
$size = toon_size($data);

// Estimate token count (4 chars/token heuristic)
$tokens = toon_estimate_tokens($data);
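The figures in the toon_compare() example follow from simple arithmetic: savings = json - toon, and savings_percent = savings / json * 100, so 350 / 800 = 43.75%, displayed as 43.8%. The sketch below reproduces that arithmetic under stated assumptions: compare_sizes() and estimate_tokens() are hypothetical, sizes are byte lengths from strlen(), and the token estimate rounds up. The library's own helpers may measure differently.

```php
<?php
// Hypothetical re-implementation of the comparison arithmetic above.
// Assumes sizes are byte lengths and ~4 characters per token.

function compare_sizes(string $toon, string $json): array
{
    $toonSize = strlen($toon);
    $jsonSize = strlen($json);
    $savings = $jsonSize - $toonSize;
    $percent = $jsonSize > 0 ? $savings / $jsonSize * 100 : 0.0;

    return [
        'toon' => $toonSize,
        'json' => $jsonSize,
        'savings' => $savings,
        'savings_percent' => number_format($percent, 1) . '%',
    ];
}

function estimate_tokens(string $text): int
{
    return (int) ceil(strlen($text) / 4); // rounding up is an assumption
}

$data = ['users' => [['id' => 1, 'name' => 'Alice'], ['id' => 2, 'name' => 'Bob']]];
$toon = "users[2]{id,name}:\n  1,Alice\n  2,Bob";
print_r(compare_sizes($toon, json_encode($data)));
echo estimate_tokens($toon), "\n";
```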

Tutorials

Step-by-step guides for integrating TOON with LLM providers:

Getting Started

  • Getting Started with TOON (10-15 min) Learn the basics: installation, encoding, configuration, and your first LLM integration.

See the tutorials/ directory for all tutorials and learning paths.

Version Compatibility

This library tracks the TOON Specification. Major versions align with spec versions.

Library  Spec  Key changes
v3.0.0   v3.0  List-item objects with tabular first field use depth +2 for rows
v2.0.0   v2.0  Removed [#N] length marker; decoder rejects legacy format
v1.4.0   v1.3  Full decoder, strict mode
v1.3.0   v1.3  PHP enum support
v1.2.0   v1.3  Empty array fix
v1.1.0   v1.3  Benchmarks, justfile
v1.0.0   v1.3  Initial release

For format details and token efficiency analysis, see the TOON Specification.

Format Rules

Objects

  • Key-value pairs with colons
  • Indentation-based nesting (2 spaces by default)
  • Empty objects shown as key:

Arrays

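  • Primitive arrays: inline, with the length in brackets, e.g. tags[3]: a,b,c
  • Arrays of uniform objects: tabular, with a field header such as items[2]{sku,qty}: followed by one indented row of values per object (e.g. A1,2)
  • Mixed or non-uniform arrays: list format, one hyphen-prefixed item per line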

Indentation

  • 2 spaces per level (configurable)
  • No trailing spaces
  • No final newline
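A minimal recursive sketch of the object rules above, using a hypothetical encode_object() function. Arrays, string quoting and key validation are deliberately left out, and every PHP array is treated as an object, so this is an illustration rather than the library's encoder.

```php
<?php
// Hypothetical sketch: "key: value" pairs, nested objects indented one level
// deeper, empty objects as a bare "key:", no trailing spaces, no final newline.

function encode_object(array $object, int $indent = 2, int $depth = 0): string
{
    $pad = str_repeat(' ', $indent * $depth);
    $lines = [];
    foreach ($object as $key => $value) {
        if (is_array($value)) {
            $lines[] = "{$pad}{$key}:"; // nested or empty object header
            if ($value !== []) {
                $lines[] = encode_object($value, $indent, $depth + 1);
            }
            continue;
        }
        $scalar = match (true) {
            $value === true => 'true',
            $value === false => 'false',
            $value === null => 'null',
            default => (string) $value,
        };
        $lines[] = "{$pad}{$key}: {$scalar}";
    }
    return implode("\n", $lines);
}

echo encode_object(['user' => ['id' => 123, 'meta' => [], 'active' => true]]);
// user:
//   id: 123
//   meta:
//   active: true
```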

PHP-Specific Limitations

Numeric Key Handling

PHP automatically converts numeric string keys to integers in arrays:

// PHP automatically converts numeric keys
$data = ['123' => 'value'];  // Key becomes integer 123
echo Toon::encode($data);    // "123": value (quoted as string)

The library handles this by quoting numeric keys when encoding.
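The cast happens when the array literal is built, before any encoder sees it, so an original '123' key and the integer 123 are indistinguishable. The plain-PHP snippet below (no library needed) shows which string keys PHP converts: only canonical decimal integers.

```php
<?php
// Plain PHP: which string keys become integers when an array is created.

$data = ['123' => 'a', '-5' => 'b', '0123' => 'c', '1.5' => 'd', '12 ' => 'e'];

foreach (array_keys($data) as $key) {
    printf("%-6s %s\n", var_export($key, true), get_debug_type($key));
}
// 123    int
// -5     int
// '0123' string
// '1.5'  string
// '12 '  string

// A string key '123' and an int key 123 produce identical arrays.
var_dump(['123' => 'x'] === [123 => 'x']); // bool(true)
```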

Use Cases

TOON is ideal for:

  • Sending structured data in LLM prompts
  • Reducing token costs in API calls to language models
  • Improving context window utilization
  • Making data more human-readable in AI conversations

Note: TOON is optimized for LLM contexts and is not intended as a replacement for JSON in APIs or data storage.

Differences from JSON

TOON is not a strict superset or subset of JSON. Key differences:

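  • Bidirectional encoding and decoding; decoded objects are always PHP associative arrays, whereas json_decode() returns stdClass objects unless its $associative argument is true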
  • Optimized for readability and token efficiency in LLM contexts
  • Uses whitespace-significant formatting (indentation-based nesting)
  • Includes metadata like array lengths and field headers for better LLM comprehension

License

MIT License

Development

Testing

composer test                # Run tests
composer test:coverage       # Generate coverage report
composer analyse             # Static analysis

Benchmarks

cd benchmarks && composer install && composer run benchmark

See benchmarks/README.md for details.