PHP implementation of Token-Oriented Object Notation (TOON) - A compact, human-readable serialization format designed for LLM input

pkg:composer/mateuszkardas/toon

v0.1.0 2025-11-03 18:23 UTC

Last update: 2026-01-03 19:09:25 UTC


Note: This is a PHP implementation based on the original TOON format specification and the reference implementations from toon-format/toon.

PHP implementation of Token-Oriented Object Notation (TOON), a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

TOON is intended for LLM input, not output. It's particularly efficient for uniform arrays of objects – multiple fields per row, same structure across items.

Features

  • 💸 Token-efficient: typically 30–60% fewer tokens than JSON
  • 🤿 LLM-friendly guardrails: explicit lengths and fields enable validation
  • 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
  • 📐 Indentation-based structure: like YAML, uses whitespace instead of braces
  • 🧺 Tabular arrays: declare keys once, stream data as rows
  • Full compatibility: Implements the TOON v1.3 specification

Installation

Install via Composer:

composer require mateuszkardas/toon

Quick Start

<?php

require_once 'vendor/autoload.php';

use Toon\Toon;

// Data to encode
$data = [
    'users' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
        ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ],
];

// Encode to TOON
$encoded = Toon::encode($data);
echo $encoded;
// Output:
// users[2]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user

// Decode back to PHP
$decoded = Toon::decode($encoded);
print_r($decoded);

Why TOON?

LLM usage is getting cheaper and context windows are growing, which invites ever larger data inputs. Tokens still cost money, though, and standard JSON is verbose and token-expensive:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON conveys the same information with fewer tokens:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
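
As a rough, self-contained way to see the size difference, the sketch below compares the byte lengths of the two snippets above. Byte length is only a proxy: real token counts depend on the model's tokenizer, which this sketch does not try to measure. The TOON text is written out by hand, so the script runs without the package installed; the variable names are illustrative.

```php
<?php

// The same data as in the snippets above, held as a PHP array.
$data = [
    'users' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
        ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ],
];

// Pretty-printed JSON, as it is often pasted into prompts.
$json = json_encode($data, JSON_PRETTY_PRINT);

// The TOON text from the example above, typed by hand (no library call).
$toon = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user";

$jsonBytes = strlen($json);
$toonBytes = strlen($toon);

// Byte length is a rough stand-in for token count, nothing more.
printf("JSON: %d bytes, TOON: %d bytes\n", $jsonBytes, $toonBytes);
```

The gap widens as more rows are added, because JSON repeats every key per row while the tabular form declares the keys once.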

API Reference

Encoding

Toon::encode(mixed $input, ?EncodeOptions $options = null): string

Convert a PHP value to TOON format.

Options:

  • indent (int): Number of spaces per indentation level (default: 2)
  • delimiter (string): Delimiter for array values (',', '|', or "\t") (default: ',')
  • lengthMarker (string): Optional marker to prefix array lengths ("#" or "") (default: empty)

Example:

use Toon\Toon;
use Toon\EncodeOptions;
use Toon\Constants;

$options = new EncodeOptions(
    indent: 4,
    delimiter: Constants::DELIMITER_PIPE,
    lengthMarker: '#'
);

$result = Toon::encode($data, $options);

Decoding

Toon::decode(string $input, ?DecodeOptions $options = null): mixed

Convert TOON format to a PHP value.

Options:

  • indent (int): Number of spaces per indentation level (default: 2)
  • strict (bool): Enforce strict validation (default: true)

Example:

use Toon\Toon;
use Toon\DecodeOptions;

$options = new DecodeOptions(
    indent: 4,
    strict: false
);

$result = Toon::decode($toonString, $options);
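
The exact rules enforced by strict mode are not listed here. As an illustration of the kind of check that explicit lengths and field lists make possible, the sketch below validates a tabular block against its own header. The function sketchValidateTabular is hypothetical and not part of the package; it assumes a comma delimiter, unquoted values, and a single tabular block as input.

```php
<?php

/**
 * Hypothetical check (not the package's decoder): parse a header such as
 * "users[2]{id,name,role}:" and compare the rows that follow against the
 * declared row count and field count.
 *
 * @return string[] Human-readable problems; empty when the block is consistent.
 */
function sketchValidateTabular(string $toon): array
{
    $lines = explode("\n", rtrim($toon));
    $header = array_shift($lines);

    // key[N]{field,field,...}:
    if (!preg_match('/^(\w+)\[(\d+)\]\{([^}]*)\}:$/', $header, $m)) {
        throw new InvalidArgumentException('Not a tabular header: ' . $header);
    }
    $declared = (int) $m[2];
    $fields = explode(',', $m[3]);

    $errors = [];
    if (count($lines) !== $declared) {
        $errors[] = sprintf('Declared %d rows, found %d', $declared, count($lines));
    }
    foreach ($lines as $i => $line) {
        $cells = explode(',', trim($line));
        if (count($cells) !== count($fields)) {
            $errors[] = sprintf(
                'Row %d has %d values, expected %d',
                $i + 1,
                count($cells),
                count($fields)
            );
        }
    }

    return $errors;
}

// A block whose header promises three rows but only carries two,
// as can happen when a prompt is truncated.
$truncated = "users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user";
echo implode("\n", sketchValidateTabular($truncated)), "\n";
// Prints: Declared 3 rows, found 2
```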

Convenience Methods

// Shorthand for encode with defaults
$toon = Toon::stringify($data);

// Shorthand for decode with defaults
$data = Toon::parse($toonString);

Format Overview

Primitives

$data = [
    'name' => 'Alice',
    'age' => 30,
    'active' => true,
    'deleted' => null,
];

// TOON:
// name: Alice
// age: 30
// active: true
// deleted: null

Objects (Nested)

$data = [
    'user' => [
        'name' => 'Alice',
        'email' => 'alice@example.com',
    ],
];

// TOON:
// user:
//   name: Alice
//   email: alice@example.com

Arrays (Inline)

$data = [
    'tags' => ['php', 'toon', 'serialization'],
    'scores' => [95, 87, 92],
];

// TOON:
// tags[3]: php,toon,serialization
// scores[3]: 95,87,92

Arrays (Tabular) - Most Efficient

$data = [
    'users' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
        ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
        ['id' => 3, 'name' => 'Charlie', 'role' => 'user'],
    ],
];

// TOON:
// users[3]{id,name,role}:
//   1,Alice,admin
//   2,Bob,user
//   3,Charlie,user
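
To show where the savings come from, here is a minimal sketch of how a tabular block can be produced by hand: the keys are written once in the header and each row carries values only. sketchEncodeTabular is a hypothetical helper, not the package's encoder; it assumes a non-empty list of rows with identical keys, a comma delimiter, and values that need no quoting.

```php
<?php

/**
 * Hypothetical helper (not part of the package): render a uniform list of
 * rows as a TOON-style tabular block.
 */
function sketchEncodeTabular(string $key, array $rows, int $indent = 2): string
{
    if ($rows === []) {
        throw new InvalidArgumentException('This sketch expects at least one row.');
    }

    // Field names come from the first row and are declared once in the header.
    $fields = array_keys($rows[0]);
    $lines = [sprintf('%s[%d]{%s}:', $key, count($rows), implode(',', $fields))];

    foreach ($rows as $row) {
        if (array_keys($row) !== $fields) {
            throw new InvalidArgumentException('Rows must share the same keys in the same order.');
        }
        // Each data line holds the values only, in header order.
        $lines[] = str_repeat(' ', $indent) . implode(',', $row);
    }

    return implode("\n", $lines);
}

$users = [
    ['id' => 1, 'name' => 'Alice', 'role' => 'admin'],
    ['id' => 2, 'name' => 'Bob', 'role' => 'user'],
    ['id' => 3, 'name' => 'Charlie', 'role' => 'user'],
];

echo sketchEncodeTabular('users', $users), "\n";
```

For real data, prefer Toon::encode, which also handles quoting, booleans, null, and nested values that this sketch ignores.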

Arrays (List)

$data = [
    'items' => [
        ['name' => 'Item A', 'nested' => ['x' => 1]],
        ['name' => 'Item B', 'nested' => ['x' => 2]],
    ],
];

// TOON:
// items[2]:
//   - name: Item A
//     nested:
//       x: 1
//   - name: Item B
//     nested:
//       x: 2
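
Whether an array ends up in the tabular form or the list form depends on how uniform its items are. The sketch below shows one plausible check, assuming the rule is "every item is an associative array with the same keys in the same order and only primitive values". The function name is hypothetical and the package's actual rule may differ. It uses array_is_list, which is available from PHP 8.1.

```php
<?php

/**
 * Hypothetical uniformity check (not the package's implementation):
 * returns true when the items could be written as rows under one header.
 */
function sketchIsTabular(array $items): bool
{
    // Must be a non-empty, sequential list of items.
    if ($items === [] || !array_is_list($items)) {
        return false;
    }

    // The first item must be an associative array; its keys become the fields.
    $first = $items[0];
    if (!is_array($first) || array_is_list($first)) {
        return false;
    }
    $fields = array_keys($first);

    foreach ($items as $item) {
        if (!is_array($item) || array_keys($item) !== $fields) {
            return false;
        }
        // Nested arrays cannot fit in a single row cell.
        foreach ($item as $value) {
            if (!is_scalar($value) && $value !== null) {
                return false;
            }
        }
    }

    return true;
}

$flat = [
    ['id' => 1, 'name' => 'Alice'],
    ['id' => 2, 'name' => 'Bob'],
];
$nested = [
    ['name' => 'Item A', 'nested' => ['x' => 1]],
    ['name' => 'Item B', 'nested' => ['x' => 2]],
];

var_dump(sketchIsTabular($flat));   // bool(true)
var_dump(sketchIsTabular($nested)); // bool(false)
```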

String Escaping

TOON properly escapes special characters:

$data = [
    'text' => "Line 1\nLine 2\tTabbed",
    'quoted' => 'She said "Hello"',
    'path' => 'C:\\Users\\file.txt',
];

// TOON:
// text: "Line 1\nLine 2\tTabbed"
// quoted: "She said \"Hello\""
// path: "C:\\Users\\file.txt"
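
The escaping shown above can be reproduced with a single-pass replacement. The helper below is a hypothetical sketch, not the package's code: it always wraps the value in double quotes and escapes backslash, double quote, newline, carriage return, and tab. Deciding whether a string needs quoting at all is out of scope here.

```php
<?php

/**
 * Hypothetical helper (not part of the package): quote a string and escape
 * the special characters shown in the example above.
 */
function sketchQuote(string $value): string
{
    // strtr with an array replaces every key in one pass, so a backslash
    // introduced by one replacement is never escaped a second time.
    $escaped = strtr($value, [
        '\\' => '\\\\',
        '"'  => '\\"',
        "\n" => '\\n',
        "\r" => '\\r',
        "\t" => '\\t',
    ]);

    return '"' . $escaped . '"';
}

echo sketchQuote("Line 1\nLine 2\tTabbed"), "\n";
echo sketchQuote('She said "Hello"'), "\n";
echo sketchQuote('C:\\Users\\file.txt'), "\n";
```

Chained str_replace calls would also work, but only if the backslash is handled first; strtr avoids that ordering pitfall.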

Custom Delimiters

use Toon\Constants;
use Toon\EncodeOptions;

// Pipe delimiter
$options = new EncodeOptions(delimiter: Constants::DELIMITER_PIPE);
// Output: users[2]{id,name}:
//           1|Alice
//           2|Bob

// Tab delimiter
$options = new EncodeOptions(delimiter: Constants::DELIMITER_TAB);
// Output: users[2]{id,name}:
//           1	Alice
//           2	Bob

Length Markers

use Toon\EncodeOptions;

$options = new EncodeOptions(lengthMarker: '#');
// Output: users[#2]{id,name}:
//           1,Alice
//           2,Bob

Examples

See the /examples directory for more comprehensive examples:

  • examples/basic.php - Basic usage examples
  • examples/advanced.php - Advanced features and use cases

Run examples:

php examples/basic.php
php examples/advanced.php

Requirements

  • PHP 8.1 or higher

Testing

composer install
composer test

Notes and Limitations

  • For LLM input only: TOON is optimized for data sent to an LLM, not for output generated by one
  • Best for tabular data: Maximum efficiency with uniform arrays of objects
  • Deeply nested data: JSON may be more efficient for complex nested structures
  • Empty arrays: treated as empty objects in a round-trip (both are [] in PHP)

Use Cases

  • Passing large datasets to LLMs (GPT, Claude, etc.)
  • Reducing token costs in AI applications
  • Serializing tabular data for prompts
  • Configuration files for AI agents
  • Data exchange in token-constrained environments

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Author

Mateusz Kardaś

Acknowledgments

Based on the TOON format specification and reference implementation by the toon-format team.