README

ATON is a token-efficient data serialization format designed for LLM applications. In the benchmarks listed under Performance it uses roughly 55-58% fewer tokens than JSON, and decoding restores the original data.

V2 Features

Compression Modes: FAST, BALANCED, ULTRA, ADAPTIVE
Query Language: SQL-like syntax with full AST parser
Streaming Encoder: Process large datasets in chunks
Dictionary Compression: Automatic string deduplication
Full PHP 8 Support: Enums, named arguments, typed properties
Zero Dependencies: Lightweight and fast

Installation

composer require dagost/aton-format

Quick Start

<?php

use Aton\ATON;
use Aton\Enums\CompressionMode;

// Simple encode/decode
$data = [
    'employees' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'Engineer', 'active' => true],
        ['id' => 2, 'name' => 'Bob', 'role' => 'Designer', 'active' => true],
        ['id' => 3, 'name' => 'Carol', 'role' => 'Manager', 'active' => true],
    ]
];

$atonText = ATON::encode($data);
echo $atonText;
// Output:
// @schema[id:int, name:str, role:str, active:bool]
// @defaults[active:true]
//
// employees(3):
//   1, "Alice", "Engineer"
//   2, "Bob", "Designer"
//   3, "Carol", "Manager"

// Decode back
$original = ATON::decode($atonText);
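
To make the output above easier to read, here is a minimal sketch of how a schema line and a defaults line could be derived from uniform rows. It is not the library's implementation: `sketchEncodeTable` is a hypothetical helper, and the rule that a field becomes a default when every row shares its value is an assumption inferred from the Quick Start output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of the ATON library: renders one table of
 * uniform rows in the layout shown in the Quick Start output.
 */
function sketchEncodeTable(string $name, array $rows): string
{
    $typeNames = ['integer' => 'int', 'double' => 'float', 'string' => 'str', 'boolean' => 'bool'];

    // Schema: field names and types are read from the first row.
    $schema = [];
    foreach ($rows[0] as $field => $value) {
        $schema[] = $field . ':' . ($typeNames[gettype($value)] ?? 'str');
    }

    // Defaults (assumed rule): a field whose value is identical in every row.
    $defaults = [];
    foreach (array_keys($rows[0]) as $field) {
        $isConstant = count($rows) > 1;
        foreach ($rows as $row) {
            if ($row[$field] !== $rows[0][$field]) {
                $isConstant = false;
                break;
            }
        }
        if ($isConstant) {
            $defaults[$field] = $rows[0][$field];
        }
    }

    $lines = ['@schema[' . implode(', ', $schema) . ']'];
    if ($defaults !== []) {
        $pairs = [];
        foreach ($defaults as $field => $value) {
            $pairs[] = $field . ':' . json_encode($value);
        }
        $lines[] = '@defaults[' . implode(', ', $pairs) . ']';
    }
    $lines[] = '';
    $lines[] = $name . '(' . count($rows) . '):';

    // Rows: defaulted fields are omitted, the rest are written in schema order.
    foreach ($rows as $row) {
        $values = array_map('json_encode', array_diff_key($row, $defaults));
        $lines[] = '  ' . implode(', ', $values);
    }

    return implode("\n", $lines);
}

$employees = [
    ['id' => 1, 'name' => 'Alice', 'role' => 'Engineer', 'active' => true],
    ['id' => 2, 'name' => 'Bob', 'role' => 'Designer', 'active' => true],
    ['id' => 3, 'name' => 'Carol', 'role' => 'Manager', 'active' => true],
];

$text = sketchEncodeTable('employees', $employees);
echo $text, "\n";
```

For this input the sketch prints the same text as the Quick Start output, which shows where the savings come from: field names appear once in the schema, and the constant `active` column is not repeated per row.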

Compression Modes

use Aton\Encoder;
use Aton\Enums\CompressionMode;

// Fast: No dictionary compression, fastest encoding
$fast = new Encoder(compression: CompressionMode::FAST);

// Balanced: Good compression with reasonable speed (default)
$balanced = new Encoder(compression: CompressionMode::BALANCED);

// Ultra: Maximum compression, best for large datasets
$ultra = new Encoder(compression: CompressionMode::ULTRA);

// Adaptive: Automatically selects mode based on data size
$adaptive = new Encoder(compression: CompressionMode::ADAPTIVE);

Query Language

ATON supports SQL-like queries for filtering data:

use Aton\ATON;
use Aton\QueryEngine;

$data = [
    'products' => [
        ['id' => 1, 'name' => 'Laptop', 'price' => 999, 'category' => 'Electronics'],
        ['id' => 2, 'name' => 'Mouse', 'price' => 29, 'category' => 'Electronics'],
        ['id' => 3, 'name' => 'Desk', 'price' => 299, 'category' => 'Furniture'],
    ]
];

// Parse and execute query
$queryEngine = ATON::createQueryEngine();
$query = $queryEngine->parse("products WHERE price > 100 ORDER BY price DESC LIMIT 10");
$results = $queryEngine->execute($data, $query);

// Or encode with query directly
$filteredAton = ATON::encodeWithQuery($data, "products WHERE category = 'Electronics'");

Query Syntax

-- Basic filtering
products WHERE price > 100

-- Multiple conditions
products WHERE price > 100 AND category = 'Electronics'

-- OR conditions
products WHERE category = 'Electronics' OR category = 'Furniture'

-- IN operator
products WHERE category IN ('Electronics', 'Furniture')

-- LIKE operator (pattern matching)
products WHERE name LIKE '%Laptop%'

-- BETWEEN
products WHERE price BETWEEN 100 AND 500

-- Sorting and pagination
products WHERE active = true ORDER BY price DESC LIMIT 10 OFFSET 5

-- Select specific fields
products SELECT id, name WHERE price > 100
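
As a reading aid, the clauses above can be compared with plain PHP array functions. This is only a sketch of the intended semantics, assuming the clauses behave like their SQL counterparts; it does not use or describe the library's parser.

```php
<?php
declare(strict_types=1);

$products = [
    ['id' => 1, 'name' => 'Laptop', 'price' => 999, 'category' => 'Electronics'],
    ['id' => 2, 'name' => 'Mouse', 'price' => 29, 'category' => 'Electronics'],
    ['id' => 3, 'name' => 'Desk', 'price' => 299, 'category' => 'Furniture'],
];

// Plain-PHP equivalent of:
//   products SELECT id, name WHERE price > 100 ORDER BY price DESC LIMIT 10

// WHERE price > 100
$rows = array_filter($products, fn(array $p): bool => $p['price'] > 100);

// ORDER BY price DESC (usort also reindexes the filtered rows)
usort($rows, fn(array $a, array $b): int => $b['price'] <=> $a['price']);

// LIMIT 10 (an OFFSET would be the second argument)
$rows = array_slice($rows, 0, 10);

// SELECT id, name
$rows = array_map(
    fn(array $p): array => ['id' => $p['id'], 'name' => $p['name']],
    $rows
);

print_r($rows);
```

With the sample data this leaves the Laptop row followed by the Desk row, each reduced to `id` and `name`.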

Streaming Encoder

For large datasets, use the streaming encoder:

use Aton\StreamEncoder;
use Aton\Enums\CompressionMode;

$streamEncoder = new StreamEncoder(
    chunkSize: 100,
    compression: CompressionMode::BALANCED
);

$largeData = [
    'records' => array_map(
        fn($i) => ['id' => $i, 'name' => "Record $i", 'value' => rand()],
        range(1, 10000)
    )
];

// Process in chunks
foreach ($streamEncoder->streamEncode($largeData) as $chunk) {
    echo "Chunk {$chunk['chunkId']}/{$chunk['totalChunks']}\n";
    echo "Progress: " . ($chunk['metadata']['progress'] * 100) . "%\n";

    // Process chunk data; sendToAPI() is a placeholder for your own handler
    sendToAPI($chunk['data']);
}
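
The shape of each chunk in the loop above can be illustrated with a generator. The sketch below is a hypothetical stand-in, not the library's `StreamEncoder`: it reuses the key names from the example (`chunkId`, `totalChunks`, `data`, `metadata.progress`), assumes chunk IDs start at 1, and yields raw rows where the real encoder would presumably yield ATON text.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a streaming encoder: splits rows into fixed-size
 * chunks and yields one array per chunk, lazily.
 */
function sketchStreamChunks(array $rows, int $chunkSize): Generator
{
    $chunks = array_chunk($rows, $chunkSize);
    $total = count($chunks);

    foreach ($chunks as $index => $chunk) {
        yield [
            'chunkId' => $index + 1,          // assumption: IDs are 1-based
            'totalChunks' => $total,
            'data' => $chunk,                 // raw rows here, not ATON text
            'metadata' => ['progress' => ($index + 1) / $total],
        ];
    }
}

$rows = array_map(
    fn(int $i): array => ['id' => $i, 'name' => "Record $i"],
    range(1, 250)
);

foreach (sketchStreamChunks($rows, 100) as $chunk) {
    printf(
        "Chunk %d/%d: %d rows, %.0f%% done\n",
        $chunk['chunkId'],
        $chunk['totalChunks'],
        count($chunk['data']),
        $chunk['metadata']['progress'] * 100
    );
}
```

Because the consumer pulls one chunk at a time, each chunk can be sent or written out before the next one is produced.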

Compression Statistics

use Aton\ATON;

$stats = ATON::getCompressionStats($data);

echo "Original tokens: {$stats['originalTokens']}\n";
echo "Compressed tokens: {$stats['compressedTokens']}\n";
echo "Savings: {$stats['savingsPercent']}%\n";
echo "Compression ratio: {$stats['compressionRatio']}\n";

API Reference

ATON Facade

ATON::encode(array $data, bool $compress = true, CompressionMode $compression = CompressionMode::BALANCED): string
ATON::decode(string $atonString): array
ATON::encodeWithQuery(array $data, string $queryString): string
ATON::getCompressionStats(array $data, CompressionMode $compression = CompressionMode::BALANCED): array
ATON::createEncoder(...): Encoder
ATON::createDecoder(...): Decoder
ATON::createStreamEncoder(...): StreamEncoder
ATON::createQueryEngine(): QueryEngine

Encoder Class

$encoder = new Encoder(
    optimize: true,              // Enable schema and defaults optimization
    compression: CompressionMode::BALANCED,  // Compression mode
    queryable: false,            // Add queryable markers
    validate: true               // Validate input data
);

$encoder->encode($data, $compress);           // Encode to ATON
$encoder->encodeWithQuery($data, $query);     // Encode with query filter
$encoder->estimateTokens($text);              // Estimate token count
$encoder->getCompressionStats($data);         // Get compression stats

Decoder Class

$decoder = new Decoder(validate: true);

$decoder->decode($atonString);  // Decode ATON to array

QueryEngine Class

$queryEngine = new QueryEngine();

$query = $queryEngine->parse($queryString);    // Parse query to AST
$results = $queryEngine->execute($data, $query); // Execute query

StreamEncoder Class

$streamEncoder = new StreamEncoder(
    chunkSize: 100,
    compression: CompressionMode::BALANCED
);

foreach ($streamEncoder->streamEncode($data, $tableName) as $chunk) {
    // Process chunk
}

Exceptions

use Aton\ATON;
use Aton\Exceptions\ATONException;
use Aton\Exceptions\ATONEncodingException;
use Aton\Exceptions\ATONDecodingException;
use Aton\Exceptions\ATONQueryException;

try {
    $aton = ATON::encode($data);
} catch (ATONEncodingException $e) {
    echo "Encoding error: " . $e->getMessage();
}

ATON Format Specification

Basic Structure

@dict[#0:"repeated string", #1:"another string"]
@schema[field1:type1, field2:type2, ...]
@defaults[field1:value1, field2:value2, ...]

entityName(count):
  value1, value2, ...
  value1, value2, ...
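
The `@dict` line is easiest to understand through an example of string deduplication. The following is a minimal sketch under stated assumptions, not the library's algorithm: `sketchBuildDictionary` is a hypothetical helper, and the rule that only strings occurring more than once get an entry is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not the library's implementation: gives every string
 * that occurs more than once a reference (#0, #1, ...) in first-seen order.
 *
 * @return array{line: string, refs: array<string, string>}
 */
function sketchBuildDictionary(array $rows): array
{
    // Collect every string value across all rows.
    $strings = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            if (is_string($value)) {
                $strings[] = $value;
            }
        }
    }

    // Assumed rule: only strings seen at least twice earn a dictionary entry.
    $refs = [];
    foreach (array_count_values($strings) as $string => $count) {
        if ($count > 1) {
            $refs[(string) $string] = '#' . count($refs);
        }
    }

    $entries = [];
    foreach ($refs as $string => $ref) {
        $entries[] = $ref . ':' . json_encode((string) $string);
    }

    return ['line' => '@dict[' . implode(', ', $entries) . ']', 'refs' => $refs];
}

$products = [
    ['id' => 1, 'name' => 'Laptop', 'category' => 'Electronics'],
    ['id' => 2, 'name' => 'Mouse', 'category' => 'Electronics'],
    ['id' => 3, 'name' => 'Desk', 'category' => 'Furniture'],
];

$dictionary = sketchBuildDictionary($products);
echo $dictionary['line'], "\n"; // @dict[#0:"Electronics"]
```

An encoder could then write the short reference in place of each repeated string; how the library writes references inside rows is not shown in this README.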

Supported Types

Type	Description	Example
`int`	Integer	`42`
`float`	Floating point	`3.14`
`str`	String	`"hello"`
`bool`	Boolean	`true`, `false`
`null`	Null value	`null`
`array`	Array	`[1,2,3]`
`object`	Object	`{key:value}`
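
For orientation, the sketch below maps PHP values onto the type names in the table. It is a hypothetical helper, not library code, and it assumes that list-like PHP arrays correspond to `array` while associative arrays correspond to `object`; the README does not state how the library draws that line.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the ATON type name for a PHP value.
 */
function sketchAtonType(mixed $value): string
{
    if (is_array($value)) {
        // Assumption: keys 0..n-1 mean `array`, anything else means `object`.
        $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
        return $isList ? 'array' : 'object';
    }

    return match (true) {
        is_int($value) => 'int',
        is_float($value) => 'float',
        is_string($value) => 'str',
        is_bool($value) => 'bool',
        $value === null => 'null',
        default => throw new InvalidArgumentException(
            'Unsupported value type: ' . get_debug_type($value)
        ),
    };
}

foreach ([42, 3.14, 'hello', true, null, [1, 2, 3], ['key' => 'value']] as $example) {
    echo json_encode($example), ' is ', sketchAtonType($example), "\n";
}
```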

Performance

Dataset	JSON Tokens	ATON Tokens	Reduction
Employee Records (1K)	12,450	5,280	57.6%
Product Catalog (10K)	145,200	64,800	55.4%
Transaction Log (100K)	1,856,000	815,000	56.1%
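
The Reduction column follows from the two token counts as (1 - ATON / JSON) x 100. The short sketch below recomputes it from the table; `reductionPercent` is an illustrative function name, not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Percentage of tokens saved relative to JSON, rounded to one decimal place.
 */
function reductionPercent(int $jsonTokens, int $atonTokens): float
{
    return round((1 - $atonTokens / $jsonTokens) * 100, 1);
}

// Token counts copied from the Performance table.
$benchmarks = [
    'Employee Records (1K)' => [12450, 5280],
    'Product Catalog (10K)' => [145200, 64800],
    'Transaction Log (100K)' => [1856000, 815000],
];

foreach ($benchmarks as $label => [$jsonTokens, $atonTokens]) {
    printf("%s: %.1f%%\n", $label, reductionPercent($jsonTokens, $atonTokens));
}
```

Run against the table's token counts, this reproduces the listed reductions of 57.6%, 55.4% and 56.1%.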

Requirements

PHP 8.0 or higher (note that native enums, which the V2 Features list mentions, require PHP 8.1)

License

MIT License - see LICENSE for details.

Author

Stefano D'Agostino

GitHub: @dagoSte
Email: dago.stefano@gmail.com
