dagost / aton-format
ATON - Adaptive Token-Oriented Notation. A token-efficient data serialization format for LLM applications.
pkg:composer/dagost/aton-format
Requires
- php: >=8.0
Requires (Dev)
- phpunit/phpunit: ^10.0
This package is not auto-updated.
Last update: 2025-11-25 11:44:21 UTC
README
ATON is a token-efficient data serialization format designed for LLM applications. In the benchmarks listed under Performance, it uses roughly 55-58% fewer tokens than JSON, and decoding restores the original data without loss.
V2 Features
- Compression Modes: FAST, BALANCED, ULTRA, ADAPTIVE
- Query Language: SQL-like syntax with full AST parser
- Streaming Encoder: Process large datasets in chunks
- Dictionary Compression: Automatic string deduplication
- Full PHP 8 Support: Enums, named arguments, typed properties
- Zero Dependencies: Lightweight and fast
Installation
composer require dagost/aton-format
Quick Start
<?php

use Aton\ATON;
use Aton\Enums\CompressionMode;

// Simple encode/decode
$data = [
    'employees' => [
        ['id' => 1, 'name' => 'Alice', 'role' => 'Engineer', 'active' => true],
        ['id' => 2, 'name' => 'Bob', 'role' => 'Designer', 'active' => true],
        ['id' => 3, 'name' => 'Carol', 'role' => 'Manager', 'active' => true],
    ]
];

$atonText = ATON::encode($data);
echo $atonText;

// Output:
// @schema[id:int, name:str, role:str, active:bool]
// @defaults[active:true]
//
// employees(3):
// 1, "Alice", "Engineer"
// 2, "Bob", "Designer"
// 3, "Carol", "Manager"

// Decode back
$original = ATON::decode($atonText);
Compression Modes
use Aton\Encoder;
use Aton\Enums\CompressionMode;

// Fast: No dictionary compression, fastest encoding
$fast = new Encoder(compression: CompressionMode::FAST);

// Balanced: Good compression with reasonable speed (default)
$balanced = new Encoder(compression: CompressionMode::BALANCED);

// Ultra: Maximum compression, best for large datasets
$ultra = new Encoder(compression: CompressionMode::ULTRA);

// Adaptive: Automatically selects mode based on data size
$adaptive = new Encoder(compression: CompressionMode::ADAPTIVE);
Query Language
ATON supports SQL-like queries for filtering data:
use Aton\ATON;
use Aton\QueryEngine;

$data = [
    'products' => [
        ['id' => 1, 'name' => 'Laptop', 'price' => 999, 'category' => 'Electronics'],
        ['id' => 2, 'name' => 'Mouse', 'price' => 29, 'category' => 'Electronics'],
        ['id' => 3, 'name' => 'Desk', 'price' => 299, 'category' => 'Furniture'],
    ]
];

// Parse and execute query
$queryEngine = ATON::createQueryEngine();
$query = $queryEngine->parse("products WHERE price > 100 ORDER BY price DESC LIMIT 10");
$results = $queryEngine->execute($data, $query);

// Or encode with query directly
$filteredAton = ATON::encodeWithQuery($data, "products WHERE category = 'Electronics'");
Query Syntax
-- Basic filtering
products WHERE price > 100

-- Multiple conditions
products WHERE price > 100 AND category = 'Electronics'

-- OR conditions
products WHERE category = 'Electronics' OR category = 'Furniture'

-- IN operator
products WHERE category IN ('Electronics', 'Furniture')

-- LIKE operator (pattern matching)
products WHERE name LIKE '%Laptop%'

-- BETWEEN
products WHERE price BETWEEN 100 AND 500

-- Sorting and pagination
products WHERE active = true ORDER BY price DESC LIMIT 10 OFFSET 5

-- Select specific fields
products SELECT id, name WHERE price > 100
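To make the semantics concrete, here is a rough sketch of what `products WHERE price > 100 ORDER BY price DESC LIMIT 10` amounts to when written with plain PHP array functions. It only illustrates the expected result and says nothing about how `QueryEngine` works internally; the helper name `filterSortLimit` is hypothetical and not part of the ATON API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the ATON API): applies a
 * "WHERE price > $minPrice ORDER BY price DESC LIMIT $limit" style
 * query to a list of rows using plain PHP array functions.
 */
function filterSortLimit(array $rows, int $minPrice, int $limit): array
{
    // WHERE price > $minPrice
    $matched = array_filter($rows, fn(array $row): bool => $row['price'] > $minPrice);

    // ORDER BY price DESC (usort also reindexes the array from 0)
    usort($matched, fn(array $a, array $b): int => $b['price'] <=> $a['price']);

    // LIMIT $limit
    return array_slice($matched, 0, $limit);
}

$products = [
    ['id' => 1, 'name' => 'Laptop', 'price' => 999, 'category' => 'Electronics'],
    ['id' => 2, 'name' => 'Mouse',  'price' => 29,  'category' => 'Electronics'],
    ['id' => 3, 'name' => 'Desk',   'price' => 299, 'category' => 'Furniture'],
];

$result = filterSortLimit($products, 100, 10);

// Prints the names of the matching rows, most expensive first.
echo implode(', ', array_column($result, 'name')), "\n"; // Laptop, Desk
```

The Mouse row is dropped by the price filter, and the two remaining rows come back sorted by descending price.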
Streaming Encoder
For large datasets, use the streaming encoder:
use Aton\StreamEncoder;
use Aton\Enums\CompressionMode;

$streamEncoder = new StreamEncoder(
    chunkSize: 100,
    compression: CompressionMode::BALANCED
);

$largeData = [
    'records' => array_map(
        fn($i) => ['id' => $i, 'name' => "Record $i", 'value' => rand()],
        range(1, 10000)
    )
];

// Process in chunks
foreach ($streamEncoder->streamEncode($largeData) as $chunk) {
    echo "Chunk {$chunk['chunkId']}/{$chunk['totalChunks']}\n";
    echo "Progress: " . ($chunk['metadata']['progress'] * 100) . "%\n";

    // Process chunk data (sendToAPI() stands in for your own handler)
    sendToAPI($chunk['data']);
}
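The chunk shape used above (`chunkId`, `totalChunks`, `data`, `metadata.progress`) can be mimicked with a small generator, which may help when testing a consumer without the library. This is a sketch under assumptions: `streamChunks` is a hypothetical function, the 1-based `chunkId` and the progress formula are guesses, and `data` holds raw rows here rather than encoded ATON text.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a streaming encoder: yields one fixed-size
 * slice of $rows at a time. Key names mirror the README example; the
 * 1-based chunkId and the progress formula are assumptions.
 */
function streamChunks(array $rows, int $chunkSize): Generator
{
    $totalChunks = (int) ceil(count($rows) / $chunkSize);

    for ($i = 0; $i < $totalChunks; $i++) {
        yield [
            'chunkId'     => $i + 1,
            'totalChunks' => $totalChunks,
            // A real encoder would yield ATON text here, not raw rows.
            'data'        => array_slice($rows, $i * $chunkSize, $chunkSize),
            'metadata'    => ['progress' => ($i + 1) / $totalChunks],
        ];
    }
}

$rows = array_map(
    fn(int $i): array => ['id' => $i, 'name' => "Record $i"],
    range(1, 250)
);

foreach (streamChunks($rows, 100) as $chunk) {
    printf(
        "Chunk %d/%d: %d rows, %.0f%% done\n",
        $chunk['chunkId'],
        $chunk['totalChunks'],
        count($chunk['data']),
        $chunk['metadata']['progress'] * 100
    );
}
```

With 250 rows and a chunk size of 100, the generator yields three chunks, the last one holding the remaining 50 rows.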
Compression Statistics
use Aton\ATON;

$stats = ATON::getCompressionStats($data);

echo "Original tokens: {$stats['originalTokens']}\n";
echo "Compressed tokens: {$stats['compressedTokens']}\n";
echo "Savings: {$stats['savingsPercent']}%\n";
echo "Compression ratio: {$stats['compressionRatio']}\n";
API Reference
ATON Facade
ATON::encode(array $data, bool $compress = true, CompressionMode $compression = CompressionMode::BALANCED): string
ATON::decode(string $atonString): array
ATON::encodeWithQuery(array $data, string $queryString): string
ATON::getCompressionStats(array $data, CompressionMode $compression = CompressionMode::BALANCED): array
ATON::createEncoder(...): Encoder
ATON::createDecoder(...): Decoder
ATON::createStreamEncoder(...): StreamEncoder
ATON::createQueryEngine(): QueryEngine
Encoder Class
$encoder = new Encoder(
    optimize: true,                          // Enable schema and defaults optimization
    compression: CompressionMode::BALANCED,  // Compression mode
    queryable: false,                        // Add queryable markers
    validate: true                           // Validate input data
);

$encoder->encode($data, $compress);          // Encode to ATON
$encoder->encodeWithQuery($data, $query);    // Encode with query filter
$encoder->estimateTokens($text);             // Estimate token count
$encoder->getCompressionStats($data);        // Get compression stats
Decoder Class
$decoder = new Decoder(validate: true);
$decoder->decode($atonString);  // Decode ATON to array
QueryEngine Class
$queryEngine = new QueryEngine();
$query = $queryEngine->parse($queryString);       // Parse query to AST
$results = $queryEngine->execute($data, $query);  // Execute query
StreamEncoder Class
$streamEncoder = new StreamEncoder(
    chunkSize: 100,
    compression: CompressionMode::BALANCED
);

foreach ($streamEncoder->streamEncode($data, $tableName) as $chunk) {
    // Process chunk
}
Exceptions
use Aton\ATON;
use Aton\Exceptions\ATONException;
use Aton\Exceptions\ATONEncodingException;
use Aton\Exceptions\ATONDecodingException;
use Aton\Exceptions\ATONQueryException;

try {
    $aton = ATON::encode($data);
} catch (ATONEncodingException $e) {
    echo "Encoding error: " . $e->getMessage();
}
ATON Format Specification
Basic Structure
@dict[#0:"repeated string", #1:"another string"]
@schema[field1:type1, field2:type2, ...]
@defaults[field1:value1, field2:value2, ...]
entityName(count):
value1, value2, ...
value1, value2, ...
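
For a simple case, the layout above can be produced by hand. The sketch below is not the library's encoder: `sketchEncode` and `atonValue` are hypothetical names, the schema is passed in explicitly, strings are not escaped, `@dict` is left out, and it assumes a field becomes a default only when every row shares the same value. Fed the Quick Start data, it prints the same text as the Quick Start output.

```php
<?php
declare(strict_types=1);

/** Renders a scalar as it appears in this README's examples (no string escaping). */
function atonValue(mixed $value): string
{
    return match (true) {
        is_string($value) => '"' . $value . '"',
        is_bool($value)   => $value ? 'true' : 'false',
        is_null($value)   => 'null',
        default           => (string) $value, // int or float
    };
}

/**
 * Hypothetical encoder sketch, not the library's implementation.
 * $schema maps field names to type names, e.g. ['id' => 'int'].
 */
function sketchEncode(string $entity, array $schema, array $rows): string
{
    // Assumption: a field becomes a default only when every row shares its value.
    $defaults = [];
    if (count($rows) > 1) {
        foreach (array_keys($schema) as $field) {
            $first = $rows[0][$field];
            $differing = array_filter($rows, fn(array $row): bool => $row[$field] !== $first);
            if ($differing === []) {
                $defaults[$field] = $first;
            }
        }
    }

    $schemaPairs = [];
    foreach ($schema as $field => $type) {
        $schemaPairs[] = $field . ':' . $type;
    }
    $lines = ['@schema[' . implode(', ', $schemaPairs) . ']'];

    if ($defaults !== []) {
        $defaultPairs = [];
        foreach ($defaults as $field => $value) {
            $defaultPairs[] = $field . ':' . atonValue($value);
        }
        $lines[] = '@defaults[' . implode(', ', $defaultPairs) . ']';
    }

    $lines[] = '';
    $lines[] = $entity . '(' . count($rows) . '):';
    foreach ($rows as $row) {
        // Columns covered by @defaults are left out of each row.
        $lines[] = implode(', ', array_map('atonValue', array_diff_key($row, $defaults)));
    }

    return implode("\n", $lines);
}

$employees = [
    ['id' => 1, 'name' => 'Alice', 'role' => 'Engineer', 'active' => true],
    ['id' => 2, 'name' => 'Bob',   'role' => 'Designer', 'active' => true],
    ['id' => 3, 'name' => 'Carol', 'role' => 'Manager',  'active' => true],
];
$schema = ['id' => 'int', 'name' => 'str', 'role' => 'str', 'active' => 'bool'];

$aton = sketchEncode('employees', $schema, $employees);
echo $aton, "\n";
```

The token savings come from stating field names once in `@schema` and dropping constant columns into `@defaults`, so each row carries only the values that vary.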
Supported Types
| Type | Description | Example |
|---|---|---|
| int | Integer | 42 |
| float | Floating point | 3.14 |
| str | String | "hello" |
| bool | Boolean | true, false |
| null | Null value | null |
| array | Array | [1,2,3] |
| object | Object | {key:value} |
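
As an illustration of how PHP values might line up with these type names, the following sketch classifies a value with a hypothetical `atonTypeOf` helper. The split between `array` and `object` is an assumption (list-style arrays versus associative arrays and objects); the library may draw that line differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: maps a PHP value to one of the ATON type names.
 * Assumption: list-style arrays map to "array", while associative arrays
 * and objects map to "object".
 */
function atonTypeOf(mixed $value): string
{
    if (is_array($value)) {
        // PHP 8.0-compatible list check (array_is_list() needs PHP 8.1).
        $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
        return $isList ? 'array' : 'object';
    }

    return match (true) {
        is_int($value)    => 'int',
        is_float($value)  => 'float',
        is_string($value) => 'str',
        is_bool($value)   => 'bool',
        is_null($value)   => 'null',
        is_object($value) => 'object',
        default           => throw new InvalidArgumentException(
            'Unsupported type: ' . get_debug_type($value)
        ),
    };
}

// Classify one sample value per row of the type table.
foreach ([42, 3.14, 'hello', true, null, [1, 2, 3], ['key' => 'value']] as $sample) {
    echo json_encode($sample), ' => ', atonTypeOf($sample), "\n";
}
```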
Performance
| Dataset | JSON Tokens | ATON Tokens | Reduction |
|---|---|---|---|
| Employee Records (1K) | 12,450 | 5,280 | 57.6% |
| Product Catalog (10K) | 145,200 | 64,800 | 55.4% |
| Transaction Log (100K) | 1,856,000 | 815,000 | 56.1% |
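
The Reduction column follows directly from the two token counts. A small sketch that recomputes it, assuming reduction means tokens saved relative to the JSON count (`reductionPercent` is a hypothetical helper, not part of the package):

```php
<?php
declare(strict_types=1);

/** Percentage of tokens saved relative to the JSON baseline, rounded to one decimal. */
function reductionPercent(int $jsonTokens, int $atonTokens): float
{
    return round(($jsonTokens - $atonTokens) / $jsonTokens * 100, 1);
}

// Token counts copied from the table above: [JSON tokens, ATON tokens].
$benchmarks = [
    'Employee Records (1K)'  => [12450, 5280],
    'Product Catalog (10K)'  => [145200, 64800],
    'Transaction Log (100K)' => [1856000, 815000],
];

foreach ($benchmarks as $name => [$json, $aton]) {
    printf("%s: %.1f%%\n", $name, reductionPercent($json, $aton));
}
```

For the first row, (12,450 - 5,280) / 12,450 is about 0.576, which matches the 57.6% shown.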
Requirements
- PHP 8.0 or higher
License
MIT License - see LICENSE for details.
Author
Stefano D'Agostino
- GitHub: @dagoSte
- Email: dago.stefano@gmail.com