arnapou / json-parser
Library - JSON stream parser and writer, modern, easy to use, without dependencies.
Requires
- php: ~8.3.0 || ~8.4.0
- arnapou/stream: ^1.3
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.52
- phpstan/extension-installer: ^1.3
- phpstan/phpstan: ^2.0
- phpstan/phpstan-deprecation-rules: ^2.0
- phpstan/phpstan-phpunit: ^2.0
- phpstan/phpstan-strict-rules: ^2.0
- phpunit/php-code-coverage: ^11.0
- phpunit/phpunit: ^11.0
README
This library allow you to READ or WRITE json as a stream.
This was built upon the RFC-8259, and it has no dependencies.
Installation
composer require arnapou/json-parser
packagist ๐๏ธ arnapou/json-parser
Examples
Look at the ๐ example folder for more.
Indent a file on the fly.
$reader = new Arnapou\Json\JsonReader(
input: new Arnapou\Stream\Input\FileInput($input_filename),
visitor: new Arnapou\Json\Visitor\WhitespacesVisitor(
pretty: Arnapou\Json\Core\Pretty::Indented,
output: new Arnapou\Stream\Output\FileOutput($output_filename),
)
);
$reader->read();
Iterate over the 2nd level of a json.
$reader = new Arnapou\Json\Iterator\JsonLeafIterator(
new Arnapou\Stream\Input\FileInput($input_filename),
maxDepth: 2
);
foreach ($reader as $node) {
// This is an Arnapou\Json\JsonNode\ValueNode with properties :
// - parents
// - depth
// - key
// - value (json decode of the leaf)
}
When it is worth to use this library
- need of very small memory footprint
- a stream of documents (several in the same "body")
- one input stream for several output
- or more dynamic architecture of multiple inputs, visitors, outputs
Performance
โ ๏ธ This is important to remind that this library is slow compared to native json_encode and json_decode !
๐๏ธ The main goal is to stream in order to have a very small memory footprint.
Metrics done with an Intelยฎ Coreโข i7-10510U CPU @ 1.80GHz ร 8
:
Read | Json size | Time | Memory | Byte rate | JIT | JIT boost |
---|---|---|---|---|---|---|
JsonReader | 100 MB | 12.66 sec | 4 MB | 7.9 MB/s | โ | |
JsonReader | 100 MB | 10.64 sec | 4 MB | 9.4 MB/s | โ | +19% |
JsonLeafIterator | 100 MB | 15.15 sec | 4 MB | 6.6 MB/s | โ | |
JsonLeafIterator | 100 MB | 10.42 sec | 4 MB | 9.6 MB/s | โ | +45% |
json_decode() | 100 MB | 0.75 sec | 310 MB | 133.7 MB/s | โ | |
json_decode() | 100 MB | 0.75 sec | 310 MB | 133.9 MB/s | โ | +0% |
Write | Json size | Time | Memory | Byte rate | JIT | JIT boost |
---|---|---|---|---|---|---|
JsonWriter | 100 MB | 1.23 sec | 4 MB | 81 MB/s | โ | |
JsonWriter | 100 MB | 0.80 sec | 4 MB | 124 MB/s | โ | +50% |
json_encode() | 100 MB | 0.47 sec | 420 MB | 210 MB/s | โ | |
json_encode() | 100 MB | 0.45 sec | 420 MB | 220 MB/s | โ | +5% |
Example to test with JIT :
php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=256m example/bandwidth_reader.php
php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=256m example/bandwidth_writer.php
This globally a tradeoff between CPU and memory.
I/O considerations : relative to network speed on an internet web server, the byte rate of JsonReader may be not so bad (I worked in a company where the managed internet gateway of our web SaaS infrastructure was in average 10 MB/s).
OOP
This lib use some patterns : visitor, decorator, adapter, iterator.
The code is highly decoupled and simple by design. But you may need to fully understand these patterns to make fun things with all the stuff here.
Main interfaces
Input (from arnapou/stream
)
The stream you "read".
namespace Arnapou\Stream\Input;
interface Input
{
public function open(): void;
public function read(): string;
public function close(): void;
}
Output (from arnapou/stream
)
The stream you "write".
namespace Arnapou\Stream\Output;
interface Output
{
public function write(string $data): void;
}
Visitor
The object to inject into the reader to watch the stream.
namespace Arnapou\Json\Core;
use Arnapou\Json\JsonNode\Key\{ArrayKeyNode, ObjectKeyNode};
use Arnapou\Json\JsonNode\Nested\{ArrayNode, ObjectNode};
use Arnapou\Json\JsonNode\Scalar\{LiteralNode, NumberNode, StringNode};
use Arnapou\Json\JsonNode\Structure\{StructureCharacterNode, WhitespaceNode};
interface Visitor
{
public function beginNode(ObjectNode|ArrayNode $node): void;
public function endNode(ObjectNode|ArrayNode $node): void;
public function enterStructure(WhitespaceNode|StructureCharacterNode $node): void;
public function enterKey(ObjectKeyNode|ArrayKeyNode $node): void;
public function enterValue(NumberNode|StringNode|LiteralNode $node): void;
}
Don't forget to be from the point of view of json-parsing.
Main concrete classes
JsonReader
Parse the Input
stream and calls Visitor
methods.
$input = new Arnapou\Stream\Input\StringInput('{"id": 42, "text": "Hello World"}');
$visitor = new FullDecodeVisitor();
$reader = new JsonReader($input, $visitor);
$reader->read();
print_r($visitor->getDecoded());
JsonWriter
Write data to an Output
(obviously, for a stream, make use of generators).
$output = new EchoOutput();
$writer = new JsonWriter($output);
$writer->writeValue(
[
'id' => 42,
'text' => 'Hello World',
]
);
JsonStreamUtils
Simple static functions for very simple use cases.
Arnapou\Json\JsonStreamUtils::pretty(
new Arnapou\Stream\Input\FileInput($input_filename),
new Arnapou\Stream\Output\FileOutput($output_filename)
);
Iterators
They use php Fibers to adapt a visitor to an iterator pattern.
This cause a small lack of performance (without JIT) to the price of ease.
JsonLeafIterator
Utility to iterate the "leaves" nodes of the Input
stream with a simple foreach
.
To iterate over the leaves, you have to give a "max depth". The nodes deeper are decoded as array values.
This use an abstract LeafVisitor class which forces to implement these methods :
abstract class LeafVisitor implements Visitor
{
public function enterLeaf(ValueNode $node): void;
}
JsonDecodeIterator
Utility to iterate filtered nodes of the Input
stream with a simple foreach
.
To iterate over the leaves, you have to give a ShouldDecodeCallback. This select nodes which should be decoded regardless of the depth.
This use an abstract DecodeVisitor class which forces to implement these methods :
abstract class DecodeVisitor implements Visitor
{
protected function shouldDecode(ObjectNode|ArrayNode|LiteralNode|NumberNode|StringNode $node): bool;
protected function isDecoded(ValueNode $node): void;
}
Nodes
Bellow the inheritance tree, ๐ถ is interface, ๐ฆ is concrete :
- ๐ถ JsonNode
- ๐ฆ ValueNode
- ๐ถ KeyNode
- ๐ฆ ArrayKeyNode
- ๐ฆ ObjectKeyNode
- ๐ถ NestedNode
- ๐ฆ ArrayNode
- ๐ฆ ObjectNode
- ๐ถ ScalarNode
- ๐ฆ NumberNode
- ๐ฆ StringNode
- ๐ฆ LiteralNode
- ๐ถ StructureNode
- ๐ฆ WhitespaceNode
- ๐ฆ StructureCharacterNode
All implementations of JsonNode are used inside the Visitor except ValueNode which is used by the ValueNodeIterator
Each "node" carry its context :
$node->parents
: array of parent keys$node->depth
: level of depth of the node$node->key
: the current key$node->fullPath()
: return a string representation of the full path (ex:items.3.name
)
Limitations
Your mind.
You can make silly things mixing Input, Output, Visitor.
Example :
- an
Input
which- send the stream to a
JsonReader
- write the raw stream in parallel to an
Output 1
- send the stream to a
- the
JsonReader
has aMultipleVisitor
which contains- a
BandwidthVisitor
to gather metrics about the stream - a
WhitespacesVisitor
to pretty print into anOutput 2
- a
LeafVisitor
implementation to extract specific nodes
- a
If you ask how I got a few metrics about the speed of my parser,
look at the Bandwidth
interface, the RepeatInput
, etc ... ๐
Php versions
Date | Ref | 8.4 | 8.3 | 8.2 |
---|---|---|---|---|
25/11/2024 | 2.4.x, main | ร | ร | |
25/11/2023 | 2.0 - 2.3 | ร | ||
07/03/2023 | 1.x | ร |