arnapou/json-parser

Library - JSON stream parser and writer, modern, easy to use, without dependencies.

v2.3 2024-04-14 15:42 UTC


Last update: 2024-04-14 13:43:54 UTC


README


This library allows you to READ or WRITE JSON as a stream.

It was built on RFC 8259 and has no dependencies.

Installation

composer require arnapou/json-parser

packagist 👉️ arnapou/json-parser

Examples

Look at the 📁 example folder for more.

Indent a file on the fly.

$reader = new Arnapou\Json\JsonReader(
    input: new Arnapou\Stream\Input\FileInput($input_filename),
    visitor: new Arnapou\Json\Visitor\WhitespacesVisitor(
        pretty: Arnapou\Json\Core\Pretty::Indented,
        output: new Arnapou\Stream\Output\FileOutput($output_filename),
    )
);
$reader->read();

Iterate over the second level of a JSON document.

$reader = new Arnapou\Json\Iterator\JsonLeafIterator(
    new Arnapou\Stream\Input\FileInput($input_filename),
    maxDepth: 2
);

foreach ($reader as $node) {
    // This is an Arnapou\Json\JsonNode\ValueNode with properties :
    // - parents
    // - depth
    // - key
    // - value (json decode of the leaf)
}

When it is worth using this library

  • you need a very small memory footprint
  • you handle a stream of documents (several in the same "body")
  • you have one input stream for several outputs
  • or you want a more dynamic architecture of multiple inputs, visitors and outputs

Performance

⚠️ Keep in mind that this library is slow compared to the native json_encode and json_decode!

👉️ The main goal is to stream in order to have a very small memory footprint.

Metrics measured on an Intel® Core™ i7-10510U CPU @ 1.80GHz × 8:

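| Read             | Json size | Time      | Memory | Byte rate  | JIT | JIT boost |
|------------------|-----------|-----------|--------|------------|-----|-----------|
| JsonReader       | 100 MB    | 12.66 sec | 4 MB   | 7.9 MB/s   | no  |           |
| JsonReader       | 100 MB    | 10.64 sec | 4 MB   | 9.4 MB/s   | yes | +19%      |
| JsonLeafIterator | 100 MB    | 15.15 sec | 4 MB   | 6.6 MB/s   | no  |           |
| JsonLeafIterator | 100 MB    | 10.42 sec | 4 MB   | 9.6 MB/s   | yes | +45%      |
| json_decode()    | 100 MB    | 0.75 sec  | 310 MB | 133.7 MB/s | no  |           |
| json_decode()    | 100 MB    | 0.75 sec  | 310 MB | 133.9 MB/s | yes | +0%       |

| Write         | Json size | Time     | Memory | Byte rate | JIT | JIT boost |
|---------------|-----------|----------|--------|-----------|-----|-----------|
| JsonWriter    | 100 MB    | 1.23 sec | 4 MB   | 81 MB/s   | no  |           |
| JsonWriter    | 100 MB    | 0.80 sec | 4 MB   | 124 MB/s  | yes | +50%      |
| json_encode() | 100 MB    | 0.47 sec | 420 MB | 210 MB/s  | no  |           |
| json_encode() | 100 MB    | 0.45 sec | 420 MB | 220 MB/s  | yes | +5%       |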

Example commands to test with JIT:

php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=256m example/bandwidth_reader.php
php -d opcache.enable_cli=1 -d opcache.jit_buffer_size=256m example/bandwidth_writer.php

Overall, this is a tradeoff between CPU and memory.

I/O considerations: relative to the network speed of an internet-facing web server, the byte rate of JsonReader may not be so bad (I worked in a company where the managed internet gateway of our web SaaS infrastructure averaged 10 MB/s).

OOP

This library uses several patterns: visitor, decorator, adapter, iterator.

The code is highly decoupled and simple by design, but you may need to fully understand these patterns to do fun things with everything provided here.

Main interfaces

Input (from arnapou/stream)

The stream you "read".

namespace Arnapou\Stream\Input;

interface Input
{
    public function open(): void;
    public function read(): string;
    public function close(): void;
}
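
To illustrate how little an Input has to do, here is a minimal sketch of a custom implementation that serves a string in fixed-size chunks. The interface is re-declared locally (copied from above) only so that the snippet runs on its own; in a real project you would implement the one from arnapou/stream. The class name ChunkedStringInput, the $chunkSize parameter and the convention that read() returns an empty string once the data is exhausted are assumptions made for this example, not confirmed behavior of the library.

```php
<?php

declare(strict_types=1);

// Local copy of the interface shown above, so the sketch runs without Composer.
interface Input
{
    public function open(): void;
    public function read(): string;
    public function close(): void;
}

// Hypothetical implementation: serves a string in chunks of $chunkSize bytes.
final class ChunkedStringInput implements Input
{
    private int $offset = 0;

    public function __construct(
        private readonly string $data,
        private readonly int $chunkSize = 8,
    ) {
    }

    public function open(): void
    {
        $this->offset = 0; // rewind so the input can be read again
    }

    public function read(): string
    {
        // Assumption: an empty string signals the end of the stream.
        $chunk = substr($this->data, $this->offset, $this->chunkSize);
        $this->offset += strlen($chunk);

        return $chunk;
    }

    public function close(): void
    {
        // Nothing to release for an in-memory string.
    }
}

$json = '{"id": 42, "text": "Hello World"}';
$input = new ChunkedStringInput($json, 10);

$input->open();
$chunks = [];
while ('' !== ($chunk = $input->read())) {
    $chunks[] = $chunk;
}
$input->close();

echo count($chunks), " chunks read\n";
```

A consumer only ever sees one chunk at a time, which is what keeps the memory footprint small whatever the size of the source.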

Output (from arnapou/stream)

The stream you "write".

namespace Arnapou\Stream\Output;

interface Output
{
    public function write(string $data): void;
}
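
Because an Output is a single method, decorating it is cheap. The sketch below shows one possible way to send the same data to several outputs. The interface is re-declared locally (copied from above) so the snippet runs on its own, and the BufferOutput and TeeOutput classes are hypothetical names invented for this example; they are not claimed to exist in arnapou/stream.

```php
<?php

declare(strict_types=1);

// Local copy of the interface shown above, so the sketch runs without Composer.
interface Output
{
    public function write(string $data): void;
}

// Hypothetical: keeps everything that was written in memory.
final class BufferOutput implements Output
{
    private string $buffer = '';

    public function write(string $data): void
    {
        $this->buffer .= $data;
    }

    public function contents(): string
    {
        return $this->buffer;
    }
}

// Hypothetical decorator: forwards each write to several outputs.
final class TeeOutput implements Output
{
    /** @var Output[] */
    private array $outputs;

    public function __construct(Output ...$outputs)
    {
        $this->outputs = $outputs;
    }

    public function write(string $data): void
    {
        foreach ($this->outputs as $output) {
            $output->write($data);
        }
    }
}

$first = new BufferOutput();
$second = new BufferOutput();
$tee = new TeeOutput($first, $second);

$tee->write('{"id":');
$tee->write('42}');

echo $first->contents(), "\n";
echo $second->contents(), "\n";
```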

Visitor

The object to inject into the reader to watch the stream.

namespace Arnapou\Json\Core;

use Arnapou\Json\JsonNode\Key\{ArrayKeyNode, ObjectKeyNode};
use Arnapou\Json\JsonNode\Nested\{ArrayNode, ObjectNode};
use Arnapou\Json\JsonNode\Scalar\{LiteralNode, NumberNode, StringNode};
use Arnapou\Json\JsonNode\Structure\{StructureCharacterNode, WhitespaceNode};

interface Visitor
{
    public function beginNode(ObjectNode|ArrayNode $node): void;
    public function endNode(ObjectNode|ArrayNode $node): void;
    public function enterStructure(WhitespaceNode|StructureCharacterNode $node): void;
    public function enterKey(ObjectKeyNode|ArrayKeyNode $node): void;
    public function enterValue(NumberNode|StringNode|LiteralNode $node): void;
}

Keep in mind that these methods are named from the point of view of JSON parsing.

Main concrete classes

JsonReader

Parses the Input stream and calls the Visitor methods.

$input = new Arnapou\Stream\Input\StringInput('{"id": 42, "text": "Hello World"}');
$visitor = new FullDecodeVisitor();

$reader = new JsonReader($input, $visitor);
$reader->read();

print_r($visitor->getDecoded());

JsonWriter

Writes data to an Output (for a stream, make use of generators).

$output = new EchoOutput();

$writer = new JsonWriter($output);
$writer->writeValue(
    [
        'id' => 42,
        'text' => 'Hello World',
    ]
);

JsonStreamUtils

Simple static functions for very simple use cases.

Arnapou\Json\JsonStreamUtils::pretty(
    new Arnapou\Stream\Input\FileInput($input_filename),
    new Arnapou\Stream\Output\FileOutput($output_filename)
);

Iterators

They use PHP Fibers to adapt the visitor pattern to an iterator.

This costs a little performance (without JIT) in exchange for ease of use.
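
The general technique can be sketched with plain PHP (8.1+), independently of this library. In the example below, pushValues() stands in for a parser that pushes values into a callback, and pullValues() wraps it in a Fiber so that the caller can pull values with foreach. Both function names are hypothetical, and this is only an illustration of the idea, not the library's actual implementation.

```php
<?php

declare(strict_types=1);

/**
 * Push-style producer: calls $onValue once per item.
 * It stands in for a parser calling a visitor.
 */
function pushValues(array $values, callable $onValue): void
{
    foreach ($values as $value) {
        $onValue($value);
    }
}

/**
 * Pull-style adapter: runs the producer inside a Fiber and yields
 * each pushed value to the caller.
 */
function pullValues(array $values): Generator
{
    $fiber = new Fiber(function () use ($values): void {
        pushValues($values, static function (mixed $value): void {
            // Hand the value over to the consumer and pause the producer.
            Fiber::suspend($value);
        });
    });

    // start() runs the producer until its first suspend (or until it ends).
    $value = $fiber->start();
    while (!$fiber->isTerminated()) {
        yield $value;
        // resume() continues the producer until the next suspend.
        $value = $fiber->resume();
    }
}

foreach (pullValues([1, 'two', 3.0]) as $item) {
    echo gettype($item), "\n";
}
```

The producer never needs to know it is being iterated: it keeps calling its callback, and each suspend/resume switch is the extra cost paid for the simpler foreach on the consumer side.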

JsonLeafIterator

Utility to iterate over the "leaf" nodes of the Input stream with a simple foreach.

To iterate over the leaves, you have to give a "max depth". Nodes deeper than that are decoded as array values.

This uses an abstract LeafVisitor class, which requires you to implement this method:

abstract class LeafVisitor implements Visitor
{
    // Called for each leaf found at (or above) the max depth.
    abstract public function enterLeaf(ValueNode $node): void;
}

JsonDecodeIterator

Utility to iterate over filtered nodes of the Input stream with a simple foreach.

To iterate over the nodes, you have to give a ShouldDecodeCallback. It selects the nodes which should be decoded, regardless of their depth.

This uses an abstract DecodeVisitor class, which requires you to implement these methods:

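abstract class DecodeVisitor implements Visitor
{
    // Decides whether the given node should be decoded.
    abstract protected function shouldDecode(ObjectNode|ArrayNode|LiteralNode|NumberNode|StringNode $node): bool;

    // Called with each node once it has been decoded.
    abstract protected function isDecoded(ValueNode $node): void;
}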

Nodes

Below is the inheritance tree (🔶 is an interface, 🔷 is a concrete class):

All implementations of JsonNode are used inside the Visitor, except ValueNode, which is used by the ValueNodeIterator.

Each node carries its context:

  • $node->parents : array of parent keys
  • $node->depth : level of depth of the node
  • $node->key : the current key
  • $node->fullPath() : returns a string representation of the full path (ex: items.3.name)
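
To make the relation between these properties concrete, here is a minimal sketch of how a path such as items.3.name could be derived from the parent keys and the current key. The standalone fullPath() function below is a hypothetical helper written for this example, not the library's code; it assumes that the parents are ordered from the root down and that the segments are joined with a dot.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the library's implementation).
 *
 * @param array<int|string> $parents keys of the parent nodes, root first (assumption)
 * @param int|string|null   $key     key of the current node, null at the root (assumption)
 */
function fullPath(array $parents, int|string|null $key): string
{
    $segments = $parents;
    if (null !== $key) {
        $segments[] = $key;
    }

    // Integer keys (array indexes) are converted to strings by implode().
    return implode('.', $segments);
}

echo fullPath(['items', 3], 'name'), "\n";
```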

Limitations

Your mind.

You can do silly things by mixing Input, Output and Visitor.

Example :

  • an Input which
    • sends the stream to a JsonReader
    • writes the raw stream in parallel to an Output 1
  • the JsonReader has a MultipleVisitor which contains
    • a BandwidthVisitor to gather metrics about the stream
    • a WhitespacesVisitor to pretty print into an Output 2
    • a LeafVisitor implementation to extract specific nodes

If you wonder how I gathered metrics about the speed of my parser, look at the Bandwidth interface, the RepeatInput, etc. 🙂

Changelog versions

| Start      | Tag, Branch | Php |
|------------|-------------|-----|
| 25/11/2023 | 2.x, main   | 8.3 |
| 07/03/2023 | 1.x         | 8.2 |