klkvsk / json-decode-stream
JSON streaming reader
Requires
- php: >=7.1
- ext-json: *
Requires (Dev)
- nyholm/psr7: ^1.3
- phpunit/phpunit: *
- psr/http-message: ^1.0
- vimeo/psalm: *
README
This is a JSON parsing library that parses a stream of JSON data. You can process JSON records on the fly without first decoding the complete structure into memory. This is especially useful when parsing large JSON files.
Installation
composer require klkvsk/json-decode-stream
Basic usage
In most cases, a streaming parser is used to parse lists of repeated objects. For example, here is a list of users:
{
    "users": [
        { "name": "Alice", "age": 20 },
        { "name": "Bob", "age": 30 }
    ]
}
To iterate over each user and print their name, use:
$parser = \JsonDecodeStream\Parser::fromFile("users.json");
foreach ($parser->items("users[]") as $user) {
    echo $user->name;
}
// or
foreach ($parser->items("users[].name") as $name) {
    echo $name;
}
Documentation
json-decode-stream uses layered generators to process data. There are 3 layers, and the processing goes as follows:
Tokenizer -(tokens)-> Parser -(events)-> Collector -(items)-> your code
- Tokens are parts of encoded JSON data: braces, commas, strings, numbers.
- Events are emitted based on the sequence of incoming tokens: object started/ended, key specified, value, etc.
- Items are final parts of decoded JSON: scalar values, arrays and objects, that are matched by selectors.
The Parser class provides access to each layer directly, but for most use cases only $parser->items($selector) is needed.
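The layering can be pictured with a toy pipeline of chained generators. This is only an illustrative sketch, not the library's actual tokenizer or parser: the function names toyTokens(), toyEvents() and toyItems() are hypothetical, and the input is limited to a flat array of non-negative integers.

```php
<?php
// Toy illustration of three chained generator layers.
// NOT the library's implementation; all names here are hypothetical.

/** Layer 1: turn a flat JSON array of non-negative integers into tokens. */
function toyTokens(string $json): Generator
{
    $number = '';
    foreach (str_split($json) as $char) {
        if (ctype_digit($char)) {
            $number .= $char; // accumulate multi-digit numbers
            continue;
        }
        if ($number !== '') {
            yield $number;    // a number ends at the first non-digit
            $number = '';
        }
        if ($char === '[' || $char === ']' || $char === ',') {
            yield $char;
        }
        // any other character (e.g. whitespace) is ignored in this toy
    }
    if ($number !== '') {
        yield $number;
    }
}

/** Layer 2: turn tokens into [type, value] events. */
function toyEvents(iterable $tokens): Generator
{
    foreach ($tokens as $token) {
        if ($token === '[') {
            yield ['array_start', null];
        } elseif ($token === ']') {
            yield ['array_end', null];
        } elseif ($token !== ',') {
            yield ['value', (int) $token];
        }
    }
}

/** Layer 3: collect value events into path => value items. */
function toyItems(iterable $events): Generator
{
    $index = 0;
    foreach ($events as [$type, $value]) {
        if ($type === 'value') {
            yield '[' . $index . ']' => $value;
            $index++;
        }
    }
}

// Each layer pulls from the previous one only when the consumer asks for more.
foreach (toyItems(toyEvents(toyTokens('[10, 20, 30]'))) as $path => $value) {
    echo $path, ' => ', $value, PHP_EOL;
}
```

Because every layer is a generator, no layer holds the whole document: a token is produced only when the event layer requests it, which is the property that keeps memory use flat on large inputs.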
Selectors
A selector is a string that specifies the full path to the collected JSON fields. Simple selectors are:
- [] - any element of an array or every key of an object
- [5] - element of an array with index 5
- [10:15] - any element of an array with an index in the range from 10 to 15
- [5:] - any element of an array starting from index 5
- [:5] - any element of an array before and including index 5
- foo - value of an object by key 'foo'
- ["some long key with spaces or \"quotes\" in it"] - also a value by key
Selectors can be nested:
- users[] would select each user
- result.total would select key "total" in "result" object
- result[] would select value of "total" and any other fields of "result" object
- users[:2].name would select names of first 3 users
- [].id would select every ID in the top-level array of objects with "id" field
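The index forms above can be summarised in a small matcher. This is a hedged sketch of the documented semantics, not the library's code: toyIndexMatches() is a hypothetical helper, and it assumes both bounds of a range are inclusive, as suggested by "[:5]" including index 5 and "users[:2]" selecting three users.

```php
<?php
// Hypothetical helper illustrating the index selector semantics.
// Assumption: range bounds are inclusive on both ends.

/**
 * @param string $spec  Text between the brackets: '', '5', '10:15', '5:' or ':5'
 * @param int    $index Zero-based array index to test
 */
function toyIndexMatches(string $spec, int $index): bool
{
    if ($spec === '') {
        return true; // [] matches any element
    }
    if (strpos($spec, ':') === false) {
        return (int) $spec === $index; // [5] matches exactly one index
    }
    [$from, $to] = explode(':', $spec, 2);
    $from = $from === '' ? 0 : (int) $from;         // [:5] starts at the first element
    $to = $to === '' ? PHP_INT_MAX : (int) $to;     // [5:] has no upper limit
    return $index >= $from && $index <= $to;
}

var_dump(toyIndexMatches(':2', 2));    // index 2 is the third of the "first 3"
var_dump(toyIndexMatches('10:15', 16));
```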
Reference
Parser
Constructors:
- Parser::fromString($jsonString)
- Parser::fromFile($filePath)
- Parser::fromStream($resource)
  Where $resource is any resource that supports fread()
- Parser::fromPsr7($stream)
  Where $stream is any PSR-7 compliant StreamInterface, e.g. $psr7Request->getBody()
- new Parser(new SourceBuffer(new SourceInterfaceImplementation()))
  Use in other custom cases
Methods:
- $parser->tokens(): Generator<Token>
  Iterates over the encoded JSON document, returning Token objects.
- $parser->events(): Generator<Event>
  Iterates over tokens(), returning Event objects.
- $parser->items($selectors): Generator<scalar|object|array>
  Iterates over events(), returning decoded JSON fields matched by selectors.
  Where $selectors is either:
  - a single selector string: "result.users[]"
  - comma-separated selector strings: "result.total, result.users[]"
  - an array of selector strings: [ "result.total", "result.users[]" ]
  - a custom CollectorInterface or an array of them
  - null to collect whole objects/arrays in JSON sequences (separated with a comma and/or newline in the source)
  String selectors are converted to Collector classes internally.
  For the default Collector, iterated values have their full path in the key, like "result.users[4]" => ["num" => "Five", ..]
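As a usage sketch of the selector forms accepted by items(), the following passes an array of selectors and prints each yielded key alongside its value. It assumes the package was installed with Composer so that vendor/autoload.php exists next to the script; the sample JSON is made up for illustration, and no particular output is claimed here.

```php
<?php
// Assumption: the package is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use JsonDecodeStream\Parser;

// Made-up sample document for illustration.
$json = '{"result": {"total": 2, "users": [{"name": "Alice"}, {"name": "Bob"}]}}';

$parser = Parser::fromString($json);

// Array form of $selectors: collect the total and every user name.
foreach ($parser->items(['result.total', 'result.users[].name']) as $path => $value) {
    // With the default Collector, the key is documented to be the item's full path.
    echo $path, ' => ', json_encode($value), PHP_EOL;
}
```

The same loop should work with the comma-separated form "result.total, result.users[].name", since string selectors are converted to Collector classes internally.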
Event
Methods:
- $event->getId(): string
  Enumeration:
  Event::DOCUMENT_START
  Event::DOCUMENT_END
  Event::OBJECT_START
  Event::OBJECT_END
  Event::KEY
  Event::VALUE
- $event->getValue(): string|number|bool|null
  - For Event::VALUE a corresponding value is returned.
  - For Event::KEY a string (field name of an object) is returned.
  - For other events null is returned.
- $event->getPath(): string
  Full path to the currently parsed element.
- $event->getDepth(): int
  Number of nesting levels at the current position in the JSON structure. Elements of the top-level array/object have depth 1.
- $event->matchPath(string $selector): bool
  Checks if the currently parsed element's path is contained within the selector.
- $event->getLineNumber(): int and $event->getCharNumber(): int
  Return the currently parsed position inside the decoded source.
Token
- $token->getId(): string
  Enumeration:
  Token::OBJECT_START
  Token::OBJECT_END
  Token::ARRAY_START
  Token::ARRAY_END
  Token::KEY_DELIMITER
  Token::COMA
  Token::TRUE
  Token::FALSE
  Token::NULL
  Token::STRING
  Token::NUMBER
  Token::WHITESPACE
- $token->getValue(): string|number|bool|null
  Returns corresponding value only for STRING, NUMBER or WHITESPACE tokens.
- $token->getLineNumber(): int and $token->getCharNumber(): int
  Return the currently parsed position inside the decoded source.
Custom Collectors
CollectorInterface defines only one method: processEvent(Event $event)
Return an array of [ key, value ] to be yielded from items() when you need to emit an item.
Yield multiple [ key, value ] pairs when you need to emit multiple items.
Otherwise, return null if you have nothing yet to yield.
Here is an example of custom Collector:
class AggregationCollector implements CollectorInterface
{
    protected int $count;
    protected float $sum;

    public function processEvent(Event $event)
    {
        switch ($event->getId()) {
            case Event::DOCUMENT_START:
                // reset the accumulators at the start of each document
                $this->count = 0;
                $this->sum = 0.0;
                break;
            case Event::VALUE:
                // only values under games[].score contribute
                if ($event->matchPath("games[].score")) {
                    $this->sum += $event->getValue();
                    $this->count++;
                }
                break;
            case Event::DOCUMENT_END:
                // emit the aggregates once the whole document has been read
                yield [ 'count', $this->count ];
                yield [ 'sum', $this->sum ];
                yield [ 'avg', $this->count ? ($this->sum / $this->count) : 0 ];
                break;
        }
    }
}

$aggregates = iterator_to_array($parser->items(new AggregationCollector()));
var_dump($aggregates); // [ 'count' => 10, 'sum' => 50, 'avg' => 5 ]
Dependencies
There are no external dependencies except ext-json, which normally comes with every PHP distribution.
The default json_decode is used to parse single JSON strings when the Parser finds them.
This is faster and more error-proof than writing a custom JSON string parser/validator.
Testing
This library is heavily covered with unit tests and CI-tested under all versions of PHP since 7.1.
To run the tests, install via composer with --dev and run
$ vendor/bin/phpunit
License
This code is distributed under the MIT license.