maxakawizard / json-collection-parser
Streaming parser for large JSON files containing an array of objects
Requires
- php: >=7.1
- psr/http-message: ~1.0|~2.0
- salsify/json-streaming-parser: ^8.0.2
Requires (Dev)
- mockery/mockery: ^1.3
- phpunit/phpunit: >6 <10
- squizlabs/php_codesniffer: ~3.0
Suggests
- ext-zlib: Needed to support GZIP-compressed files
README
Event-based parser for large JSON collections (consumes a small amount of memory). Built on top of JSON Streaming Parser.
This package complies with the PSR-4 autoloading standard and the PSR-12 coding style, and it supports parsing of PSR-7 message interfaces. If you notice compliance oversights, please send a patch via pull request.
Installation
You will need Composer to install the package:
composer require maxakawizard/json-collection-parser:~1.0
Input data format
Data must be in one of the following formats:
Array of objects (valid JSON):
[
    {
        "id": 78,
        "title": "Title",
        "dealType": "sale",
        "propertyType": "townhouse",
        "properties": {
            "bedroomsCount": 6,
            "parking": "yes"
        },
        "photos": [
            "1.jpg",
            "2.jpg"
        ],
        "agents": [
            {
                "name": "Joe",
                "email": "joe@realestate.email"
            },
            {
                "name": "Sally",
                "email": "sally@realestate.email"
            }
        ]
    },
    {
        "id": 729,
        "dealType": "rent_long",
        "propertyType": "villa"
    },
    {
        "id": 5165,
        "dealType": "rent_short",
        "propertyType": "villa"
    }
]
Sequence of object literals:
{
"id": 78,
"dealType": "sale",
"propertyType": "townhouse"
}
{
"id": 729,
"dealType": "rent_long",
"propertyType": "villa"
}
{
"id": 5165,
"dealType": "rent_short",
"propertyType": "villa"
}
Sequence of object and array literals:
[[{
"id": 78,
"dealType": "sale",
"propertyType": "townhouse"
}]]
{
"id": 729,
"dealType": "rent_long",
"propertyType": "villa"
}
[{
"id": 5165,
"dealType": "rent_short",
"propertyType": "villa"
}]
Sequence of object and array literals (some objects in subarrays, comma-separated):
[
{
"id": 78,
"dealType": "sale",
"propertyType": "townhouse"
},
{
"id": 729,
"dealType": "rent_long",
"propertyType": "villa"
}
]
{
"id": 5165,
"dealType": "rent_short",
"propertyType": "villa"
}
Usage
Function as callback:
function processItem(array $item)
{
    is_array($item); // true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', 'processItem');
Closure as callback:
$items = [];

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', function (array $item) use (&$items) {
    $items[] = $item;
});
Static method as callback:
class ItemProcessor
{
    public static function process(array $item)
    {
        is_array($item); // true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->parse('/path/to/file.json', ['ItemProcessor', 'process']);
Instance method as callback:
class ItemProcessor
{
    public function process(array $item)
    {
        is_array($item); // true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$processor = new \ItemProcessor();
$parser->parse('/path/to/file.json', [$processor, 'process']);
Receive items as objects:
function processItem(\stdClass $item)
{
    is_array($item); // false
    is_object($item); // true
    print_r($item);
}

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects('/path/to/file.json', 'processItem');
Receive chunks of items as arrays:
function processChunk(array $chunk)
{
    is_array($chunk); // true
    count($chunk) === 5; // true

    foreach ($chunk as $item) {
        is_array($item); // true
        is_object($item); // false
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunk('/path/to/file.json', 'processChunk', 5);
Receive chunks of items as objects:
function processChunk(array $chunk)
{
    is_array($chunk); // true
    count($chunk) === 5; // true

    foreach ($chunk as $item) {
        is_array($item); // false
        is_object($item); // true
        print_r($item);
    }
}

$parser = new \JsonCollectionParser\Parser();
$parser->chunkAsObjects('/path/to/file.json', 'processChunk', 5);
Pass stream as parser input:
$stream = fopen('/path/to/file.json', 'r');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($stream, 'processItem');
Pass PSR-7 MessageInterface as parser input:
use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource, 'processItem');
Pass PSR-7 StreamInterface as parser input:
use Psr\Http\Message\MessageInterface;

/** @var MessageInterface $resource */
$resource = $httpClient->get('https://httpbin.org/get');

$parser = new \JsonCollectionParser\Parser();
$parser->parseAsObjects($resource->getBody(), 'processItem');
Supported formats
- .json - raw JSON
- .gz - GZIP-compressed JSON (you will need the zlib PHP extension installed)
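Because GZIP input depends on the zlib extension, it can help to check for it before handing a compressed file to the parser. The sketch below is a hypothetical helper, not part of the library. The function name `canParseOnThisHost` is made up, and it assumes compressed input is identified by a `.gz` suffix, which is a guess at the convention rather than documented behavior.

```php
/**
 * Hypothetical pre-flight check (not part of the library): reports whether
 * a path could be parsed on this PHP installation.
 * Assumption: GZIP-compressed input is recognised by a ".gz" suffix.
 */
function canParseOnThisHost(string $path, ?bool $zlibLoaded = null): bool
{
    // Allow the zlib state to be injected so the check is easy to test.
    $zlibLoaded = $zlibLoaded ?? extension_loaded('zlib');

    // Compare the final extension case-insensitively.
    $isGzip = strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'gz';

    // Raw JSON needs no extension; GZIP input needs zlib.
    return !$isGzip || $zlibLoaded;
}
```

Calling `canParseOnThisHost('/path/to/file.json.gz')` before `parse()` lets an application report a missing extension up front.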
Supported sources
- file
- string
- stream / resource
- PSR-7 HTTP message interface
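The list above does not show how JSON that is already in memory as a string is passed in. One approach that relies only on the stream support shown in the usage examples is to copy the string into a `php://temp` stream first. The helper name `jsonStringToStream` is hypothetical and not part of the library; this is a minimal sketch, not the library's own mechanism for string input.

```php
/**
 * Hypothetical helper (not part of the library): copies a JSON string into
 * a temporary stream and rewinds it so a reader starts at the first byte.
 *
 * @return resource
 */
function jsonStringToStream(string $json)
{
    // php://temp keeps small payloads in memory and spills to disk if large.
    $stream = fopen('php://temp', 'r+');
    if ($stream === false) {
        throw new \RuntimeException('Unable to open php://temp');
    }

    fwrite($stream, $json);
    rewind($stream);

    return $stream;
}

$stream = jsonStringToStream('[{"id": 78}, {"id": 729}]');

// With the library installed, the resource can be passed like any other stream:
// (new \JsonCollectionParser\Parser())->parseAsObjects($stream, 'processItem');
```

The commented call mirrors the "Pass stream as parser input" example and reuses its `processItem` callback.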
Running tests
composer test
License
This library is released under the MIT license.