epocsquadron/gather-content-streaming-client

Get large datasets from GatherContent API without memory overload.

0.2.0 2019-07-09 20:43 UTC


Last update: 2024-05-23 18:34:03 UTC


README

A modern API client for GatherContent. It uses Guzzle's streaming capabilities to parse collections ("Items", "Projects", etc.) on the fly, so you can act on large datasets without loading the whole response into memory.

Features

  • Process items during streaming
  • Limit/paginate results
  • Asynchronous support
  • Light on dependencies (and you probably already have Guzzle)

Motivation

The GatherContent API offers few options for filtering result sets. If you want a list of content items, you have to request a verbose result set describing every item in your project. You cannot even narrow the response schema to just the fields you want (such as id, if you are preparing a subsequent call for each item's details). This library was born out of a need to parse the API response more efficiently and with better control over the returned items.

Design Philosophy

I attempted to follow patterns that would be familiar to users of many of the popular frameworks, especially Laravel. Queries to the API can have constraints built up through method chaining, much like using a SQL query builder in a framework. Once the conditions are built up, you call a method that actually performs the query. The simplest of these is get, in direct analogy to those query builders. However, get turns the stream into an in-memory representation, so the library also provides an each method. It takes a callback that is applied every time a full item in the collection has been parsed, and optionally a second callback that runs at the end.

The combination of filtering applied on the stream and the ability to provide a callback for per-item processing allows the memory footprint to remain essentially static when dealing with very large result sets. Beyond that, I've attempted to make calls that get the details of a specific item return a more sane result schema than the default provided by the raw response.
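The pattern described above can be sketched in plain PHP. This is a minimal illustration, not the library's code: the class SketchItemsQuery, the function sketchItems, the limit method, and the folder_id field are hypothetical names chosen for the example, and a generator over a small array stands in for the parsed HTTP stream. Only the ideas of chained constraints, get, and each come from this README; the real method signatures may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a streaming query builder. It is NOT the
 * library's class; it only illustrates the chain / get / each pattern.
 */
final class SketchItemsQuery
{
    /** @var iterable<array<string, mixed>> lazily produced items */
    private iterable $source;

    /** Folder to keep, or null for no folder constraint. */
    private ?int $folderId = null;

    /** Maximum number of items to yield, or null for no limit. */
    private ?int $limit = null;

    public function __construct(iterable $source)
    {
        $this->source = $source;
    }

    public function folder(int $folderId): self
    {
        $this->folderId = $folderId;
        return $this;
    }

    public function limit(int $limit): self
    {
        $this->limit = $limit;
        return $this;
    }

    /** Yields matching items one at a time, so only one is held at once. */
    private function stream(): \Generator
    {
        $yielded = 0;
        foreach ($this->source as $item) {
            if ($this->limit !== null && $yielded >= $this->limit) {
                return;
            }
            if ($this->folderId !== null && ($item['folder_id'] ?? null) !== $this->folderId) {
                continue;
            }
            $yielded++;
            yield $item;
        }
    }

    /** Calls $onItem per match, then $onEnd (if given) with the match count. */
    public function each(callable $onItem, ?callable $onEnd = null): void
    {
        $count = 0;
        foreach ($this->stream() as $item) {
            $onItem($item);
            $count++;
        }
        if ($onEnd !== null) {
            $onEnd($count);
        }
    }

    /** Collects every match into an array; memory grows with the result. */
    public function get(): array
    {
        return iterator_to_array($this->stream(), false);
    }
}

/** Fake data source standing in for items parsed off a response stream. */
function sketchItems(): \Generator
{
    foreach ([[1, 10], [2, 20], [3, 10], [4, 10]] as [$id, $folderId]) {
        yield ['id' => $id, 'folder_id' => $folderId];
    }
}

// Build up constraints by chaining, then process items as they arrive.
$ids = [];
(new SketchItemsQuery(sketchItems()))
    ->folder(10)
    ->limit(2)
    ->each(function (array $item) use (&$ids): void {
        $ids[] = $item['id'];
    });
```

In this sketch, each never holds more than one item at a time, while get materializes the whole filtered result. That is the trade-off the two methods are meant to expose.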

State of the Project

As this was developed out of necessity, specifically for pulling items from a folder, I have not provided all of the filter types that one might want, nor have I provided query builders for projects, folders, or files. The established interfaces should be well enough designed to generalize to all of those endpoints, and filters should be fairly easy to add by copying the ItemsQueryBuilder::folder method. If you need this streaming capability but the library doesn't provide the functionality you are after, I encourage you to put together a pull request.

Contributing

This project is open to pull requests. Keep to PSR-2, which should already be your default if you're using Composer packages, and try to provide explanatory comments. Other than that, I'm not picky and don't require any sort of contributor agreement. Just open a pull request against the develop branch on GitLab.

License & Copyright

Copyright 2019 Daniel Poulin epocsquadron@protonmail.com

This library is distributed under the MPL v2 license, whose full text is included in the LICENSE.md file in this repository.

One file in this work is modified from the maxakawizard/JsonCollectionParser library and is borrowed under the MIT license and marked with appropriate attribution within the file's comments.