r4j4h / php-druid-ingest
PHP Classes to wrap ingestion of data into Druid
Requires
- php: >=5.3.0
- guzzle/guzzle: ~3.9
- psr/log: 1.0.0
- r4j4h/php-druid-query: 1.0.x-dev
- symfony/console: 2.5.4
Requires (Dev)
- phing/phing: 2.*
- phploc/phploc: ~2.0
- phpmd/phpmd: 2.1.1
- phpunit/phpunit: 4.2.*
Last update: 2024-12-17 02:04:32 UTC
README
Experimental PHP wrapper around ingesting data from a variety of data sources into Druid as a data source.
Overview
The wrapper lives in the namespace PhpDruidIngest.
The classes cover the tasks involved in extracting, transforming, and loading data from other sources into Druid. This involves:
- the extraction, transformation, and loading of that data
- preparing it in a place and format ready for Druid
- the generation of a compatible Druid indexing task
- the execution of the indexing task
- usage of the returned job id for monitoring of the indexing task job, and
- removal of prepared ingestion data after Druid has finished ingesting the file.
When using LocalFilePreparer, this code must run on the Druid node that will ingest the data.
Otherwise it needs a way to move or stream the file from itself to the destination node (say, via scp).
RemoteSCPPreparer is an initial stab at this.
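To make the prepare and clean-up responsibilities concrete, here is a minimal sketch of a local preparer. The class name SketchLocalFilePreparer, its method names, and the choice of newline-delimited JSON as the file format are all assumptions made for illustration; this is not the library's LocalFilePreparer.

```php
<?php
// Hypothetical stand-in for illustration only; NOT the library's LocalFilePreparer.
class SketchLocalFilePreparer
{
    /** @var string|null Path of the file written by prepare(). */
    private $path = null;

    /**
     * Write each row as one JSON object per line and return the file path.
     *
     * @param array $rows List of associative arrays, one per record.
     * @return string Path of the prepared file.
     */
    public function prepare(array $rows)
    {
        $path = tempnam(sys_get_temp_dir(), 'druid_ingest_');
        if ($path === false) {
            throw new RuntimeException('Could not create a temporary file.');
        }

        $handle = fopen($path, 'w');
        if ($handle === false) {
            throw new RuntimeException('Could not open ' . $path . ' for writing.');
        }
        foreach ($rows as $row) {
            fwrite($handle, json_encode($row) . "\n");
        }
        fclose($handle);

        $this->path = $path;
        return $path;
    }

    /** Remove the prepared file once ingestion has finished. */
    public function cleanup()
    {
        if ($this->path !== null && file_exists($this->path)) {
            unlink($this->path);
        }
        $this->path = null;
    }
}

// Usage: prepare two sample records, then clean up.
$preparer = new SketchLocalFilePreparer();
$preparedPath = $preparer->prepare(array(
    array('timestamp' => '2014-01-01T00:00:00Z', 'page' => 'home', 'views' => 3),
    array('timestamp' => '2014-01-01T00:05:00Z', 'page' => 'about', 'views' => 1),
));
// ... submit an indexing task that points at $preparedPath and wait for it ...
$preparer->cleanup();
```

A remote variant would do the same writing step and then copy the file to the destination node before returning the remote path.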
Design
- Instantiate an IFetcher, configured to fetch the desired records for the desired time periods.
- Instantiate an IPreparer to record the results in memory or in a file, transfer the results to the destination, and return the destination path.
- Instantiate an IDruidQueryParameters, configured with parameters. <-- was dimension definition, now index task params + path from IFetcher
- Instantiate an IDruidQueryExecutor, configured to hit a Druid endpoint.
- Instantiate an IDruidQueryGenerator. <-- index task generator
- Instantiate an IDruidQueryResponseHandler. <-- gets the task id
- Run the IDruidQueryExecutor's executeQuery function with the IDruidQuery, getting the result.
- Hand the resulting task id to IDruidJobWatcher, which polls until the task succeeds or finishes.
- IPreparer then cleans up the leftover ingestion file.
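The polling step can be sketched roughly as below. The function name sketchWatchTask, the status callback, and the status strings RUNNING, SUCCESS, and FAILED are assumptions for this example; the library's IDruidJobWatcher may work differently, and a real callback would ask the Druid endpoint for the task status instead of returning canned values.

```php
<?php
/**
 * Poll a status source until the task reaches a terminal state.
 *
 * @param string   $taskId       Task id returned when the indexing task was submitted.
 * @param callable $fetchStatus  Hypothetical callback: takes a task id, returns a status string.
 * @param int      $maxAttempts  Give up after this many polls.
 * @param int      $delaySeconds Pause between polls.
 * @return string The terminal status that was observed.
 */
function sketchWatchTask($taskId, $fetchStatus, $maxAttempts = 60, $delaySeconds = 5)
{
    // Assumed terminal states; anything else is treated as "still in progress".
    $terminal = array('SUCCESS', 'FAILED');

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = call_user_func($fetchStatus, $taskId);
        if (in_array($status, $terminal, true)) {
            return $status;
        }
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException(
        sprintf('Task %s did not finish after %d polls.', $taskId, $maxAttempts)
    );
}

// Usage with a canned status source instead of a real Druid endpoint.
$canned = array('RUNNING', 'RUNNING', 'SUCCESS');
$fakeStatus = function ($taskId) use (&$canned) {
    return array_shift($canned);
};
$final = sketchWatchTask('example-task-id', $fakeStatus, 10, 0);
// $final is 'SUCCESS'; only at this point would the preparer's clean-up run.
```

Passing the status lookup in as a callback keeps the polling loop independent of the HTTP client, which also makes it easy to exercise without a running Druid cluster.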
Fetchers are the most interesting element in play here. By adding new Fetchers we can support new input sources.
Initially we are using mysqli to handle fetching from MySQL databases. Fetching from HTTP endpoints or a log, or running map/reduce or Storm and collecting the results, are all good ideas for other fetchers.
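As an illustration of how a new input source might plug in, here is a small sketch of a log-based fetcher. The SketchFetcher interface, the SketchLogFetcher class, and the assumed log line format are hypothetical; the library's IFetcher interface may define different methods.

```php
<?php
// Hypothetical interface for this sketch; the library's IFetcher may differ.
interface SketchFetcher
{
    /** @return array List of associative arrays, one per record. */
    public function fetch();
}

// Reads records from a plain-text log instead of a MySQL database.
class SketchLogFetcher implements SketchFetcher
{
    /** @var string Path of the log file to read. */
    private $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function fetch()
    {
        $lines = file($this->path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false) {
            throw new RuntimeException('Could not read ' . $this->path);
        }

        $rows = array();
        foreach ($lines as $line) {
            // Assumed line format: "<ISO-8601 timestamp> <event name>"
            $parts = explode(' ', $line, 2);
            if (count($parts) !== 2) {
                continue; // skip malformed lines
            }
            $rows[] = array('timestamp' => $parts[0], 'event' => $parts[1]);
        }
        return $rows;
    }
}

// Usage: write a tiny sample log, then fetch its records.
$sampleLog = tempnam(sys_get_temp_dir(), 'sketch_log_');
file_put_contents($sampleLog, "2014-01-01T00:00:00Z login\n2014-01-01T00:05:00Z logout\n");
$fetcher = new SketchLogFetcher($sampleLog);
$records = $fetcher->fetch();
unlink($sampleLog);
```

Because the rest of the pipeline only sees the rows a fetcher returns, swapping the log reader for a database or HTTP source would not change the preparing or indexing steps.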
Please refer to this diagram for an overview of how this works underneath the hood:
(From this Dynamic LucidChart Source URL)
How to Test
From the root directory, in a command terminal run: php vendor/bin/phpunit tests or, preferably, php vendor/bin/phing.
Generate Documentation
From the root directory, in a command terminal run: php vendor/bin/phing docs.
How to Install
Right now, there is no tagged version. To be ready for it when it comes, branch-aliases are in place.
- Stable branch: ~1.0@dev
- Stable branch w/ PHP 5.3 Compatibility Support: dev-php-53-compat
- Cutting edge: ~1.1@dev
Installing with Composer is suggested. If you are using PHP 5.3, there is a bug and you will need to use the alternative branch. If you have Composer installed, one of the following composer.json files should be all you need to get started:
Up to date PHP:
{
    "repositories": [
        {
            "type": "vcs",
            "url": "git@github.com:r4j4h/php-druid-ingest"
        }
    ],
    "require": {
        "r4j4h/php-druid-ingest": "~1.0@dev"
    }
}
PHP 5.3 Compatibility:
{
    "repositories": [
        {
            "type": "vcs",
            "url": "git@github.com:r4j4h/php-druid-ingest"
        }
    ],
    "require": {
        "r4j4h/php-druid-ingest": "dev-php-53-compat"
    }
}
Once that is in, composer install and composer update should work.
Once those are run, require Composer's autoloader and you are off to the races:
require 'vendor/autoload.php';