fabiomez/data-extractor

Library for extracting data from common sources such as strings or CSV rows read from files

1.0.0 2021-03-25 16:43 UTC

Last update: 2024-08-26 01:09:46 UTC


README

Why

Every time I had to parse a file such as a CNAB, EDI, or any CSV file, the process was much the same and always returned an array whose indexes I could never fully trust. Sometimes the application broke because an expected value was missing. So I decided to write this library, which works with objects that act as models for the extracted data. How each value is extracted is written in the docblock of the model's attributes. This gives developers auto-completion on the model's attributes, with a proper description of how the extraction was done and, optionally, a description of what the data means. That is what I really missed with arrays.

Installation

Run composer require fabiomez/data-extractor

Or add "fabiomez/data-extractor": "*" to the require section of your composer.json file.

Usage

The Data Extractor works on model attributes. A model can be any class with public attributes that have the @extractable tag in their docblock.

As of version 1, Data Extractor provides three types of value getters: substring, array and regex.

Each type of value getter requires its own docblock tags, which must be subtags of the @extractable tag.

Substring

The substring value getter works just like PHP's substr function, where:

  • {@start} is the initial position, counted from 0.
  • {@length} is the length of the desired text.

/**
 * @extractable
 *   {@start integer}
 *   {@length integer}
 */
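
To make the two subtags concrete, the sketch below reproduces the same slicing with plain substr. It does not call the library, and the $line value is just an illustrative fixed-width record.

```php
<?php
// Plain-PHP illustration of what {@start} and {@length} describe.
// This does not use the library; $line is a made-up fixed-width record.
$line = 'First dataSecond Data';

// Equivalent of {@start 0} {@length 10}
$first = substr($line, 0, 10);

// Equivalent of {@start 10} {@length 11}
$second = substr($line, 10, 11);

echo $first, PHP_EOL;  // First data
echo $second, PHP_EOL; // Second Data
```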

Array

{@index} is the array index of the value to extract (say, from a parsed CSV row). It can be either numeric or associative.

/**
 * @extractable
 *   {@index mixed}
 */
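
As a plain-PHP illustration of the two kinds of index (not the library's own code), the sketch below parses one CSV row with str_getcsv and reads it first by position and then by column name. The row contents and header names are made up, and writing an associative index as {@index amount} is an assumption based on the tag form shown above.

```php
<?php
// Illustration only: what a numeric or associative {@index} would point at.
// The row and header names below are hypothetical sample data.
$header = ['name', 'amount'];
$row = str_getcsv('Alice,42.50', ',', '"', '\\');

// Numeric index, as in {@index 1}
$byPosition = $row[1];

// Associative index, as in {@index amount} (assumed syntax),
// after pairing the header names with the row values
$assoc = array_combine($header, $row);
$byName = $assoc['amount'];

echo $byPosition, PHP_EOL; // 42.50
echo $byName, PHP_EOL;     // 42.50
```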

Regex

Regex uses patterns to match the desired data, where:

  • {@pattern} must be a valid pattern that matches the desired data.
  • {@index} is the numeric index of the matched data from the pattern.

/**
 * @extractable
 *   {@pattern string}
 *   {@index integer}
 */
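
For readers less familiar with match indexes, here is a small plain-PHP sketch using preg_match. It assumes that {@index} refers to a position in the matches array the way preg_match fills it (0 for the whole match, 1 and up for capture groups); the README does not spell this out, and the pattern and input are hypothetical.

```php
<?php
// Illustration only: how a pattern plus a numeric index select a value.
// The pattern and input are hypothetical; the library is not used here.
$source = 'Invoice 2021-03-25 total 199.90';
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

$wholeDate = null;
$year = null;
$month = null;

$matches = [];
if (preg_match($pattern, $source, $matches) === 1) {
    // Index 0 is the whole match; 1, 2 and 3 are the capture groups.
    $wholeDate = $matches[0];
    $year = $matches[1];
    $month = $matches[2];
}

echo $wholeDate, PHP_EOL; // 2021-03-25
echo $year, PHP_EOL;      // 2021
echo $month, PHP_EOL;     // 03
```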

The Model

Write a class with public attributes whose docblocks guide the Extractor:

class MyModel
{
    /**
     * @extractable
     *    {@start 0}
     *    {@length 10}
     * @otherTag from prop 1
     */
    public $prop1;

    /**
     * @extractable
     *    {@start 10}
     *    {@length 11}
     * @otherTag from prop 2
     */
    public $prop2;
}
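
To show why docblocks alone are enough to drive an extraction, here is a minimal sketch of how such annotations could be read with PHP reflection and applied with substr. It is not the library's Extractor or DocBlockParser: SketchModel and readFixedWidth() are hypothetical names introduced only for this illustration, and the sketch handles just the {@start} and {@length} subtags.

```php
<?php
// Sketch only: reads {@start N} and {@length N} from property docblocks via
// reflection and applies substr. SketchModel and readFixedWidth() are
// hypothetical, not part of the library.
class SketchModel
{
    /**
     * @extractable
     *    {@start 0}
     *    {@length 10}
     */
    public $prop1;

    /**
     * @extractable
     *    {@start 10}
     *    {@length 11}
     */
    public $prop2;
}

function readFixedWidth(string $class, string $source): object
{
    $model = new $class();
    $reflection = new ReflectionClass($class);

    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $doc = $property->getDocComment();
        // Skip properties without a docblock or without the @extractable tag.
        if ($doc === false || strpos($doc, '@extractable') === false) {
            continue;
        }
        // Both subtags must be present for this sketch to fill the property.
        if (preg_match('/\{@start\s+(\d+)\}/', $doc, $start) !== 1
            || preg_match('/\{@length\s+(\d+)\}/', $doc, $length) !== 1) {
            continue;
        }
        $model->{$property->getName()} = substr($source, (int) $start[1], (int) $length[1]);
    }

    return $model;
}

$model = readFixedWidth(SketchModel::class, 'First dataSecond Data');
echo $model->prop1, PHP_EOL; // First data
echo $model->prop2, PHP_EOL; // Second Data
```

Because the extraction rules live next to each attribute, an editor can show them as documentation wherever the attribute is used.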

Getting the extractor

The extractor can be instantiated directly or created via the factory.

Directly

use Fabiomez\DataExtractor\Extractor;
use Fabiomez\DataExtractor\DocBlockParser;
use Fabiomez\DataExtractor\ValueGetters\ArrayValueGetter;
use Fabiomez\DataExtractor\ValueGetters\RegexValueGetter;
use Fabiomez\DataExtractor\ValueGetters\SubstringValueGetter;

//Array extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new ArrayValueGetter()
);

//Regex extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new RegexValueGetter()
);

//Substring extractor
$extractor = new Extractor(
    DocBlockParser::createInstance(),
    new SubstringValueGetter()
);

Via Factory

use Fabiomez\DataExtractor\ExtractorFactory;

$factory = new ExtractorFactory();

//Array extractor
$extractor = $factory->createArrayExtractor();

//Regex extractor
$extractor = $factory->createRegexExtractor();

//Substring extractor
$extractor = $factory->createSubstringExtractor();

Extracting the data from source

The extraction process accepts either the model's class name or an instance:

//By class name
$extractedModel = $extractor->extract(MyModel::class, 'First dataSecond Data');

//By instance
$extractedModel = $extractor->extract(new MyModel(), 'First dataSecond Data');

echo $extractedModel->prop1; //will give 'First data'
echo $extractedModel->prop2; //will give 'Second Data'

Optionally, a callback can be provided as the third parameter to modify the model after the extraction:

$extractedModel = $extractor->extract(
    MyModel::class,
    'First dataSecond Data',
    function ($model, $propertiesSchema) {
        foreach ($propertiesSchema as $property => $schema) {
            $model->{$property} .= $schema['otherTag'];
        }
    }
);

echo $extractedModel->prop1; //will give 'First data from prop 1'
echo $extractedModel->prop2; //will give 'Second Data from prop 2'