jrmgx / etl-package
ETL: Extract, Transform, Load
Requires
- php: ^8.1
- ext-mbstring: *
- doctrine/dbal: ^3.6
- symfony/config: ^6.2
- symfony/dependency-injection: ^6.2
- symfony/expression-language: ^6.2
- symfony/http-client: ^6.2
- symfony/property-access: ^6.2
- twig/twig: ^3.5
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.14
- phpstan/phpstan: ^1.10
- phpunit/phpunit: ^10.0
- symfony/finder: ^6.2
- symfony/yaml: ^6.2
README
THIS IS A WORK IN PROGRESS
This package will allow you to:
- Extract data from a given source
- Transform it as you need
- Load the result to a given destination
It is based on a simple config file described below.
Basic Example
In this basic example:
- We are getting a CSV file from the local repository
- We transform the data a bit
- We send back the transformed data into an API as JSON
YAML is optional; the configuration could also be a plain PHP array (see the sketch after the example).
extract:
  pull:
    type: file
    uri: ./demo/data_in.csv
  read:
    format: csv
    options:
      trim: true
transform:
  filter:
    type: query
    options:
      where: 'Age > :min'
      parameters:
        min: 30
  mapping:
    type: expressive
    map:
      out.name: in.Name
      out.sex: in.Sex
      out.age_in_sec: 'in.Age * 365 * 24 * 60 * 60'
load:
  write:
    format: json
  push:
    type: http
    uri: https://webhook.site/f24c112b-8344-4fe3-a9e5-53baf36c912f
    options:
      headers:
        'Authorization': 'Basic 9e222b3b7647c7'
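For illustration, here is the same configuration written as a plain PHP array. It is a direct translation of the YAML above (same keys, no extra options):

$config = [
    'extract' => [
        'pull' => ['type' => 'file', 'uri' => './demo/data_in.csv'],
        'read' => ['format' => 'csv', 'options' => ['trim' => true]],
    ],
    'transform' => [
        'filter' => [
            'type' => 'query',
            'options' => ['where' => 'Age > :min', 'parameters' => ['min' => 30]],
        ],
        'mapping' => [
            'type' => 'expressive',
            'map' => [
                'out.name' => 'in.Name',
                'out.sex' => 'in.Sex',
                'out.age_in_sec' => 'in.Age * 365 * 24 * 60 * 60',
            ],
        ],
    ],
    'load' => [
        'write' => ['format' => 'json'],
        'push' => [
            'type' => 'http',
            'uri' => 'https://webhook.site/f24c112b-8344-4fe3-a9e5-53baf36c912f',
            'options' => ['headers' => ['Authorization' => 'Basic 9e222b3b7647c7']],
        ],
    ],
];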
Configuration
On the ETL side, everything is configured in one single file that describes each step.
For the sake of readability, the documentation has been split so that each component has its own page.
Extractors
Extractors are responsible for pulling and reading your data.
Pull: responsible for pulling the data from its source, be it a local file, an API or a database.
Read: given the resource obtained in Pull, read it and convert it to a normalized array to work with later.
Most of the time, the Pull and Read sections are independent, but for some types of resources (like databases) they work together (Pull makes and handles the connection, and Read gets the data from that connection).
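As an illustration, here is the extract block from the basic example on its own: Pull fetches a local CSV file and Read parses it into normalized rows. The keys are the ones already shown above; other pull/read types have their own options.

extract:
  pull:
    type: file               # pull from the local filesystem
    uri: ./demo/data_in.csv
  read:
    format: csv              # parse the pulled resource as CSV rows
    options:
      trim: true             # trim whitespace around each value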
Transformers
Transformers are responsible for filtering and transforming the given data.
Filter: given the data, apply specific filtering logic.
Mapping: given the data, associate output fields with input fields, with optional transformations.
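As an illustration, a transform combining one filter and one mapping, reusing only options from the basic example above:

transform:
  filter:
    type: query              # keep only rows matching the condition
    options:
      where: 'Age > :min'
      parameters:
        min: 30
  mapping:
    type: expressive         # map values may be expressions over input fields
    map:
      out.name: in.Name
      out.age_in_sec: 'in.Age * 365 * 24 * 60 * 60'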
Chaining
Sometimes you may want to take advantage of multiple transformers.
You can do so by adding multiple entries to a transformers section.
As the example below shows, each transformer works on the output of the previous one: out.size produced by the first becomes in.size in the second.
transformers:
  first_transform: # This name does not matter, but it has to be unique
    mapping:
      type: simple
      map:
        out.name: in.Name
        out.sex: in.Sex
        out.size: in.Height
  second_transform: # This name does not matter, but it has to be unique
    filter:
      type: query
      options:
        where: 'size > :size'
        parameters:
          size: 2
    mapping:
      type: expressive
      map:
        out.name: in.name
        out.sex: in.sex
        out.squared: 'in.size * in.size'
Loaders
Loaders are responsible for writing and pushing your data.
Write: given the data, write and convert it to the specified format / type.
Push: given the resource produced by Write, push it to the configured destination.
Most of the time, the Write and Push sections are independent, but for some types of resources (like databases) they work together (Write prepares the data and Push handles the connection).
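As an illustration, the load block from the basic example: Write serializes the rows as JSON and Push sends the result to an HTTP endpoint (the URI below is a placeholder):

load:
  write:
    format: json             # serialize the transformed rows as JSON
  push:
    type: http               # send the written payload to the endpoint
    uri: https://example.com/my-endpoint   # placeholder endpoint
    options:
      headers:
        'Authorization': 'Basic 9e222b3b7647c7'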
Use it in your project
Symfony or via dependency injection
In a Symfony project, you can add these lines to your services.yaml to make it work:
Jrmgx\Etl\:
    resource: '../vendor/jrmgx/etl-package/src/'
    exclude:
        - '../vendor/jrmgx/etl-package/src/Config/'
This will register everything as a service, and then you can inject Etl
like this:
// The use statements below assume the package's default namespaces;
// adjust them to match your installation.
use Jrmgx\Etl\Config\Config;
use Jrmgx\Etl\Etl;
use Symfony\Component\Yaml\Yaml;

class YourService
{
    public function __construct(private readonly Etl $etl)
    {
    }

    public function yourMethod(): void
    {
        // Yaml is optional, you can provide an array instead
        $configFile = Yaml::parseFile(__DIR__ . '/../../config/etl.yaml');
        $config = new Config($configFile, __DIR__ . '/../../');
        $this->etl->execute($config);
    }
}
In other projects using dependency injection, the setup should be pretty similar.
Manual setup
TODO
Add custom Extractors/Transformers/Loaders
Implement Custom Extractors
...
Implement Custom Transformers
...
Implement Custom Loaders
...
Development Environment
Install the project
...
Run the project
...
Tests and code style
...
Licence
MIT
Attributions
data_in.csv
comes from: https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html