stickee/import

Stickee data importer module - import from CSV, Akeneo, etc

2.0.0 2020-12-03 14:07 UTC

This package is auto-updated.

Last update: 2021-02-03 15:39:03 UTC


README

Import data from Akeneo, CSV feeds, Hydrogen or other sources into a database table.

Installation

composer require stickee/import

Method of Operation

The importer follows the following process:

  1. Create a temporary table
  2. Insert data from your data source into the table
  3. Merge the temporary table in to the data table

Usage

Create an instance of the Importer object and call run()

use Stickee\Import\Importer;

$importer = new Importer($dataMerger, $temporaryTableManager, $iterable);
$importer->run();

The three arguments to the Importer constructor are detailed below. You can optionally call initialise and / or cleanup manually, before and after run respectively - see Multiple Importers

Data Merger

The system supplies a single \Stickee\Import\Utils\DataMerger class you can create an instance of.

Constructor

You can construct the DataMerger using the following constructor parameters, or using the fluent methods below.

Name Type Default Description Example
$db \Illuminate\Database\ConnectionInterface - The database connection
$tableName string - The target table name my_data
$tempTableName string - The temporary table name - Usually you will get this from $temporaryTableManager->getTableName() my_data_tmp
$columns array - The columns to insert data into ['code', 'name', 'description']
$importIndexes string[] ['PRIMARY'] The indexes used while importing. Usually this will an index or indexes that cover $joinFields ['PRIMARY', 'code']
$joinFields array ['id'] The columns to use to join the table and temporary table ['code']
$preservedColumns array [] Columns that should be inserted, but not updated; use if the table can be written to outside of the importer ['custom_description']
$deleteRemovedItems boolean true Delete items that aren't in the temporary table false
$addAutoIdColumn boolean true Whether to add a {table}.id auto-number field. Should be true unless the ID is in the data false

Properties

You can set these to fine-tune the behaviour

Name Type Description
$bulkInsertSize int How many rows to insert at a time
$bulkUpdateSize int How many rows to delete / update at a time

Fluent methods

Method Description
setImportIndexes(string $importIndexes) Set the importIndexes
setJoinFields(string $joinFields) Set the joinFields
setPreservedColumns(string $preservedColumns) Set the preservedColumns
setDeleteRemovedItems(string $deleteRemovedItems) Set the deleteRemovedItems
setAddAutoIdColumn(string $addAutoIdColumn) Set the addAutoIdColumn
setBulkInsertSize(int $bulkInsertSize) Set the bulk insert size
setBulkUpdateSize(int $bulkUpdateSize) Set the bulk update size
setOutput(?\Illuminate\Console\OutputStyle) Set the console output

For example:

$dataMerger = (new DataMerger($db, 'my_table', $temporaryTableManager->getTableName(), `['code', 'name', 'description']))
    ->setImportIndexes(['PRIMARY', 'my_table_code_unique'])
    ->setJoinFields(['code'])
    ->setBulkInsertSize(10000);

Temporary Table Manager

This will create and drop the temporary table.

Usually Stickee\Import\TableManagers\AutoTableManager is the best choice. This will use a TEMPORARY TABLE ENGINE=MEMORY table where possible, and fall back to TEMPORARY TABLE.

Iterable

This must be an iterable that provides an associative array of column name => value Implementing this is up to you, but there are classes provided to assist with creating an iterator for Akeneo data and for Hydrogen Table Sync data. For Akeneo you can use \Stickee\Import\Iterators\AkeneoIterator. For Hydrogen you can use or extend \Stickee\Import\Iterators\TableIterator.

Example 1

The iterator could be as simple as a plain array

$iterator = [
    ['name' => 'Fluttershy', 'colour' => 'yellow'],
    ['name' => 'Rainbow Dash', 'colour' => 'blue'],
    ...
];

Example 2

Reading from a CSV could be implemented as a generator, to keep memory usage down

use IteratorAggregate;
use League\Csv\Reader;
use Stickee\Import\Utils\RowMapper;

$iterator = new class implements IteratorAggregate
{
    public function getIterator(): iterable
    {
        $csvStream = (new FileStream())->getStream('http://example.com/data.csv');
        $csv = Reader::createFromStream($csvStream);
        $rowMapper = new RowMapper(['name' => 0, 'colour' => 1]);

        foreach ($csv as $row) {
            yield $rowMapper->mapRow($row);
        }
    }
}

Images

If you need to download images then you should create a generator like the CSV example above, and download each row's image(s) before you yield the row. Helper utilities are available to make the download process easier.

Helper Utilities

DownloadManager

Download files, keeping track of which have already been downloaded so the same file won't be downloaded twice

use Stickee\Import\Utils\DownloadManager

$downloadManager = new DownloadManager($fileDownloadService);
$downloadManager->downloadFile('http://example.com/1.jpg');
$downloadManager->downloadFile('http://example.com/2.jpg', 'two.jpg');

FileDownloadService

Download a file to a Laravel Disk. It accepts local paths as well as URLs

use Stickee\Import\Utils\FileDownloadService

$fileDownloadService = new FileDownloadService($disk, $fileStream);
$fileDownloadService->storeFileContents('a.txt', 'Hello World!');
$fileDownloadService->storeFile('http://example.com/a.txt', 'example-a.txt');
$fileDownloadService->storeFile('/var/www/a.txt', 'local-a.txt');

FileStream

Open a local path or URL as a file stream. Don't forget to fclose the stream when you're done with it.

use Stickee\Import\Utils\FileStream

$fileStream = new FileStream();

$file = $fileStream->getStream('http://example.com/a.txt');
echo fread($file, 1024);
fclose($file);

IdLookupService

If you're adding your own primary key to the data, then you may need to look up relationships using the data's own key (known as the code).

For example, you have two CSV files:

ponies.csv

code,tribe,name,colour
F,P,Fluttershy,yellow
RD,P,Rainbow Dash,blue
TS,U,Twilight Sparkle,purple

tribes.csv

code,name
P,Pegasus
U,Unicorn
E,Earth Pony

Your database looks like this:

mydb.ponies

id | code | tribe_id | name             | colour
---|------|----------|------------------|-------
 1 | F    | 1        | Fluttershy       | yellow
 2 | RD   | 1        | Rainbow Dash     | blue
 3 | TS   | 2        | Twilight Sparkle | purple

mydb.tribes

id | code | name
---|------|-----------
1  | P    | Pegasus
2  | U    | Unicorn
3  | E    | Earth Pony

For the mydb.ponies table you need to convert the tribe from the CSV into a mydb.tribes.id value - e.g. P should be transformed to 1. This can be achieved by creating an IdLookupService and then using it inside your iterator.

use Stickee\Import\Utils\IdLookupService

$idLookupService = new IdLookupService($db, 'tribes', 'code');
$tribeId = $idLookupService->get('P'); // $tribeId === 1

QueryChunkGenerator

This is similar to Laravel's DB::query()->chunk() but instead of taking a callback it yields each record

use Stickee\Import\Utils\QueryChunkGenerator

$query = MyModel::query();
$iterator = new ChunkGenerator($query, 100);

// This will echo the ID for all models, but will only fetch 100 at a time from the database
foreach ($iterator as $row) {
    echo $row->id;
}

RowMapper

This is a helper for converting from numerically-indexed data to an associative array, e.g. for CSV data. The CSV row indexes don't have to all be specified, or be in any particular order. If you want to add a null column, set the index to -1.

use Stickee\Import\Utils\RowMapper

$map = [
    'id' => 0,
    'colour' => 4,
    always_null' => -1,
];

$rowMapper = new RowMapper($map);

$row = $rowMapper->mapRow([1, 'F', 1 , 'Fluttershy', 'yellow']);

// $row === ['id' => 1, 'colour' => 'yellow', 'always_null' => null]

Multiple Importers

If you have multiple importers that you want to run in a single transaction, you must run initialise() before you start the transaction, otherwise it will be autocomitted.

$importer1->initialise();
$importer2->initialise();

DB::transaction(function () {
    $importer1->run();
    $importer2->run();
});

Common Errors

SQLSTATE[HY000]: General error: 1114 The table '...' is full

This occurs when a MEMORY table (which will be used by MemoryTableManager and AutoTableManager where possible) exceeds MySQL's max_heap_table_size. If you get this error, you can either use TemporaryTableManager, which will use a temporary table on disk, or increase max_heap_table_size like this:

$maxHeapTableSize = DB::select(DB::raw('SHOW VARIABLES LIKE "max_heap_table_size"'))[0]->Value;
DB::statement('SET SESSION max_heap_table_size=4294967295');

$importer->run();

DB::statement('SET SESSION max_heap_table_size=?', [(int)$maxHeapTableSize]);

Developing

The easiest way to make changes is to make the project you're importing the module in to load the module from your filesystem instead of the composer repository, like this:

  1. composer remove stickee/import
  2. Edit composer.json and add
    "repositories" : [
            {
                "type": "path",
                "url": "../import-module"
            }
        ]
    where "../import-module" is the path to where you have this project checked out
  3. composer require stickee/import

NOTE: Do not check in your composer.json like this!