stickee / import
Stickee data importer module - import from CSV, Akeneo, etc
This package's canonical repository appears to be gone and the package has been frozen as a result.
Requires
- php: ^7.2
Suggests
- akeneo/api-php-client: Allows importing from Akeneo. Will also require php-http/guzzle6-adapter ^2.0 and php-http/httplug ^1.0
README
Import data from Akeneo, CSV feeds, Hydrogen or other sources into a database table.
Installation
composer require stickee/import
Method of Operation
The importer follows the following process:
- Create a temporary table
- Insert data from your data source into the table
- Merge the temporary table in to the data table
Usage
Create an instance of the Importer
object and call run()
use Stickee\Import\Importer; $importer = new Importer($dataMerger, $temporaryTableManager, $iterable); $importer->run();
The three arguments to the Importer
constructor are detailed below.
You can optionally call initialise
and / or cleanup
manually, before and after run
respectively - see Multiple Importers
Data Merger
The system supplies a single \Stickee\Import\Utils\DataMerger
class you can create an instance of.
Constructor
You can construct the DataMerger using the following constructor parameters, or using the fluent methods below.
Name | Type | Default | Description | Example |
---|---|---|---|---|
$db |
\Illuminate\Database\ConnectionInterface |
- | The database connection | |
$tableName |
string |
- | The target table name | my_data |
$tempTableName |
string |
- | The temporary table name - Usually you will get this from $temporaryTableManager->getTableName() |
my_data_tmp |
$columns |
array |
- | The columns to insert data into | ['code', 'name', 'description'] |
$importIndexes |
string[] |
['PRIMARY'] |
The indexes used while importing. Usually this will an index or indexes that cover $joinFields |
['PRIMARY', 'code'] |
$joinFields |
array |
['id'] |
The columns to use to join the table and temporary table | ['code'] |
$preservedColumns |
array |
[] |
Columns that should be inserted, but not updated; use if the table can be written to outside of the importer | ['custom_description'] |
$deleteRemovedItems |
boolean |
true |
Delete items that aren't in the temporary table | false |
$addAutoIdColumn |
boolean |
true |
Whether to add a {table}.id auto-number field. Should be true unless the ID is in the data |
false |
Properties
You can set these to fine-tune the behaviour
Name | Type | Description |
---|---|---|
$bulkInsertSize |
int |
How many rows to insert at a time |
$bulkUpdateSize |
int |
How many rows to delete / update at a time |
Fluent methods
Method | Description |
---|---|
setImportIndexes(string $importIndexes) |
Set the importIndexes |
setJoinFields(string $joinFields) |
Set the joinFields |
setPreservedColumns(string $preservedColumns) |
Set the preservedColumns |
setDeleteRemovedItems(string $deleteRemovedItems) |
Set the deleteRemovedItems |
setAddAutoIdColumn(string $addAutoIdColumn) |
Set the addAutoIdColumn |
setBulkInsertSize(int $bulkInsertSize) |
Set the bulk insert size |
setBulkUpdateSize(int $bulkUpdateSize) |
Set the bulk update size |
setOutput(?\Illuminate\Console\OutputStyle) |
Set the console output |
For example:
$dataMerger = (new DataMerger($db, 'my_table', $temporaryTableManager->getTableName(), `['code', 'name', 'description'])) ->setImportIndexes(['PRIMARY', 'my_table_code_unique']) ->setJoinFields(['code']) ->setBulkInsertSize(10000);
Temporary Table Manager
This will create and drop the temporary table.
Usually Stickee\Import\TableManagers\AutoTableManager
is the best choice.
This will use a TEMPORARY TABLE ENGINE=MEMORY
table where possible, and fall back to TEMPORARY TABLE
.
Iterable
This must be an iterable that provides an associative array of column name => value
Implementing this is up to you, but there are classes provided to assist with creating an iterator for Akeneo data and for Hydrogen Table Sync data.
For Akeneo you can use \Stickee\Import\Iterators\AkeneoIterator
.
For Hydrogen you can use or extend \Stickee\Import\Iterators\TableIterator
.
Example 1
The iterator could be as simple as a plain array
$iterator = [ ['name' => 'Fluttershy', 'colour' => 'yellow'], ['name' => 'Rainbow Dash', 'colour' => 'blue'], ... ];
Example 2
Reading from a CSV could be implemented as a generator, to keep memory usage down
use IteratorAggregate; use League\Csv\Reader; use Stickee\Import\Utils\RowMapper; $iterator = new class implements IteratorAggregate { public function getIterator(): iterable { $csvStream = (new FileStream())->getStream('http://example.com/data.csv'); $csv = Reader::createFromStream($csvStream); $rowMapper = new RowMapper(['name' => 0, 'colour' => 1]); foreach ($csv as $row) { yield $rowMapper->mapRow($row); } } }
Images
If you need to download images then you should create a generator like the CSV example above, and download each row's image(s) before you yield the row. Helper utilities are available to make the download process easier.
Helper Utilities
DownloadManager
Download files, keeping track of which have already been downloaded so the same file won't be downloaded twice
use Stickee\Import\Utils\DownloadManager $downloadManager = new DownloadManager($fileDownloadService); $downloadManager->downloadFile('http://example.com/1.jpg'); $downloadManager->downloadFile('http://example.com/2.jpg', 'two.jpg');
FileDownloadService
Download a file to a Laravel Disk. It accepts local paths as well as URLs
use Stickee\Import\Utils\FileDownloadService $fileDownloadService = new FileDownloadService($disk, $fileStream); $fileDownloadService->storeFileContents('a.txt', 'Hello World!'); $fileDownloadService->storeFile('http://example.com/a.txt', 'example-a.txt'); $fileDownloadService->storeFile('/var/www/a.txt', 'local-a.txt');
FileStream
Open a local path or URL as a file stream.
Don't forget to fclose
the stream when you're done with it.
use Stickee\Import\Utils\FileStream $fileStream = new FileStream(); $file = $fileStream->getStream('http://example.com/a.txt'); echo fread($file, 1024); fclose($file);
IdLookupService
If you're adding your own primary key to the data, then you may need to look up relationships using the data's own key (known as the code).
For example, you have two CSV files:
ponies.csv
code,tribe,name,colour
F,P,Fluttershy,yellow
RD,P,Rainbow Dash,blue
TS,U,Twilight Sparkle,purple
tribes.csv
code,name
P,Pegasus
U,Unicorn
E,Earth Pony
Your database looks like this:
mydb.ponies
id | code | tribe_id | name | colour
---|------|----------|------------------|-------
1 | F | 1 | Fluttershy | yellow
2 | RD | 1 | Rainbow Dash | blue
3 | TS | 2 | Twilight Sparkle | purple
mydb.tribes
id | code | name
---|------|-----------
1 | P | Pegasus
2 | U | Unicorn
3 | E | Earth Pony
For the mydb.ponies
table you need to convert the tribe
from the CSV into a mydb.tribes.id
value - e.g. P
should be transformed to 1
.
This can be achieved by creating an IdLookupService and then using it inside your iterator.
use Stickee\Import\Utils\IdLookupService $idLookupService = new IdLookupService($db, 'tribes', 'code'); $tribeId = $idLookupService->get('P'); // $tribeId === 1
QueryChunkGenerator
This is similar to Laravel's DB::query()->chunk()
but instead of taking a callback it yields each record
use Stickee\Import\Utils\QueryChunkGenerator $query = MyModel::query(); $iterator = new ChunkGenerator($query, 100); // This will echo the ID for all models, but will only fetch 100 at a time from the database foreach ($iterator as $row) { echo $row->id; }
RowMapper
This is a helper for converting from numerically-indexed data to an associative array, e.g. for CSV data.
The CSV row indexes don't have to all be specified, or be in any particular order.
If you want to add a null column, set the index to -1
.
use Stickee\Import\Utils\RowMapper $map = [ 'id' => 0, 'colour' => 4, always_null' => -1, ]; $rowMapper = new RowMapper($map); $row = $rowMapper->mapRow([1, 'F', 1 , 'Fluttershy', 'yellow']); // $row === ['id' => 1, 'colour' => 'yellow', 'always_null' => null]
Multiple Importers
If you have multiple importers that you want to run in a single transaction, you must run initialise()
before you
start the transaction, otherwise it will be autocomitted.
$importer1->initialise(); $importer2->initialise(); DB::transaction(function () { $importer1->run(); $importer2->run(); });
Common Errors
SQLSTATE[HY000]: General error: 1114 The table '...' is full
This occurs when a MEMORY table (which will be used by MemoryTableManager
and AutoTableManager
where possible)
exceeds MySQL's max_heap_table_size
.
If you get this error, you can either use TemporaryTableManager
, which will use a temporary table on disk,
or increase max_heap_table_size
like this:
$maxHeapTableSize = DB::select(DB::raw('SHOW VARIABLES LIKE "max_heap_table_size"'))[0]->Value; DB::statement('SET SESSION max_heap_table_size=4294967295'); $importer->run(); DB::statement('SET SESSION max_heap_table_size=?', [(int)$maxHeapTableSize]);
Developing
The easiest way to make changes is to make the project you're importing the module in to load the module from your filesystem instead of the composer repository, like this:
composer remove stickee/import
- Edit
composer.json
and add"repositories" : [ { "type": "path", "url": "../import-module" } ]
where "../import-module" is the path to where you have this project checked out composer require stickee/import
NOTE: Do not check in your composer.json
like this!