survos / import-bundle
import-bundle Bundle
Type:symfony-bundle
pkg:composer/survos/import-bundle
Requires
- php: ^8.4
- symfony/config: ^7.4||^8.0
- symfony/dependency-injection: ^7.4||^8.0
- symfony/http-kernel: ^7.4||^8.0
- symfony/type-info: ^7.4||^8.0
- twig/twig: ^3.4
Requires (Dev)
- doctrine/doctrine-bundle: ^3.0
- doctrine/orm: ^3.5
- phpstan/phpstan-symfony: ^2.0
- phpunit/phpunit: ^12.4
- survos/jsonl-bundle: ^2.0
- symfony/browser-kit: ^7.4||^8.0
- symfony/framework-bundle: ^7.4||^8.0
- symfony/phpunit-bridge: ^7.4||^8.0
- symfony/serializer: ^7.4||^8.0
- symfony/twig-bundle: ^7.4||^8.0
- symfony/var-dumper: ^7.4||^8.0
This package is auto-updated.
Last update: 2025-12-02 10:43:20 UTC
README
Symfony bundle that provides tools for importing data.
SurvosImportBundle helps you get raw CSV/JSON data into your database via Doctrine with minimal fuss.
Typical problems this bundle solves:
- You have CSV or JSON exports (from an API, a vendor, a legacy system…) and you want them in your app’s database.
- You need a real primary key, correct Doctrine field types (int, float, bool, datetime, json, text…), and ideally some basic statistics to make good schema decisions.
- You want a repeatable pipeline that goes from:
- Raw file → cleaned, normalized JSONL + profile
- JSONL + profile → Doctrine entity with good defaults
- JSONL → Doctrine entities persisted efficiently (batches, progress, etc.)
SurvosImportBundle provides exactly that pipeline:
- import:convert – convert raw CSV/JSON into JSONL + a profile with field statistics.
- code:entity – generate a Doctrine entity from that profile (via SurvosCodeBundle).
- import:entities – import JSONL records into your database using Doctrine.
You can also use it in a simpler “direct CSV → Entity → Import” mode for quick one-off jobs and demos.
Table of Contents
- Installation
- Quick Start (Direct CSV → Entity → Import)
- Concepts
- The Pipeline
- End-to-End Example
- Complete Demo App with EasyAdmin
- Castor Automation
- Events & Extensibility
- Tips & Gotchas
- See Also
Installation
composer require survos/import-bundle
composer require --dev survos/code-bundle
Register the bundle if you’re not using auto-discovery:
// config/bundles.php
return [
    Survos\ImportBundle\SurvosImportBundle::class => ['all' => true],
];
Quick Start (Direct CSV → Entity → Import)
This is the minimal “I just want my CSV in Doctrine” flow.
In short, install the bundles:
composer req survos/import-bundle
composer req --dev survos/code-bundle
First, create an entity class by inspecting the first line (and/or a sample) of a CSV file:
bin/console code:entity Movie --file=data/movies.csv
The entity has property names that loosely match the CSV headers
(e.g. "First Name" becomes $firstName in the entity).
Then import the data:
bin/console import:entities Movie --file data/movies.csv --limit 500
That’s the “fast path” for simple, flat CSVs.
For more control and richer metadata, use the JSONL-based pipeline below.
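The header-to-property mapping mentioned in the Quick Start ("First Name" becomes $firstName) can be pictured with a small sketch. The function name headerToPropertyName is hypothetical and the splitting rule is an assumption; the code that SurvosCodeBundle actually runs may handle edge cases differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a CSV header into a camelCase property name.
 * Illustration only, not the code SurvosCodeBundle actually runs.
 */
function headerToPropertyName(string $header): string
{
    // Split on any run of characters that are not letters or digits.
    $words = preg_split('/[^A-Za-z0-9]+/', $header, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false || $words === []) {
        return '';
    }

    // First word is lower-cased, every following word is capitalized.
    $first = strtolower(array_shift($words));
    $rest = array_map(
        static fn (string $word): string => ucfirst(strtolower($word)),
        $words
    );

    return $first . implode('', $rest);
}

echo headerToPropertyName('First Name'), PHP_EOL;   // firstName
echo headerToPropertyName('release_year'), PHP_EOL; // releaseYear
```

Because each word is lower-cased first, a header such as "imdbID" would come out as imdbid in this sketch, which is one reason to review the generated entity before importing.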
Concepts
JSONL
The bundle normalizes input into JSON Lines (JSONL):
- One JSON object per line
- Easy to stream in batches
- Unix-friendly
- Plays nicely with SurvosJsonlBundle and other ETL tools
Example (movies.jsonl):
{"id": 1, "title": "The Matrix", "year": 1999}
{"id": 2, "title": "Inception", "year": 2010}
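To show why JSONL is easy to stream, here is a minimal plain-PHP sketch that reads such a file one line at a time with a generator. The function readJsonl is a hypothetical name for illustration; it does not use SurvosJsonlBundle, which provides its own reader.

```php
<?php
declare(strict_types=1);

/**
 * Yield one decoded record per non-empty line of a JSONL file.
 * Only one line is held in memory at a time.
 *
 * @return \Generator<int, array<string, mixed>>
 */
function readJsonl(string $path): \Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // tolerate blank lines
            }
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle);
    }
}

// Write the two sample records to a temp file, then stream them back.
$path = tempnam(sys_get_temp_dir(), 'jsonl');
file_put_contents($path, <<<'JSONL'
{"id": 1, "title": "The Matrix", "year": 1999}
{"id": 2, "title": "Inception", "year": 2010}
JSONL);

foreach (readJsonl($path) as $record) {
    printf("%d: %s (%d)\n", $record['id'], $record['title'], $record['year']);
}
unlink($path);
```

A malformed line raises a JsonException at that line rather than invalidating the whole file, which is one practical advantage over a single large JSON array.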
Profile
Conversion also generates a profile (*.profile.json) containing:
- Field type inference
- Null count, distinct count
- String length stats
- Boolean-like detection
- Facet candidate detection
- Primary key candidates
- First/last samples
- Min/max distributions
This powers code:entity to emit correct Doctrine field mappings (e.g. using Types::TEXT when max length > 255).
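As a rough illustration of what profiling involves, the sketch below computes a few of the statistics listed above (null count, distinct count, maximum string length, observed types, primary key candidates) from an in-memory list of records. The function profileRecords and the output keys are assumptions made for this example; they are not the bundle's actual *.profile.json schema.

```php
<?php
declare(strict_types=1);

/**
 * Build a tiny per-field profile from an iterable of records.
 * The output keys are illustrative, NOT the real profile schema.
 *
 * @param iterable<array<string, mixed>> $records
 * @return array<string, array<string, mixed>>
 */
function profileRecords(iterable $records): array
{
    $rows = 0;
    $stats = [];
    foreach ($records as $record) {
        $rows++;
        foreach ($record as $field => $value) {
            $stats[$field] ??= ['nulls' => 0, 'seen' => [], 'maxLength' => 0, 'types' => []];
            if ($value === null || $value === '') {
                $stats[$field]['nulls']++;
                continue;
            }
            $stats[$field]['types'][get_debug_type($value)] = true;
            // json_encode keeps 1, "1" and true distinct when used as a lookup key.
            $stats[$field]['seen'][json_encode($value)] = true;
            if (is_string($value)) {
                // strlen counts bytes; use mb_strlen if you need character counts.
                $stats[$field]['maxLength'] = max($stats[$field]['maxLength'], strlen($value));
            }
        }
    }

    $profile = [];
    foreach ($stats as $field => $s) {
        $distinct = count($s['seen']);
        $profile[$field] = [
            'nulls' => $s['nulls'],
            'distinct' => $distinct,
            'maxLength' => $s['maxLength'],
            'types' => array_keys($s['types']),
            // Unique and never empty across all rows: usable as a primary key.
            'pkCandidate' => $s['nulls'] === 0 && $distinct === $rows,
        ];
    }

    return $profile;
}

$profile = profileRecords([
    ['id' => 1, 'title' => 'The Matrix', 'year' => 1999],
    ['id' => 2, 'title' => 'Inception', 'year' => 2010],
    ['id' => 3, 'title' => 'Inception', 'year' => null],
]);
echo json_encode($profile, JSON_PRETTY_PRINT), PHP_EOL;
```

In this sample only id qualifies as a primary key candidate, because title repeats and year contains a null. Holding every distinct value in memory is fine for a sketch but would need a cap or an approximation for large datasets.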
The Pipeline
1. import:convert
Goal: Transform CSV/JSON/ZIP/GZ input into:
- A normalized *.jsonl file
- A detailed *.profile.json file
Usage:
bin/console import:convert data/movies.csv --dataset=movies
Features:
- Detects CSV / JSON / JSONL / ZIP / GZIP automatically
- Normalizes encoding
- Produces JSONL for streaming
- Produces a profile with complete field statistics
- Supports --limit, --tags, --dataset
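Automatic format detection of the kind listed above is commonly done by looking at the first bytes of the file. The sketch below is one possible approach under that assumption; sniffFormat is a hypothetical helper, and the bundle's real detection logic may rely on file extensions or other rules instead.

```php
<?php
declare(strict_types=1);

/**
 * Guess the input format from the first bytes of a file.
 * Hypothetical helper for illustration; the bundle's real detection may differ.
 */
function sniffFormat(string $path): string
{
    $head = file_get_contents($path, false, null, 0, 4096);
    if ($head === false) {
        throw new \RuntimeException("Cannot read $path");
    }
    if (str_starts_with($head, "\x1f\x8b")) {
        return 'gzip'; // gzip magic number
    }
    if (str_starts_with($head, "PK\x03\x04")) {
        return 'zip'; // local file header of a ZIP archive
    }

    $text = ltrim($head);
    if ($text === '') {
        return 'csv'; // nothing to go on, fall back to CSV
    }
    if ($text[0] === '[') {
        return 'json';
    }
    if ($text[0] === '{') {
        // A first line that is a complete JSON value suggests JSONL;
        // otherwise assume one pretty-printed JSON document.
        $firstLine = explode("\n", $text, 2)[0];
        json_decode($firstLine);

        return json_last_error() === JSON_ERROR_NONE ? 'jsonl' : 'json';
    }

    return 'csv';
}
```

Only the first 4096 bytes are inspected, so a JSONL file whose first record is longer than that would be reported as json by this sketch.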
2. code:entity
(from SurvosCodeBundle, but part of this pipeline)
Goal: Generate a Doctrine entity from a JSONL profile.
Example:
bin/console code:entity data/movies.profile.json App\\Entity\\Movie
What it infers:
- Primary key (or use --pk)
- Doctrine field types:
  - small strings → string
  - long strings (length > 255) → Types::TEXT
  - ints/floats
  - datetime/dates
  - json for nested structures
- Public properties with helpful PHPDoc derived from the profile
- #[ORM\Entity(repositoryClass: ...)]
You review/tweak it, then generate schema/migrations.
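The type inference rules listed above can be sketched as a single mapping function. The input shape (a type name plus an optional maxLength) is a hypothetical simplification of the profile, and the returned strings are Doctrine's standard type names; mapping datetimes to datetime_immutable is an assumption of this example, not something the source states.

```php
<?php
declare(strict_types=1);

/**
 * Map one profiled field to a Doctrine column type name.
 * The $field shape is hypothetical, not the real profile schema.
 *
 * @param array{type: string, maxLength?: int} $field
 */
function doctrineTypeFor(array $field): string
{
    return match ($field['type']) {
        'int' => 'integer',
        'float' => 'float',
        'bool' => 'boolean',
        'datetime' => 'datetime_immutable', // assumption: immutable dates preferred
        'array' => 'json',                  // nested structures
        // Strings longer than 255 characters do not fit a default string column.
        'string' => ($field['maxLength'] ?? 0) > 255 ? 'text' : 'string',
        default => throw new \InvalidArgumentException("Unknown type: {$field['type']}"),
    };
}

echo doctrineTypeFor(['type' => 'string', 'maxLength' => 40]), PHP_EOL;   // string
echo doctrineTypeFor(['type' => 'string', 'maxLength' => 1200]), PHP_EOL; // text
echo doctrineTypeFor(['type' => 'array']), PHP_EOL;                        // json
```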
3. import:entities
Goal: Insert the JSONL data into your database using Doctrine.
Example:
bin/console import:entities App\\Entity\\Movie data/movies.jsonl
Key features:
- Batch processing (--batch=200)
- PK assignment via --pk
- Reset/truncate via --reset
- Progress bar
- Works with any Doctrine entity
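Batch processing generally means grouping records into fixed-size chunks and committing once per chunk rather than once per record. The generator below is a generic sketch of that idea; inBatches is a hypothetical helper, and the comment about Doctrine describes the usual persist/flush/clear pattern rather than the bundle's actual internals.

```php
<?php
declare(strict_types=1);

/**
 * Group any iterable into lists of at most $size items.
 *
 * @param iterable<mixed> $items
 * @return \Generator<int, list<mixed>>
 */
function inBatches(iterable $items, int $size): \Generator
{
    if ($size < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1');
    }

    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, possibly shorter batch
    }
}

foreach (inBatches(range(1, 5), 2) as $number => $batch) {
    // With Doctrine you would typically persist() each item here,
    // then call flush() and clear() once per batch.
    printf("batch %d: %s\n", $number, implode(',', $batch));
}
```

Larger batches mean fewer round trips to the database but more entities held in memory between flushes, so the right size depends on how wide your records are.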
End-to-End Example
Step 1 — Convert CSV → JSONL + profile
bin/console import:convert data/movies.csv --dataset=movies
Produces:
- data/movies.jsonl
- data/movies.profile.json
Step 2 — Generate Doctrine entity
bin/console code:entity data/movies.profile.json App\\Entity\\Movie --pk=id
Creates something like:
#[ORM\Entity(repositoryClass: MovieRepository::class)]
class Movie
{
    #[ORM\Id]
    #[ORM\Column(type: 'integer')]
    public ?int $id = null;

    #[ORM\Column(length: 255, nullable: true)]
    public ?string $title = null;

    #[ORM\Column(type: 'integer', nullable: true)]
    public ?int $year = null;

    // ...
}
Step 3 — Import entities
bin/console import:entities App\\Entity\\Movie data/movies.jsonl --pk=id
Done — your DB is now populated.
Complete Demo App with EasyAdmin
This is a complete “from scratch” demo using EasyAdmin to view the data.
Prerequisites
- symfony CLI
- curl
- PHP 8.4 (the demo uses property hooks)
- gunzip (because the demo data is gzipped)
Commands
symfony new import-demo --webapp && cd import-demo
composer config extra.symfony.allow-contrib true
echo "DATABASE_URL=sqlite:///%kernel.project_dir%/var/data.db" > .env.local
symfony server:start -d
composer req --dev survos/code-bundle
composer req survos/import-bundle league/csv
composer req easycorp/easyadmin-bundle:4.x-dev
mkdir -p data
curl -L -o data/movies.csv.gz https://github.com/metarank/msrd/raw/master/dataset/movies.csv.gz
gunzip data/movies.csv.gz
# sanity check
head -n 2 data/movies.csv
# generate entity from CSV
bin/console code:entity Movie --file=data/movies.csv
# create schema
bin/console d:sch:update --force
# import some data
bin/console import:entities Movie --file data/movies.csv --limit 500
# EasyAdmin dashboard + CRUD
bin/console make:admin:dashboard -n
bin/console make:admin:crud App\\Entity\\Movie -n
For reasons that are still a bit mysterious, clearing the cache inline doesn’t always work, so run:
bin/console cache:clear
bin/console cache:pool:clear cache.app
symfony open:local --path=/admin/movie
Castor Automation
Instead of the bash script above, you can run everything as a Castor command, after installing Castor:
curl "https://castor.jolicode.com/install" | bash
Now create a project, download the castor file and build using it:
symfony new import-demo --webapp && cd import-demo
curl -L https://github.com/survos/import-bundle/raw/master/app/castor.php -o castor.php
castor build
This will scaffold the demo, run imports, and set up admin views in one go.
Events & Extensibility
SurvosImportBundle emits events so you can tweak records on the fly during conversion/import.
The three main ImportBundle events are:
- ImportConvertStartedEvent
  - Emitted when an import/convert run starts.
  - Carries dataset name, input path, limit, tags, etc.
  - Good place for initialization, logging, or dataset-specific setup.
- ImportConvertRowEvent
  - Emitted for every row during conversion.
  - Lets you mutate, enrich, or even drop records before they are written to JSONL.
  - You can:
    - Normalize IDs
    - Slugify codes
    - Attach derived URLs
    - Store images to disk
    - Deduplicate by tracking $event->index / keys
- ImportConvertFinishedEvent
  - Emitted when conversion finishes.
  - Good for summaries, flushing caches, or post-processing.
You can also listen to JsonlBundle’s events (e.g. JsonlConvertStartedEvent, JsonlRecordEvent) for lower-level control of JSONL conversion.
Example: Enriching Records During Conversion
Here’s a simplified example based on a real service used in this bundle’s demos:
<?php

namespace App\Service;

use Survos\CoreBundle\Service\SurvosUtils;
use Survos\ImportBundle\Event\ImportConvertRowEvent;
use Symfony\Component\EventDispatcher\Attribute\AsEventListener;
use Symfony\Component\String\Slugger\SluggerInterface;

class EnhanceRecordService
{
    /** @var array<int|string> IDs (int) or codes (string) already seen, used for de-duplication */
    private array $seen = [];

    public function __construct(
        private SluggerInterface $asciiSlugger,
    ) {}

    #[AsEventListener(event: ImportConvertRowEvent::class)]
    public function tweakRecord(ImportConvertRowEvent $event): void
    {
        $record = $event->row;

        // Clean up nulls / empty arrays
        $record = SurvosUtils::removeNullsAndEmptyArrays($record);

        switch ($event->dataset) {
            case 'wcma':
                $id = (int) $record['id'];

                // De-dupe by ID
                if (in_array($id, $this->seen, true)) {
                    // Drop this row entirely
                    $event->row = null;
                    return;
                }
                $this->seen[] = $id;

                // Normalize ID and build useful URLs
                $record['id'] = $id;
                $record['citation_url'] = sprintf(
                    'https://egallery.williams.edu/objects/%d',
                    $id
                );
                $record['manifest'] = sprintf(
                    'https://egallery.williams.edu/apis/iiif/presentation/v2/1-objects-%d/manifest',
                    $id
                );
                break;

            case 'marvel':
                // Slug based on name for a stable "code"
                $code = $this->asciiSlugger->slug($record['name'])->toString();
                $record['code'] = $code;

                if (in_array($code, $this->seen, true)) {
                    $event->row = null; // skip duplicates
                    return;
                }
                $this->seen[] = $code;
                break;

            case 'car':
                // Assign a synthetic ID using the row index
                $record['id'] = $event->index + 1;
                break;
        }

        // Save modified record back onto the event
        $event->row = $record;
    }
}
You can also attach helpers, for example to store base64 images as files and replace the JSON field with a URL:
private function saveBase64Image(string $base64String, string $outputPath): bool
{
    $dir = \dirname($outputPath);
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }

    if (preg_match('/^data:image\/(\w+);base64,/', $base64String, $matches)) {
        $base64String = substr($base64String, strpos($base64String, ',') + 1);
    }

    $imageData = base64_decode($base64String, true);
    if ($imageData === false) {
        return false;
    }

    return file_put_contents($outputPath, $imageData) !== false;
}
This pattern—listen to events and mutate $event->row—is the recommended way to inject domain-specific logic into a generic import pipeline without forking the bundle.
Tips & Gotchas
- Type errors during import
  Usually caused by wrong --pk or mismatched types.
  Re-check the profile and/or adjust the entity types.
- Long text fields
  Over 255 chars → mapped to Types::TEXT by code:entity.
  If the data changes shape later, regenerate or tweak manually.
- Nested structures
  Complex JSON structures are mapped to Doctrine’s json type.
  Make sure your database platform supports it.
- Iterate fast
  Use --limit during development:
  - Faster profiling
  - Less noise
  - Regenerate the full JSONL once the entity looks good.
See Also
- SurvosJsonlBundle — JSONL utilities, enrichment, pipelines
- SurvosCodeBundle — entity generation, Twig/JS/Liquid template generation
- SurvosMeiliBundle — search and indexing once entities are in Doctrine