# dgtlss/parqbridge

Export Laravel database tables to Parquet files using Storage disks (no external PHP dependencies).
## Requires

- php: >=8.3

## Requires (Dev)

- illuminate/config: ^11.0|^12.0
- illuminate/console: ^11.0|^12.0
- illuminate/container: ^11.0|^12.0
- illuminate/database: ^11.0|^12.0
- illuminate/filesystem: ^11.0|^12.0
- illuminate/support: ^11.0|^12.0
- league/flysystem: ^3.25.1
- phpoption/phpoption: ^1.9
- phpunit/phpunit: ^10.5
- vlucas/phpdotenv: ^5.6

## README
Export your Laravel database tables to real Apache Parquet files on any Storage disk (local, S3, etc.) with a simple artisan command.
ParqBridge focuses on zero PHP dependency bloat while still producing spec-compliant Parquet files by delegating the final write step to a tiny, embedded Python script using PyArrow (or any custom CLI you prefer). You keep full Laravel DX for configuration and Storage; we bridge your data to Parquet.
## Installation

- Require the package in your app (path repo or VCS):

  ```bash
  composer require dgtlss/parqbridge
  ```

- Laravel will auto-discover the service provider. Alternatively, register `ParqBridge\ParqBridgeServiceProvider` manually.

- Publish the config if you want to customize defaults:

  ```bash
  php artisan vendor:publish --tag="parqbridge-config"
  ```
## Configuration

Set your export disk and options in `.env` or `config/parqbridge.php`.

- `PARQUET_DISK`: which filesystem disk to use (e.g., `s3`, `local`).
- `PARQUET_OUTPUT_DIR`: directory prefix within the disk (default `parquet-exports`).
- `PARQUET_CHUNK_SIZE`: rows per DB chunk when exporting (default `1000`).
- `PARQUET_INFERENCE`: `database`, `sample`, or `hybrid` (default `hybrid`).
- `PARQUET_COMPRESSION`: compression codec for Parquet (`UNCOMPRESSED`/`NONE`, `SNAPPY`, `GZIP`, `ZSTD`, `BROTLI`, `LZ4_RAW`) when using the PyArrow backend.
- `PARQBRIDGE_WRITER`: `pyarrow` (default) or `custom`. If `custom`, set `PARQBRIDGE_CUSTOM_CMD`.
- `PARQBRIDGE_PYTHON`: Python executable for PyArrow (default `python3`).

Example `.env`:
```env
PARQUET_DISK=s3
PARQUET_OUTPUT_DIR=parquet-exports
PARQUET_CHUNK_SIZE=2000
```
Ensure your `filesystems` disk is configured (e.g., `s3`) in `config/filesystems.php`.
## FTP disk configuration

You can export directly to an FTP server using Laravel's `ftp` disk. Add an FTP disk to `config/filesystems.php` and reference it via `PARQUET_DISK=ftp` or `--disk=ftp`.
```php
'disks' => [
    'ftp' => [
        'driver' => 'ftp',
        'host' => env('FTP_HOST'),
        'username' => env('FTP_USERNAME'),
        'password' => env('FTP_PASSWORD'),

        // Optional FTP settings
        'port' => (int) env('FTP_PORT', 21),
        'root' => env('FTP_ROOT', ''),
        'passive' => filter_var(env('FTP_PASSIVE', true), FILTER_VALIDATE_BOOL),
        'ssl' => filter_var(env('FTP_SSL', false), FILTER_VALIDATE_BOOL),
        'timeout' => (int) env('FTP_TIMEOUT', 90),
    ],
],
```
Note: this package coerces common FTP env values (e.g., `port`, `timeout`, `passive`, `ssl`) to the proper types before resolving the disk, to avoid Flysystem type errors like "Argument #5 ($port) must be of type int, string given".
## Usage

- List tables:

  ```bash
  php artisan parqbridge:tables
  ```

- Export a table to the configured disk:

  ```bash
  php artisan parqbridge:export users --where="active = 1" --limit=1000 --output="parquet-exports" --disk=s3
  ```

  On success, the command prints the full path written within the disk. Files are named `{table}-{YYYYMMDD_HHMMSS}.parquet`.

- Export ALL tables into one folder (timestamped subfolder inside `parqbridge.output_directory`):

  ```bash
  php artisan parqbridge:export-all --disk=s3 --output="parquet-exports" --exclude=migrations,password_resets
  ```

  Options:

  - `--include=`: comma-separated allowlist of table names
  - `--exclude=`: comma-separated denylist of table names
## Data types

The schema inferrer maps common DB types to a set of Parquet primitive types and logical annotations. With the PyArrow backend, an Arrow schema is constructed to write types faithfully:

- Primitive: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BYTE_ARRAY`, `FIXED_LEN_BYTE_ARRAY`
- Logical: `UTF8`, `DATE`, `TIME_MILLIS`, `TIME_MICROS`, `TIMESTAMP_MILLIS`, `TIMESTAMP_MICROS`, `DECIMAL`

For decimals we write Arrow decimal types (`decimal128`/`decimal256`) with the declared `precision`/`scale`.
## Testing

Run the test suite:

```bash
composer install
vendor/bin/phpunit
```

The tests bootstrap a minimal container, create a SQLite database, and verify that:

- listing tables works on SQLite
- exporting a table writes a Parquet file to the configured disk (magic `PAR1`)
- schema inference on SQLite maps the major type families
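A spec-compliant Parquet file begins and ends with the 4-byte magic `PAR1`; a quick sanity check of an exported file can be sketched in Python (the helper name is ours):

```python
def looks_like_parquet(path: str) -> bool:
    """Check for the 4-byte PAR1 magic at both ends of the file."""
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # seek to 4 bytes before EOF
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```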
## Backend requirements

- By default ParqBridge uses Python + PyArrow. Ensure `python3` is available and install PyArrow:

  ```bash
  python3 -m pip install --upgrade pip
  python3 -m pip install pyarrow
  ```

- Alternatively, set a custom converter command via `PARQBRIDGE_WRITER=custom` and `PARQBRIDGE_CUSTOM_CMD` (the command must read `{input}` CSV and write `{output}` Parquet).
You can automate setup via the included command:

```bash
php artisan parqbridge:setup --write-env
```

Options:

- `--python=`: path/name of the Python executable (default from config `parqbridge.pyarrow_python`)
- `--venv=`: location for the virtualenv (default `./parqbridge-venv`)
- `--no-venv`: install into the global Python instead of a venv
- `--write-env`: append `PARQBRIDGE_PYTHON` and `PARQBRIDGE_WRITER` to `.env`
- `--upgrade`: upgrade pip first
- `--dry-run`: print commands without executing them