ankane / ducklake
DuckLake for PHP
Installs: 0
Dependents: 0
Suggesters: 0
Security: 0
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
pkg:composer/ankane/ducklake
Requires
- php: >= 8.3
- satur.io/duckdb: ^2.0.1
Requires (Dev)
- phpunit/phpunit: ^12
This package is not auto-updated.
Last update: 2025-11-09 19:07:17 UTC
README
🦆 DuckLake for PHP
Run your own data lake with a SQL database and file/object storage
new DuckLake\Client( catalogUrl: 'postgres://user:pass@host:5432/dbname', storageUrl: 's3://my-bucket/' );
Note: DuckLake is not considered production-ready at the moment
Installation
Run:
composer require ankane/ducklake
Add scripts to composer.json to download the shared library:
"scripts": { "post-install-cmd": "DuckLake\\Library::check", "post-update-cmd": "DuckLake\\Library::check" }
And run:
composer install
Getting Started
Create a client - this one stores everything locally
$ducklake = new DuckLake\Client( catalogUrl: 'sqlite:///ducklake.sqlite', storageUrl: 'data_files/', createIfNotExists: true );
Create a table
$ducklake->sql('CREATE TABLE events (id bigint, name text)');
Load data from a file
$ducklake->sql("COPY events FROM 'data.csv'");
Confirm a new Parquet file was added to the data lake
$ducklake->listFiles('events');
Query the data
$ducklake->sql('SELECT COUNT(*) FROM events')->toArray();
Catalog Database
Catalog information can be stored in:
- Postgres:
postgres://user@pass@host:5432/dbname - SQLite:
sqlite:///path/to/dbname.sqlite - DuckDB:
duckdb:///path/to/dbname.duckdb
Note: MySQL and MariaDB are not currently supported due to duckdb/ducklake#70 and duckdb/ducklake#210
There are two ways to set up the schema:
- Run this script
- Configure the client to do it
new DuckLake\Client(createIfNotExists: true, ...);
Data Storage
Data can be stored in:
- Local files:
data_files/ - Amazon S3:
s3://my-bucket/path/ - Other providers: todo
Amazon S3
Credentials are detected in the standard AWS SDK locations
IAM permissions
- Read:
s3::ListBucket,s3::GetObject - Write:
s3::ListBucket,s3::PutObject - Maintenance:
s3::ListBucket,s3::GetObject,s3::PutObject,s3::DeleteObject
Operations
Create an empty table
$ducklake->sql('CREATE TABLE events (id bigint, name text)');
Or a table from a file
$ducklake->sql("CREATE TABLE events AS FROM 'data.csv'");
Load data from a file
$ducklake->sql("COPY events FROM 'data.csv'");
You can also load data directly from other data sources
$ducklake->attach('blog', 'postgres://localhost:5432/blog'); $ducklake->sql('INSERT INTO events SELECT * FROM blog.events');
Or register existing data files
$ducklake->addDataFiles('events', 'data.parquet');
Note: This transfers ownership to the data lake, so the file may be deleted as part of maintenance
Update data
$ducklake->sql('UPDATE events SET name = ? WHERE id = ?', ['Test', 1]);
Delete data
$ducklake->sql('DELETE * FROM events WHERE id = ?', [1]);
Schema Changes
Update the schema
$ducklake->sql('ALTER TABLE events ADD COLUMN active BOOLEAN');
Set or remove a partitioning key
$ducklake->sql('ALTER TABLE events SET PARTITIONED BY (name)'); // or $ducklake->sql('ALTER TABLE events RESET PARTITIONED BY');
Views
Create a view
$ducklake->sql('CREATE VIEW events_view AS SELECT * FROM events');
Drop a view
$ducklake->sql('DROP VIEW events_view');
Snapshots
Get snapshots
$ducklake->snapshots();
Query the data at a specific snapshot version or time
$ducklake->sql('SELECT * FROM events AT (VERSION => ?)', [3]); // or $ducklake->sql('SELECT * FROM events AT (TIMESTAMP => ?)', [new DateTime()]);
You can also specify a snapshot when creating the client
new DuckLake\Client(snapshotVersion: 3, ...); // or new DuckLake\Client(snapshotTime: new DateTime(), ...);
Maintenance
Merge files
$ducklake->mergeAdjacentFiles();
Expire snapshots
$ducklake->expireSnapshots(olderThan: new DateTime());
Clean up old files
$ducklake->cleanupOldFiles(olderThan: new DateTime());
Rewrite files with a certain percentage of deleted rows
$ducklake->rewriteDataFiles(deleteThreshold: 0.5);
Configuration
Get options
$ducklake->options();
Set an option globally
$ducklake->setOption('parquet_compression', 'zstd');
Or for a specific table
$ducklake->setOption('parquet_compression', 'zstd', tableName: 'events');
SQL Safety
Use parameterized queries when possible
$ducklake->sql('SELECT * FROM events WHERE id = ?', [1]);
For places that do not support parameters, use quote or quoteIdentifier
$quotedTable = $ducklake->quoteIdentifier('events'); $quotedFile = $ducklake->quote('path/to/data.csv'); $ducklake->sql("COPY $quotedTable FROM $quotedFile");
Reference
Get table info
$ducklake->tableInfo();
Drop a table
$ducklake->dropTable('events'); // or $ducklake->dropTable('events', ifExists: true);
List files
$ducklake->listFiles('events');
List files at a specific snapshot version or time
$ducklake->listFiles('events', snapshotVersion: 3); // or $ducklake->listFiles('events', snapshotTime: new DateTime());
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ducklake-php.git cd ducklake-php composer install # Postgres createdb ducklake_php_test createdb ducklake_php_test2 CATALOG=postgres composer test # MySQL and MariaDB mysqladmin create ducklake_php_test mysqladmin create ducklake_php_test2 CATALOG=mysql composer test # SQLite CATALOG=sqlite composer test # DuckDB CATALOG=duckdb composer test