ellipticmarketing / backfill
Pull a sanitized copy of your production database to local/staging environments
Requires
- php: ^8.2
- illuminate/console: ^10.0||^11.0||^12.0||^13.0
- illuminate/contracts: ^10.0||^11.0||^12.0||^13.0
- illuminate/database: ^10.0||^11.0||^12.0||^13.0
- illuminate/http: ^10.0||^11.0||^12.0||^13.0
- illuminate/routing: ^10.0||^11.0||^12.0||^13.0
- illuminate/support: ^10.0||^11.0||^12.0||^13.0
- spatie/laravel-package-tools: ^1.16
- symfony/process: ^6.0||^7.0
Requires (Dev)
- laravel/pint: ^1.14
- nunomaduro/collision: ^7.0||^8.0
- orchestra/testbench: ^8.0||^9.0||^10.0||^11.0
- pestphp/pest: ^2.0||^3.0
- pestphp/pest-plugin-laravel: ^2.0||^3.0
README
Pull a sanitized copy of your production database into local/staging environments.
Backfill is a Laravel package that creates a secure bridge between your production server and local development environments. It copies your production data into a temporary database, sanitizes sensitive information using pure SQL (no PHP row iteration), and streams the result as a SQL dump to your local machine — where it's imported via the mysql CLI for maximum speed.
Production data is never modified. The package refuses to run destructive operations outside of explicitly allowed environments.
Table of Contents
- Installation
- Quick Start
- Configuration Reference
- Commands
- Events
- How It Works
- Alternate Database Credentials
- Cleanup & Crash Recovery
- Security
- Edge Cases & Limitations
- Testing
- License
Installation
Require the package via Composer:
```shell
composer require ellipticmarketing/backfill
```
Laravel will auto-discover the service provider. No manual registration needed.
Publish the configuration file:
```shell
php artisan vendor:publish --tag="backfill-config"
```
No migrations needed. Sync history is stored in `storage/backfill-state.json`, not a database table. This avoids the chicken-and-egg problem of needing a migration for a tool that overwrites your database.
Quick Start
1. Run the install command
This generates a secure token and walks you through the setup:
```shell
php artisan backfill:install
```
The installer will ask if you already have a Backfill token or need to generate a new one. It detects your environment and shows the right instructions — server setup on production, client setup on local/staging. It can also scan your database for columns that may need sanitization and write the rules to config/backfill.php for you.
2. Configure both environments
The install command will tell you exactly what to add. In short:
Server (production) .env:
```
BACKFILL_TOKEN=<generated-token>
BACKFILL_SERVER_ENABLED=true
```
Client (local/staging) .env:
```
BACKFILL_TOKEN=<same-token>
BACKFILL_SOURCE_URL=https://your-production-app.com
```
3. Set up sanitization rules
Edit config/backfill.php on the server side:
```php
'sanitize' => [
    'users' => [
        'email' => [
            'type' => 'email',
            'exclude' => ['*@yourcompany.com'],
        ],
        'name' => ['type' => 'name'],
        'password' => ['type' => 'hash'],
        'phone' => ['type' => 'phone'],
    ],
    'customers' => [
        'email' => ['type' => 'email'],
        'address' => ['type' => 'address'],
    ],
],
```
4. Pull the database
On your local machine:
```shell
# First time — full sync
php artisan backfill:pull --full

# Subsequent times — only new/updated rows
php artisan backfill:pull
```
That's it. Your local database now has sanitized production data.
Configuration Reference
All configuration lives in config/backfill.php. The server (production) and client (local) share the same config file — each side only reads the section relevant to it.
Authentication
'auth_token' => env('BACKFILL_TOKEN'),
A shared secret token used for all communication between server and client. This token is sent as a Bearer token in the Authorization header and validated using a timing-safe comparison (hash_equals).
Recommendations:
- Use a random string of at least 64 characters
- Never commit the token to version control
- Rotate the token periodically
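Conceptually, the token validation resembles the sketch below. This is an illustrative stand-alone function, not the package's actual middleware; the name `verifyBearerToken` is an assumption:

```php
<?php
// Illustrative sketch of a timing-safe bearer-token check.
// Not the package's actual implementation.
function verifyBearerToken(?string $authorizationHeader, string $expectedToken): bool
{
    // Expect "Authorization: Bearer <token>"
    if ($authorizationHeader === null || !str_starts_with($authorizationHeader, 'Bearer ')) {
        return false;
    }

    $provided = substr($authorizationHeader, strlen('Bearer '));

    // hash_equals compares in constant time, preventing timing attacks
    // that could otherwise leak the token byte by byte.
    return hash_equals($expectedToken, $provided);
}
```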
Server Settings
```php
'server' => [
    'enabled' => env('BACKFILL_SERVER_ENABLED', false),
    'route_prefix' => 'api/backfill',
    'middleware' => [],
    'temp_strategy' => env('BACKFILL_TEMP_STRATEGY', 'database'),
    'temp_database' => env('BACKFILL_TEMP_DATABASE'),
    'temp_username' => env('BACKFILL_TEMP_USERNAME'),
    'temp_password' => env('BACKFILL_TEMP_PASSWORD'),
    'chunk_size' => 5000,
],
```
| Key | Default | Description |
|---|---|---|
| `enabled` | `false` | Must be `true` on the production server to expose the sync API endpoints |
| `route_prefix` | `api/backfill` | URL prefix for the sync endpoints. Endpoints are `GET /{prefix}/manifest` and `GET /{prefix}/dump/{table}` |
| `middleware` | `[]` | Additional middleware to apply to sync routes (on top of `api` and token auth) |
| `temp_strategy` | `database` | How temporary data is stored during sanitization. See Temporary Database Strategy |
| `temp_database` | `null` | Name of the temporary database to create when using the `database` strategy. If `null`, a random name is generated |
| `temp_username` | `null` | Alternate DB username for temp operations (if your app user can't create databases). See Alternate Database Credentials |
| `temp_password` | `null` | Password for the alternate DB user |
| `chunk_size` | `5000` | Number of rows read per chunk when building the SQL dump |
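For instance, to serve the endpoints under a different prefix with extra rate limiting, you might adjust the published config as follows (the prefix and throttle values here are arbitrary illustrations, not defaults):

```php
'server' => [
    'enabled' => env('BACKFILL_SERVER_ENABLED', false),
    'route_prefix' => 'internal/backfill',  // endpoints become GET /internal/backfill/manifest, etc.
    'middleware' => ['throttle:10,1'],      // applied on top of the api group and token auth
    'temp_strategy' => env('BACKFILL_TEMP_STRATEGY', 'database'),
    'chunk_size' => 5000,
],
```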
Client Settings
```php
'client' => [
    'source_url' => env('BACKFILL_SOURCE_URL'),
    'allowed_environments' => ['local', 'staging'],
    'timeout' => 300,
    'chunk_size' => 5000,
],
```
| Key | Default | Description |
|---|---|---|
| `source_url` | `null` | Base URL of the production server (e.g., `https://myapp.com`) |
| `allowed_environments` | `['local', 'staging']` | The `backfill:pull` command will refuse to run in any environment not listed here. This is the primary safety mechanism preventing accidental overwrites of production data |
| `timeout` | `300` | HTTP timeout in seconds for each request to the server |
| `chunk_size` | `5000` | Number of rows inserted per batch during import |
Sanitization Rules
```php
'sanitize' => [
    'table_name' => [
        'column_name' => [
            'type' => 'email',        // Required: sanitization type
            'exclude' => ['pattern'], // Optional: patterns to skip
        ],
    ],
],
```
All sanitization happens via SQL in the temporary database. No PHP iteration over rows. This is critical for performance on large tables.
Available Sanitization Types
| Type | SQL Expression | Example Output |
|---|---|---|
| `email` | `CONCAT(UUID(), '@example.test')` | `550e8400-e29b-41d4-a716-446655440000@example.test` |
| `name` | `CONCAT('User_', id)` | `User_123` |
| `phone` | `CONCAT('+1555', LPAD(id, 7, '0'))` | `+15550000123` |
| `text` | `CONCAT('text_', MD5(RAND()))` | `text_a1b2c3d4e5f6...` |
| `hash` | Static bcrypt hash of `'password'` | `$2y$10$92IXUNpk...` (always the same) |
| `null` | `NULL` | `NULL` |
| `address` | `CONCAT(id, ' Example St')` | `123 Example St` |
| `local_ip` | `CONCAT('192.168.', FLOOR(RAND()*255), '.', FLOOR(RAND()*255))` | `192.168.42.117` |
Exclude Patterns
The exclude option lets you keep certain rows unsanitized. This is useful for preserving team accounts, test accounts, or system users.
```php
'users' => [
    'email' => [
        'type' => 'email',
        'exclude' => [
            '*@yourcompany.com',   // Wildcard: keep all company emails
            'admin@specific.com',  // Exact match
            'test-*@example.com',  // Prefix wildcard
        ],
    ],
],
```
Exclude patterns use SQL LIKE under the hood (* is converted to %). The generated SQL looks like:
```sql
UPDATE `users`
SET `email` = CASE
    WHEN `email` LIKE '%@yourcompany.com' THEN `email`
    WHEN `email` LIKE 'admin@specific.com' THEN `email`
    WHEN `email` LIKE 'test-%@example.com' THEN `email`
    ELSE CONCAT(UUID(), '@example.test')
END
```
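The `*` → `%` translation can be sketched in a few lines. This helper is illustrative only; its name and its escaping of literal `%`/`_` characters are assumptions, not taken from the package:

```php
<?php
// Convert a Backfill-style exclude pattern into a SQL LIKE pattern.
// Sketch only: the real package may differ in details such as escaping.
function excludePatternToLike(string $pattern): string
{
    // Escape characters that are special inside LIKE, then map * to %.
    $escaped = str_replace(['%', '_'], ['\\%', '\\_'], $pattern);

    return str_replace('*', '%', $escaped);
}
```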
Row Limits
```php
'limits' => [
    'table_name' => [
        'max_rows' => 1000,         // Required
        'order_by' => 'created_at', // Optional (default: primary key)
        'direction' => 'desc',      // Optional (default: 'desc')
    ],
],
```
Limits the number of rows synced for specific tables. Useful for large log, audit, or analytics tables where you only need recent data for development.
The package resolves foreign key dependencies using a Stateless Subset Resolver:
- Bottom-up inclusion: If a child table has no limit (e.g., you want all recent `cars`), the package keeps every parent row referenced by those cars, even if the parent table itself has a limit (e.g., `users`).
- Top-down exclusion: If a parent row falls outside the limit and is discarded, any child rows referencing that orphaned parent are removed as well.
This preserves referential integrity: exact subsets are computed via recursive subqueries directly in the database engine, without PHP memory pressure.
Example:
```php
'limits' => [
    'activity_log' => ['max_rows' => 1000, 'order_by' => 'created_at', 'direction' => 'desc'],
    'telescope_entries' => ['max_rows' => 500],
    'notifications' => ['max_rows' => 2000, 'order_by' => 'id'],
],
```
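As a schematic illustration of how subsetting can be expressed in SQL inside the temp database (assuming a hypothetical `comments.user_id → users.id` foreign key; the package's actual generated queries, including its recursive subqueries, will differ):

```sql
-- Top-down exclusion, sketched: keep only the 1000 newest users,
-- then drop child rows that would reference a discarded parent.
WITH kept_users AS (
    SELECT id FROM users ORDER BY created_at DESC LIMIT 1000
)
DELETE FROM comments
WHERE user_id IS NOT NULL
  AND user_id NOT IN (SELECT id FROM kept_users);
```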
Excluded Tables
```php
'exclude_tables' => [
    'telescope_entries',
    'telescope_entries_tags',
    'telescope_monitoring',
    'failed_jobs',
],
```
Tables listed here are completely skipped during sync. No data is transferred. Use this for:
- Debug/monitoring tables (Telescope, Horizon)
- Job tables (failed_jobs, job_batches)
- Cache tables
- Any table with highly volatile data that isn't useful in development
Commands
backfill:install
Interactively set up Backfill for your environment.
```shell
php artisan backfill:install
```
- Token handling: If a token already exists in `.env`, offers to keep it or configure a new one. Otherwise asks if you have an existing token to enter or generates a new 64-character random token.
- Environment detection: Asks whether you're setting up the Server (production) or Client (local/staging) and shows the appropriate configuration.
- Auto-write to `.env`: Writes `BACKFILL_TOKEN`, `BACKFILL_SERVER_ENABLED`, and `BACKFILL_SOURCE_URL` directly to your `.env` file.
- Config publishing: Publishes `config/backfill.php` if it doesn't exist yet.
- Auto-sanitization scan: If the sanitization config is still empty (the default), offers to scan the database for columns that may contain sensitive data (emails, passwords, phones, addresses, IPs) and writes suggested rules to the config file. Works on both the server (scans the local DB) and the client (fetches the schema from the remote server).
- Connection test (client): Optionally tests the connection to the production server and reports the result.
- Idempotent: Safe to re-run. Detects existing configuration and skips writing identical values.
backfill:pull
The main command. Pulls sanitized data from the production server.
```shell
php artisan backfill:pull [options]

# Or simply
php artisan backfill [options]
```
| Option | Description |
|---|---|
| `--full` | Force a complete re-sync, ignoring the last pull timestamp. Truncates all local tables before importing. |
| `--tables=users,orders` | Only sync specific tables (comma-separated). |
| `--dry-run` | Show what would be synced without making any changes. Displays a table with row counts, sanitization status, and limit status. |
| `--force` | Accept all questions and warnings automatically; useful for non-interactive or automated scripts. |
| `--fresh` | Download fresh data even if a recent local copy exists. By default, local caches are reused when younger than `local_cache_hours` (default: 1 hour). |
How it works:
The pull command operates in three phases:
- Download — All SQL dump files are downloaded from the server first, without touching the local database. Downloads are saved to a temporary directory in `storage/app/`.
- Schema comparison — The remote column definitions from the manifest are compared against the local database schema. If there are mismatches (missing tables, extra/missing columns), a warning table is displayed and you can choose to abort and run migrations first. The downloaded data is preserved so you don't need to re-download.
- Import — Each downloaded dump file is imported into the local database.
Resuming interrupted downloads & caching:
By default, download directories are locally cached for 1 hour (configurable via local_cache_hours in config/backfill.php).
If a previous local cache exists, the command will use it, saving a full download cycle. If you need fresh data within that 1 hour window, use the --fresh flag.
If an interrupted download is detected that falls outside the cache window but is less than 24 hours old, the command will offer a choice:
```
Found a recent download from 2 hours ago with 76 table dumps.
Would you like to resume importing the existing data or download a fresh copy?
  [0] Resume existing download
  [1] Download fresh data
```
Choosing "Resume" skips the download phase entirely and goes straight to schema comparison → import. This is useful if the download completed but the import was interrupted, or if you need to run migrations before importing.
Examples:
```shell
# First sync — always a full pull
php artisan backfill:pull --full

# Incremental sync — only new/updated data since last pull
php artisan backfill:pull

# Re-sync only the users and orders tables
php artisan backfill:pull --full --tables=users,orders

# Preview what will happen
php artisan backfill:pull --dry-run
```
Dry run output:
```
📊 Dry run — tables that would be synced:
+------------------+--------+-----------+---------+------------+
| Table            | Rows   | Sanitized | Limited | Timestamps |
+------------------+--------+-----------+---------+------------+
| users            | 5,240  | ✓         |         | ✓          |
| orders           | 45,000 |           |         | ✓          |
| activity_log     | 890,000|           | ✓       | ✓          |
| products         | 320    |           |         | ✓          |
+------------------+--------+-----------+---------+------------+
```
backfill:status
Shows the history of past sync operations.
```shell
php artisan backfill:status
```
Example output:
```
📊 Database Sync History (last 10)
+----+-------+-------------+--------+-----------+----------+---------------------+
| ID | Mode  | Status      | Rows   | Tables    | Duration | Started             |
+----+-------+-------------+--------+-----------+----------+---------------------+
| 3  | DELTA | ✅ Complete | 1,250  | 12 tables | 45s      | 2024-03-15 10:30:00 |
| 2  | FULL  | ✅ Complete | 85,000 | 28 tables | 3m       | 2024-03-01 09:00:00 |
| 1  | FULL  | ✅ Complete | 82,000 | 28 tables | 3m       | 2024-02-15 14:00:00 |
+----+-------+-------------+--------+-----------+----------+---------------------+

Last successful sync: 2024-03-15T10:30:45+00:00
```
A delta sync will pull data created/updated after this timestamp.
backfill:cleanup
Drops orphaned temporary databases and tables left behind by failed or interrupted sync operations.
```shell
php artisan backfill:cleanup [options]
```
| Option | Description |
|---|---|
| `--force` | Skip the confirmation prompt. Required for scheduled/automated runs. |
| `--max-age=60` | Only drop temp databases older than this many minutes (default: 60). Prevents killing an active sync. |
This command is automatically scheduled to run hourly on the server when BACKFILL_SERVER_ENABLED=true. You don't need to set up a cron job for it.
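If you nevertheless prefer to drive cleanup from your own cron (for example on a host where the Laravel scheduler isn't running), a manual entry might look like this; the path and the `--max-age` value are illustrative:

```shell
# Illustrative crontab entry; adjust the path for your deployment.
# --force skips the prompt; --max-age=120 leaves syncs younger than
# two hours untouched.
0 * * * * cd /var/www/app && php artisan backfill:cleanup --force --max-age=120 > /dev/null 2>&1
```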
Events
SyncCompleted
Dispatched after a successful backfill:pull operation. Useful for clearing caches, triggering Laravel Scout re-indexing, or sending notifications.
```php
use Elliptic\Backfill\Events\SyncCompleted;
use Illuminate\Support\Facades\Event;

Event::listen(function (SyncCompleted $event) {
    if ($event->isFullSync) {
        // e.g., php artisan scout:import "App\Models\User"
    }

    $syncedTables = $event->tables;
    $totalRows = $event->rowsSynced;
});
```
How It Works
Architecture
```
Local (Client)                              Production (Server)
─────────────────                           ──────────────────────────
1. GET /manifest ─────────────────────▶     Reads INFORMATION_SCHEMA
   (Bearer token)                           Returns table list, row counts,
   ◀──── JSON response──                    FK ordering, column metadata

── Phase 1: Download All ──────────────
2. GET /dump/users ────────────────────▶    CREATE DATABASE _backfill_temp_*
   GET /dump/orders                         CREATE TABLE ... LIKE ...
   GET /dump/...                            INSERT INTO temp SELECT * FROM prod
   (all tables downloaded first)            UPDATE temp (sanitize via SQL)
   ◀── streamed .sql ───                    DELETE excess rows (limits)
                                            DROP temp table

── Phase 2: Schema Comparison ─────────
3. Compare remote columns vs local
   Schema::getColumnListing()
   Alert on mismatches before import

── Phase 3: Import All ────────────────
4. For each downloaded .sql file:
   mysql < users.sql
   (or PHP fallback for non-MySQL)

5. Record pull timestamp in
   storage/backfill-state.json
```
Error Handling & Troubleshooting
When pulling data from the server, errors are handled gracefully and presented clearly to the client:
- 500 Server Errors: If a database privilege issue or other exception occurs during temp database setup, the server catches the exception and returns the exact error as JSON. The client reads this JSON and prints a clear error message instead of a raw HTML stack trace.
- 404 Not Found: If the remote server returns a `404` (often due to the package not being installed, `BACKFILL_SERVER_ENABLED=true` missing, or an incorrect routing configuration), the client automatically suggests actionable steps to resolve the issue.
Temporary Database Strategy
The package supports two strategies for creating the temporary workspace where data is copied and sanitized:
database strategy (default)
Creates a separate temporary database named _backfill_temp_{timestamp}_{random}.
```
BACKFILL_TEMP_STRATEGY=database
```
Pros:
- Complete isolation from production tables
- No risk of accidental production data modification
- Easy to identify and clean up
Cons:
- Requires a DB user with `CREATE DATABASE` / `DROP DATABASE` privileges
- May need alternate credentials
tables strategy
Creates temporary tables in the same database with a _backfill_ prefix.
```
BACKFILL_TEMP_STRATEGY=tables
```
Pros:
- Works with restricted DB users (no extra privileges needed)
- Simpler setup
Cons:
- Temporary tables live alongside production tables (though they're prefixed and cleaned up)
SQL-Level Sanitization
All data sanitization happens via UPDATE statements executed directly in the database engine. This means:
- No PHP memory pressure — even a 10-million-row users table is sanitized in a single `UPDATE` statement
- Database-engine speed — the DB engine processes the update natively, far faster than PHP could iterate rows
- Transactional safety — the update is atomic
For example, sanitizing emails with exclude patterns generates:
```sql
UPDATE `_backfill_temp_1234`.`users`
SET `email` = CASE
    WHEN `email` LIKE '%@yourcompany.com' THEN `email`
    ELSE CONCAT(UUID(), '@example.test')
END
```
Incremental (Delta) Sync
After your first full sync, subsequent runs use delta mode by default:
- The package checks `storage/backfill-state.json` for the last successful sync timestamp
- The server filters the temp data: `DELETE FROM temp WHERE created_at < ? AND updated_at < ?`
- Only remaining (new/updated) rows are included in the dump
- The client uses `REPLACE INTO` instead of `INSERT INTO` to upsert without conflicts
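For context, MySQL's `REPLACE INTO` is a delete-then-insert upsert keyed on the primary (or unique) key. A row in a delta batch is conceptually equivalent to the following (table and values are illustrative):

```sql
-- If a row with this primary key already exists locally it is deleted
-- and re-inserted; otherwise this behaves like a plain INSERT.
REPLACE INTO `users` (`id`, `email`, `updated_at`)
VALUES (42, 'someone@example.test', '2024-03-15 10:30:00');
```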
Requirements for delta sync:
- The table must have both `created_at` and `updated_at` columns
- Tables without timestamps always get a full re-pull
Limitations:
- Deleted rows are not synced. If a row was deleted on production, it will still exist locally after a delta sync. Use `--full` periodically to get a clean state.
- Schema changes are not synced. If production adds a new column, the pull command will detect the mismatch during schema comparison and warn you before importing. Run `php artisan migrate` locally first, then retry.
Foreign Key Handling
The package handles FK dependencies at three levels:
1. Import order: Tables are topologically sorted so parent tables are imported before children. Dependencies are discovered automatically via `INFORMATION_SCHEMA.KEY_COLUMN_USAGE`.
2. Row limiting: When a parent table has a row limit, child rows referencing deleted parents are cleaned up first to prevent orphan records.
3. Import execution: FK checks are disabled during import (`SET FOREIGN_KEY_CHECKS=0`) and re-enabled afterward.
Self-referencing tables (e.g., categories with parent_id) and circular references are detected and handled gracefully — cycles are broken in the topological sort.
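The ordering step can be sketched with Kahn's algorithm. The function below is a self-contained illustration under assumed inputs (a `table => [parent tables]` map), not the package's implementation; it breaks cycles by simply appending leftover tables, relying on FK checks being disabled during import:

```php
<?php
// Topologically sort tables so parents are imported before children.
// $deps maps each table to the list of parent tables it references.
// Illustrative sketch only; self-references are ignored and any
// tables caught in a cycle are appended at the end as-is.
function sortTablesForImport(array $deps): array
{
    $inDegree = array_fill_keys(array_keys($deps), 0);
    foreach ($deps as $table => $parents) {
        foreach ($parents as $parent) {
            if ($parent !== $table && isset($deps[$parent])) {
                $inDegree[$table]++;
            }
        }
    }

    // Start with tables that depend on nothing.
    $queue = array_keys(array_filter($inDegree, fn ($d) => $d === 0));
    $sorted = [];

    while ($queue) {
        $parent = array_shift($queue);
        $sorted[] = $parent;
        foreach ($deps as $table => $parents) {
            if ($parent !== $table && in_array($parent, $parents, true)) {
                if (--$inDegree[$table] === 0) {
                    $queue[] = $table;
                }
            }
        }
    }

    // Anything left over is part of a cycle; append in arbitrary order.
    foreach (array_keys($deps) as $table) {
        if (!in_array($table, $sorted, true)) {
            $sorted[] = $table;
        }
    }

    return $sorted;
}
```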
Alternate Database Credentials
If your application's database user doesn't have CREATE DATABASE / DROP DATABASE privileges, you can configure a separate, more privileged user for temp database operations:
```
BACKFILL_TEMP_USERNAME=backfill_admin
BACKFILL_TEMP_PASSWORD=secure-password
```
This user is only used for:
- Creating/dropping the temp database
- Copying tables into the temp database
- Running sanitization `UPDATE` statements on the temp database
- Running `mysqldump` on the temp database
It is never used for reading schema information or touching the production database.
Recommended MySQL grant (least privilege):
```sql
-- Create the user
CREATE USER 'backfill_admin'@'%' IDENTIFIED BY 'secure-password';

-- Only allow operations on databases matching the temp naming convention
GRANT ALL PRIVILEGES ON `_backfill\_temp\_%`.* TO 'backfill_admin'@'%';

-- Allow reading schema info from production
GRANT SELECT ON `INFORMATION_SCHEMA`.* TO 'backfill_admin'@'%';

-- Allow reading production data (for CREATE TABLE ... LIKE and INSERT ... SELECT)
GRANT SELECT ON `your_production_db`.* TO 'backfill_admin'@'%';

FLUSH PRIVILEGES;
```
Cleanup & Crash Recovery
Temporary databases are cleaned up through multiple safety layers:
| Layer | When it runs | Catches |
|---|---|---|
| `finally` blocks in controllers | After each table is streamed | Normal completion and caught exceptions |
| `register_shutdown_function` | When PHP process exits | Fatal errors, OOM kills, uncaught exceptions |
| Hourly scheduled job | Every hour (automatic) | Server crashes, reboots, `kill -9`, network drops |
| `php artisan backfill:cleanup` | On demand | Manual cleanup when needed |
The scheduled job is automatically registered when BACKFILL_SERVER_ENABLED=true. It runs with --force --max-age=60, meaning it only drops temp databases older than 60 minutes (to avoid interfering with an active long-running sync).
How orphans are detected: The cleanup command runs SHOW DATABASES LIKE '_backfill_temp_%' and parses the Unix timestamp embedded in the database name. If it's older than the --max-age threshold, it's considered orphaned.
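That age check can be sketched as follows. The helper name `isOrphanedTempDatabase` and the exact handling of the name format are assumptions for illustration, not the package's code:

```php
<?php
// Decide whether a temp database looks orphaned, based on the Unix
// timestamp embedded in its name (_backfill_temp_{timestamp}_{random}).
// Illustrative sketch only.
function isOrphanedTempDatabase(string $name, int $maxAgeMinutes, int $now): bool
{
    // Names that don't match the temp convention are never touched.
    if (!preg_match('/^_backfill_temp_(\d+)_/', $name, $m)) {
        return false;
    }

    $ageMinutes = ($now - (int) $m[1]) / 60;

    return $ageMinutes > $maxAgeMinutes;
}
```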
Security
| Protection | How |
|---|---|
| Token authentication | All API endpoints require Authorization: Bearer <token>. Validated with timing-safe hash_equals. |
| Environment guard | backfill:pull refuses to run unless app()->environment() is in the allowed_environments list. Default: local, staging. |
| No production writes | The server never modifies production data. All operations happen in a temporary database that is dropped after use. |
| SQL injection prevention | Table and column names are validated against the actual schema. Exclude patterns use parameterized LIKE clauses. |
| Transport | Use HTTPS in production. The token is sent as a Bearer token in the Authorization header. |
⚠️ Important: Always use HTTPS for your production URL. The auth token and sanitized data travel over this connection. HTTP would expose both to man-in-the-middle attacks.
Edge Cases & Limitations
| Scenario | Behavior |
|---|---|
| Table has no primary key | Uses id column as fallback. If that doesn't exist either, the table is still synced but delta upserts won't work (use --full). |
| Table has no timestamps | Delta sync falls back to full sync for that specific table. Other tables with timestamps still use delta. |
| Deleted rows on production | Not reflected in delta sync. Use --full periodically for a clean state. |
| Schema changes on production | The pull command automatically compares remote and local schemas before importing. If column mismatches are detected, you'll see a warning table and can choose to abort, run php artisan migrate, and retry. |
| Very large tables (100M+ rows) | The mysqldump + mysql import path handles this well. Consider using row limits to cap development data size. |
| Circular FK references | Detected and handled — cycles are broken in the topological sort. FK checks are disabled during import. |
| Self-referencing tables | Supported (e.g., categories.parent_id → categories.id). Self-references are excluded from FK sorting. |
| Multiple databases | Currently targets the default database connection only. Multi-database support is not yet implemented. |
| Supported engines | Server-side sanitization and transport require MySQL or MariaDB. The package checks the database driver and fails fast when run against PostgreSQL, SQLite, or SQL Server. |
| Binary/blob columns | Handled natively by mysqldump. No special configuration needed. |
Testing
The package includes a test suite using Pest:
```shell
composer test
```
Tests cover:
- Authentication middleware (valid/invalid/missing tokens)
- Subsetting & Row Limiting (Top-down exclusions & Bottom-up inclusions)
- Sanitization SQL generation (all types + exclude patterns)
- Import service (full import, empty files, temp table name rewriting)
- Artisan commands (interactive install, environment guards, status logic)
License
MIT