betoalien / pardox-php
High-Performance DataFrame Engine powered by Rust (The PardoX Project)
pkg:composer/betoalien/pardox-php
Requires
- php: >=8.1
- ext-ffi: *
README
The Speed of Rust. The Simplicity of PHP.
PardoX is a high-performance DataFrame engine for PHP. A single Rust core handles all computation — CSV parsing, arithmetic, database I/O, and sorting — exposed to PHP through the native FFI extension. No Python. No Node. No middleware.
v0.3.1 is now available. Native database connectivity for PostgreSQL, MySQL, SQL Server, and MongoDB. Full Observer export. GPU sort. All from PHP.
⚡ Why PardoX for PHP?
| Capability | How |
|---|---|
| Zero-copy ingestion | Multi-threaded Rust CSV parser — no PHP string processing |
| SIMD arithmetic | AVX2 / NEON — 5x–20x faster than PHP loops |
| Native database I/O | Rust drivers for PostgreSQL, MySQL, SQL Server, MongoDB — no PHP extensions needed |
| GPU sort | WebGPU Bitonic sort with transparent CPU fallback |
| No dependencies | Only requires PHP's built-in ext-ffi and ext-json |
| Cross-platform | Linux x64 · Windows x64 · macOS Intel · macOS Apple Silicon |
📦 Installation
```bash
composer require betoalien/pardox-php
```
Requirements:
- PHP 8.1 or higher (the package's Composer constraint is `php: >=8.1`)
- `ext-ffi` enabled in `php.ini`
- `ext-json` (standard, usually enabled by default)
Enable the FFI extension:
```ini
; php.ini
extension=ffi
ffi.enable=true
```
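Before installing, it can help to confirm that the runtime actually meets these requirements. The following is a minimal sketch in plain PHP; `pardoxPreflight()` is a hypothetical helper (not part of PardoX), and the default minimum version of 8.1.0 is taken from the package's Composer constraint.

```php
<?php
// Preflight check: reports whether this PHP runtime meets the stated requirements.
// pardoxPreflight() is a hypothetical helper, not part of the PardoX API.
function pardoxPreflight(string $minPhp = '8.1.0'): array
{
    return [
        'php_version_ok' => version_compare(PHP_VERSION, $minPhp, '>='),
        'ffi_loaded'     => extension_loaded('ffi'),
        'json_loaded'    => extension_loaded('json'),
        // ini_get() returns the raw setting as a string, or false when the
        // FFI extension (and therefore the setting) is not available.
        'ffi_enable'     => ini_get('ffi.enable'),
    ];
}

foreach (pardoxPreflight() as $check => $result) {
    echo str_pad($check, 16) . var_export($result, true) . "\n";
}
```

Note that PHP's default for `ffi.enable` is `preload`, which permits FFI in the CLI but not in web SAPIs, so a script that works on the command line may still need `ffi.enable=true` under PHP-FPM or Apache.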
🚀 Quick Start
```php
<?php
require_once 'vendor/autoload.php';

use PardoX\DataFrame;
use PardoX\IO;

// Load 50,000 rows with the parallel Rust CSV parser
$df = DataFrame::read_csv('./sales_data.csv');
echo "Loaded " . number_format($df->shape[0]) . " rows\n";

// SIMD-accelerated arithmetic
$meanDiscount = $df->mean('discount');
echo sprintf("Mean discount: %.4f\n", $meanDiscount);

// Value frequency table
$stateCounts = $df->value_counts('state');
echo "Unique states: " . count($stateCounts) . "\n";
arsort($stateCounts);
foreach (array_slice($stateCounts, 0, 3, true) as $state => $count) {
    echo "  $state: $count\n";
}

// Write to MySQL: chunked batch INSERT, auto LOAD DATA if the server allows
$MYSQL = 'mysql://user:password@localhost:3306/mydb';
IO::executeMysql($MYSQL, 'CREATE TABLE IF NOT EXISTS sales (price DOUBLE, quantity DOUBLE)');
$rows = $df->to_mysql($MYSQL, 'sales', 'append');
echo "Written $rows rows to MySQL\n";
```
🗄️ What's New in v0.3.1
1. Relational Conqueror — Native Database I/O
Connect to PostgreSQL, MySQL, SQL Server, and MongoDB entirely through the Rust core. No PDO, no mysqli, no MongoDB PHP extension required.
```php
use PardoX\DataFrame;
use PardoX\IO;

// --- PostgreSQL ---
$PG = 'postgresql://user:pass@localhost:5432/mydb';
$df = IO::read_sql($PG, "SELECT * FROM orders WHERE status = 'active'");
IO::executeSql($PG, 'DROP TABLE IF EXISTS orders_archive');
IO::executeSql($PG, 'CREATE TABLE orders_archive (id BIGINT, amount FLOAT, region TEXT)');
// COPY FROM STDIN auto-activated for > 10,000 rows
$rows = $df->to_sql($PG, 'orders_archive', 'append');
// Upsert: INSERT … ON CONFLICT DO UPDATE
$rows = $df->to_sql($PG, 'orders_archive', 'upsert', ['id']);

// --- MySQL ---
$MY = 'mysql://user:pass@localhost:3306/mydb';
$df = IO::read_mysql($MY, 'SELECT * FROM products WHERE active = 1');
IO::executeMysql($MY, 'CREATE TABLE IF NOT EXISTS products_bak (id BIGINT, price DOUBLE)');
$rows = $df->to_mysql($MY, 'products_bak', 'append');          // chunked INSERT, 1k rows/stmt
$rows = $df->to_mysql($MY, 'products_bak', 'replace');         // REPLACE INTO
$rows = $df->to_mysql($MY, 'products_bak', 'upsert', ['id']);  // ON DUPLICATE KEY UPDATE

// --- SQL Server ---
$MS = 'Server=localhost,1433;Database=mydb;UID=sa;PWD=MyPwd;TrustServerCertificate=Yes';
$df = IO::read_sqlserver($MS, 'SELECT TOP 5000 * FROM dbo.transactions');
IO::executeSqlserver($MS, 'DROP TABLE IF EXISTS dbo.transactions_bak');
$rows = $df->to_sqlserver($MS, 'dbo.transactions_bak', 'append');          // batch INSERT, 500 rows/stmt
$rows = $df->to_sqlserver($MS, 'dbo.transactions_bak', 'upsert', ['id']);  // MERGE INTO

// --- MongoDB ---
$MG = 'mongodb://admin:pass@localhost:27017';
$df = IO::read_mongodb($MG, 'mydb.orders');
IO::executeMongodb($MG, 'mydb', '{"drop": "orders_archive"}');
$rows = $df->to_mongodb($MG, 'mydb.orders_archive', 'append');   // 10k docs/batch, ordered:false
$rows = $df->to_mongodb($MG, 'mydb.orders_archive', 'replace');  // drop + insert
```
Write modes:
| Database | append | replace | upsert |
|---|---|---|---|
| PostgreSQL | INSERT (COPY for >10k rows) | — | ON CONFLICT DO UPDATE |
| MySQL | INSERT 1k/stmt (LOAD DATA for >10k) | REPLACE INTO | ON DUPLICATE KEY UPDATE |
| SQL Server | INSERT 500/stmt | INSERT 500/stmt | MERGE INTO |
| MongoDB | insert_many 10k/batch | drop + insert_many | — |
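To make the "N rows per statement" entries in the table concrete, here is a plain-PHP sketch of how rows can be split into multi-row `INSERT` statements. It is an illustration only: PardoX performs its batching inside the Rust core, and `buildBatchInserts()` is a hypothetical helper that does not exist in the library.

```php
<?php
// Illustration only: builds multi-row INSERT statements in fixed-size chunks.
// buildBatchInserts() is a hypothetical helper, not part of the PardoX API.
function buildBatchInserts(string $table, array $columns, array $rows, int $chunkSize = 1000): array
{
    // One "(?, ?, ...)" group per row, with one placeholder per column.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $statements = [];
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $sql = sprintf(
            'INSERT INTO %s (%s) VALUES %s',
            $table,
            implode(', ', $columns),
            implode(', ', array_fill(0, count($chunk), $rowPlaceholder))
        );
        // Flatten the chunk's rows into one positional parameter list.
        $params = array_merge(...array_map('array_values', $chunk));
        $statements[] = ['sql' => $sql, 'params' => $params];
    }
    return $statements;
}

$rows = [];
for ($i = 1; $i <= 2500; $i++) {
    $rows[] = ['price' => $i * 1.5, 'quantity' => $i];
}
$batches = buildBatchInserts('sales', ['price', 'quantity'], $rows);
echo count($batches) . " statements\n"; // 2,500 rows at 1,000 per statement gives 3
```

Fewer, larger statements reduce round trips to the server, which is the main reason batched writes outperform one `execute()` call per row.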
Note on SQL Server passwords: Avoid using `!` in SQL Server passwords. A known issue in the tiberius v0.12 Rust driver causes authentication to fail over TCP when the password contains `!`. Use only characters from `[A-Za-z0-9_\-@#$]`. A fix is planned for v0.3.2.
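A small guard can catch an unsupported password before a connection attempt fails. This sketch assumes only the character set given in the note above; `isSafeSqlServerPassword()` is a hypothetical helper, not part of PardoX.

```php
<?php
// isSafeSqlServerPassword() is a hypothetical helper, not part of the PardoX API.
// It accepts only the characters listed as safe: A-Z a-z 0-9 _ - @ # $
function isSafeSqlServerPassword(string $password): bool
{
    return preg_match('/\A[A-Za-z0-9_\-@#$]+\z/', $password) === 1;
}

var_dump(isSafeSqlServerPassword('MyPwd_2024#')); // bool(true)
var_dump(isSafeSqlServerPassword('MyPwd!'));      // bool(false)
```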
2. The Observer — Full DataFrame Export & EDA
```php
// Value frequency table
$stateCounts = $df->value_counts('state');  // ['TX' => 6345, 'CA' => 6301, ...]

// Unique values
$categories = $df->unique('category');      // ['Electronics', 'Books', ...]

// Full export
$records = $df->to_dict();   // array of associative arrays
$json    = $df->to_json();   // JSON string "[{...}, ...]"
$matrix  = $df->tolist();    // array of arrays (values only)
```
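For readers who want to see the result shapes without installing the library, the following plain-PHP sketch reproduces them on a tiny hand-written dataset. It only mirrors the documented return shapes; it is not how PardoX computes them, and the sample records are made up for illustration.

```php
<?php
// Plain-PHP equivalents, for illustration only.
// $records mimics the shape documented for to_dict(): a list of associative arrays.
$records = [
    ['state' => 'TX', 'category' => 'Books'],
    ['state' => 'CA', 'category' => 'Electronics'],
    ['state' => 'TX', 'category' => 'Electronics'],
];

$states = array_column($records, 'state');

$stateCounts  = array_count_values($states);          // shape of value_counts(): ['TX' => 2, 'CA' => 1]
$uniqueStates = array_values(array_unique($states));  // shape of unique(): ['TX', 'CA']
$matrix       = array_map('array_values', $records);  // shape of tolist(): values only
$json         = json_encode($records);                // shape of to_json(): JSON array of objects

print_r($stateCounts);
```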
3. Native Math Foundation
```php
// DataFrame arithmetic methods return a new DataFrame
$revenueDf = $df->mul('price', 'quantity');  // result column: 'result_mul'
$profitDf  = $df->sub('revenue', 'cost');    // result column: 'result_sub'
$totalDf   = $df->add('amount', 'tax');      // result column: 'result_add'

// Standard deviation (scalar)
$std = $revenueDf->std('result_mul');
echo sprintf("Revenue std dev: %.2f\n", $std);

// Min-Max normalization to [0, 1]
$normedDf = $df->min_max_scale('price');     // result column: 'result_minmax'

// Sort
$sorted = $df->sort_values('price', false);  // descending, CPU sort
```
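As a reference for what these operations compute, here is a plain-PHP sketch of min-max scaling, x' = (x - min) / (max - min), and of the standard deviation. The function names are hypothetical. The README does not state whether `std()` uses the sample form (dividing by n - 1) or the population form (dividing by n); the sketch uses the sample form as an assumption.

```php
<?php
// Reference implementations in plain PHP; both function names are hypothetical.
function minMaxScale(array $values): array
{
    $min = min($values);
    $max = max($values);
    $range = $max - $min;
    // A constant column has zero range; map it to 0.0 to avoid division by zero.
    return array_map(fn($v) => $range == 0 ? 0.0 : ($v - $min) / $range, $values);
}

// Sample standard deviation (divides by n - 1). This choice is an assumption;
// the README does not say which form PardoX uses.
function sampleStd(array $values): float
{
    $n = count($values);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two values');
    }
    $mean = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    return sqrt($sumSq / ($n - 1));
}

$prices = [10.0, 20.0, 30.0, 40.0, 50.0];
print_r(minMaxScale($prices));        // 0, 0.25, 0.5, 0.75, 1
printf("%.4f\n", sampleStd($prices)); // 15.8114
```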
4. GPU Sort
```php
// GPU Bitonic sort: falls back to CPU silently if no GPU is available
$sorted = $df->sort_values('revenue', true, true);  // ($by, $ascending, $gpu)
```
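Bitonic sort suits GPUs because every compare-and-swap within a pass is independent of the others, so a pass can run fully in parallel. The sketch below is a sequential, plain-PHP version of the textbook algorithm for arrays whose length is a power of two. It is not PardoX's WebGPU implementation, and how PardoX handles other lengths is not described in this README.

```php
<?php
// Sequential sketch of the textbook bitonic sorting network.
// bitonicSort() is a hypothetical helper, not part of the PardoX API.
function bitonicSort(array $a, bool $ascending = true): array
{
    $a = array_values($a);
    $n = count($a);
    if ($n < 2) {
        return $a;
    }
    if (($n & ($n - 1)) !== 0) {
        throw new InvalidArgumentException('Length must be a power of two');
    }
    // $k: size of the bitonic sequences being merged in this stage.
    for ($k = 2; $k <= $n; $k <<= 1) {
        // $j: distance between the two elements compared in this pass.
        for ($j = $k >> 1; $j > 0; $j >>= 1) {
            // On a GPU, each iteration of this loop would be one thread.
            for ($i = 0; $i < $n; $i++) {
                $l = $i ^ $j;
                if ($l <= $i) {
                    continue; // each pair is handled once, from its lower index
                }
                // Blocks alternate direction so that merged halves form bitonic runs.
                $up = (($i & $k) === 0) === $ascending;
                if (($up && $a[$i] > $a[$l]) || (!$up && $a[$i] < $a[$l])) {
                    [$a[$i], $a[$l]] = [$a[$l], $a[$i]];
                }
            }
        }
    }
    return $a;
}

echo implode(' ', bitonicSort([7, 3, 8, 1, 6, 2, 5, 4])) . "\n";        // 1 2 3 4 5 6 7 8
echo implode(' ', bitonicSort([7, 3, 8, 1, 6, 2, 5, 4], false)) . "\n"; // 8 7 6 5 4 3 2 1
```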
📋 Full API Overview
DataFrame — Factory Methods
```php
$df = DataFrame::read_csv('./file.csv');
$df = DataFrame::read_csv('./file.csv', ['price' => 'Float64']);  // with schema
$df = IO::read_sql($pgConn, 'SELECT * FROM orders');
$df = IO::read_mysql($myConn, 'SELECT * FROM products');
$df = IO::read_sqlserver($msConn, 'SELECT TOP 100 * FROM dbo.t');
$df = IO::read_mongodb($mgConn, 'mydb.collection');
```
DataFrame — Properties
```php
$df->shape;    // [rows, cols]
$df->columns;  // ['col1', 'col2', ...]
$df->dtypes;   // ['col1' => 'Float64', ...]
```
DataFrame — Inspection
```php
$df->show(10);       // ASCII table (stdout)
$df->head(5);        // → DataFrame
$df->tail(5);        // → DataFrame
$df->iloc(0, 100);   // → DataFrame (rows 0–99)
```
DataFrame — Arithmetic & Transform
```php
$df->cast('quantity', 'Float64');
$df->fillna(0.0);
$df->round(2);
$df->mul('price', 'quantity');         // → DataFrame with 'result_mul'
$df->sub('revenue', 'cost');           // → DataFrame with 'result_sub'
$df->add('amount', 'tax');             // → DataFrame with 'result_add'
$df->std('column');                    // float
$df->min_max_scale('col');             // → DataFrame with 'result_minmax'
$df->sort_values('col', true);         // → DataFrame (ascending)
$df->sort_values('col', false, true);  // → DataFrame (descending, GPU)
```
DataFrame — Series Operators
```php
$total = $df['price'] * $df['quantity'];        // → Series
$net   = $df['revenue'] - $df['discount'];      // → Series
$df['total'] = $df['price'] * $df['quantity'];  // assign new column
```
Series — Aggregations
```php
$df['col']->sum();    // float
$df['col']->mean();   // float
$df['col']->min();    // float
$df['col']->max();    // float
$df['col']->std();    // float
$df['col']->count();  // int
```
Observer
```php
$df->value_counts('col');  // ['value' => count, ...]
$df->unique('col');        // ['val1', 'val2', ...]
$df->to_dict();            // [['col' => val, ...], ...]
$df->to_json();            // string
$df->tolist();             // [[val, val, ...], ...]
```
Write
```php
$df->to_prdx('./out.prdx');
$df->to_csv('./out.csv');
$df->to_sql($pgConn, 'table', 'append', ['id']);
$df->to_mysql($myConn, 'table', 'upsert', ['id']);
$df->to_sqlserver($msConn, 'dbo.table', 'append');
$df->to_mongodb($mgConn, 'db.collection', 'append');
```
IO — Static Helpers
```php
IO::executeSql($pgConn, 'CREATE TABLE ...');
IO::executeMysql($myConn, 'DROP TABLE IF EXISTS ...');
IO::executeSqlserver($msConn, 'TRUNCATE TABLE dbo.staging');
IO::executeMongodb($mgConn, 'mydb', '{"drop": "col"}');
```
📊 Benchmarks
| Operation | Plain PHP / PDO | PardoX v0.3.1 | Speedup |
|---|---|---|---|
| Read CSV (1 GB) | ~12s | ~0.8s | ~15x |
| Column multiply (1M rows) | ~0.8s | ~0.02s | ~40x |
| PostgreSQL write 50k rows | ~20s (PDO execute) | ~0.6s (COPY) | ~33x |
| MySQL write 50k rows | ~25s (PDO execute) | ~3s (batch INSERT) | ~8x |
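The speedup column is the baseline time divided by the PardoX time (for example, 12 s / 0.8 s gives 15x). Figures like these depend heavily on hardware and data, so it is worth measuring on your own workload. The sketch below is one way to time a plain-PHP baseline; `timeIt()` and `speedup()` are hypothetical helpers, and the 100,000-row arrays are an arbitrary example size.

```php
<?php
// timeIt() and speedup() are hypothetical helpers for running your own comparison.
function timeIt(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true);  // monotonic clock, in nanoseconds
        $fn();
        $best = min($best, (hrtime(true) - $start) / 1e9);
    }
    return $best;               // best-of-N wall time, in seconds
}

function speedup(float $baselineSeconds, float $candidateSeconds): float
{
    return $baselineSeconds / $candidateSeconds;
}

// Baseline: multiply two columns element by element with a plain PHP loop.
$price = range(1, 100000);
$qty   = range(1, 100000);
$loopSeconds = timeIt(function () use ($price, $qty) {
    $out = [];
    foreach ($price as $i => $p) {
        $out[] = $p * $qty[$i];
    }
});

printf("PHP loop: %.4fs\n", $loopSeconds);
printf("CSV row from the table: %.0fx\n", speedup(12.0, 0.8)); // 15x
```

Timing the equivalent PardoX call with the same `timeIt()` wrapper and passing both results to `speedup()` gives a like-for-like figure for your machine.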
🗺️ Roadmap
| Version | Status | Highlights |
|---|---|---|
| v0.1 | ✅ Released | CSV, arithmetic, aggregations, .prdx format |
| v0.3.1 | ✅ Released | Databases (PG/MySQL/MSSQL/MongoDB), Observer, Math, GPU sort |
| v0.3.2 | 🔜 Planned | SQL Server ! password fix, error hierarchy, GroupBy, Parquet reader |
🌐 Platform Support
| OS | Architecture | Status |
|---|---|---|
| Linux | x86_64 | ✅ Stable |
| Windows | x86_64 | ✅ Stable |
| macOS | ARM64 (M1/M2/M3) | ✅ Stable |
| macOS | x86_64 (Intel) | ✅ Stable |
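To check whether the current machine matches one of the rows above, PHP's built-in `PHP_OS_FAMILY` constant and `php_uname('m')` are sufficient. This is a minimal sketch; `detectPlatform()` is a hypothetical helper, and the alias map covers only the common architecture names.

```php
<?php
// detectPlatform() is a hypothetical helper, not part of the PardoX API.
function detectPlatform(): array
{
    $arch = strtolower(php_uname('m'));
    // Normalise common aliases: Windows reports AMD64, Linux on ARM reports aarch64.
    $aliases = ['amd64' => 'x86_64', 'x64' => 'x86_64', 'aarch64' => 'arm64'];
    $arch = $aliases[$arch] ?? $arch;

    // OS and architecture combinations listed in the platform table.
    $supported = [
        'Linux'   => ['x86_64'],
        'Windows' => ['x86_64'],
        'Darwin'  => ['x86_64', 'arm64'],  // macOS Intel and Apple Silicon
    ];

    $os = PHP_OS_FAMILY;  // 'Linux', 'Windows', 'Darwin', 'BSD', 'Solaris' or 'Unknown'
    return [
        'os'        => $os,
        'arch'      => $arch,
        'supported' => in_array($arch, $supported[$os] ?? [], true),
    ];
}

print_r(detectPlatform());
```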
📘 Documentation
📄 License
MIT License — free for commercial and personal use.
by Alberto Cardenas
www.albertocardenas.com · www.pardox.io