csvtoolkit / fastcsv
High-performance PHP extension for CSV file handling with RFC 4180 compliance
Language: C
Type: php-ext
Requires
- php: >=8.2.0
This package is auto-updated.
Last update: 2025-06-28 08:10:57 UTC
README
A high-performance PHP extension for reading and writing CSV files with full RFC 4180 compliance and advanced features.
Tip: For a unified API that automatically leverages this extension when available, check out PHP-CSVHelper. It provides a consistent interface that uses FastCSV for maximum performance while gracefully falling back to SplFileObject when needed.
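The fallback idea described in the tip can be sketched by hand as follows. This is a minimal illustration, not PHP-CSVHelper's actual code: the function name `readCsvRows` is hypothetical, the FastCSV calls are the ones shown later in this README, and the fallback uses only standard `SplFileObject` methods. Note that the two branches may treat the header row differently (FastCSVConfig defaults to treating the first row as a header), so a real wrapper would need to normalize that.

```php
<?php
/**
 * Read every row of a CSV file, preferring FastCSV when it is loaded.
 * Hypothetical helper for illustration only.
 *
 * @return array<int, array<int, string|null>>
 */
function readCsvRows(string $path): array
{
    $rows = [];

    if (class_exists('FastCSVReader')) {
        // Reader API as shown in the FastCSVReader section of this README.
        $reader = new FastCSVReader($path);
        while ($reader->hasNext()) {
            $rows[] = $reader->nextRecord();
        }
        return $rows;
    }

    // Fallback: standard SplFileObject, with the escape character disabled
    // so that only doubled quotes ("") are treated as escaped quotes.
    $file = new SplFileObject($path, 'r');
    while (!$file->eof()) {
        $row = $file->fgetcsv(',', '"', '');
        // Blank lines come back as [null]; a failed read comes back as false.
        if ($row === false || $row === [null]) {
            continue;
        }
        $rows[] = $row;
    }
    return $rows;
}
```

Calling `readCsvRows('/path/to/file.csv')` returns a list of rows, each a list of field values, regardless of which branch ran.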
Features
- High Performance: Native C implementation with optimized memory management using Arena allocation
- RFC 4180 Compliant: Full compliance with the CSV standard, including proper quote handling and multi-line records
- Flexible API: Support for both file paths and configuration objects in constructors
- Configurable Flushing: Auto-flush mode for immediate data visibility or manual flush for maximum performance
- Advanced CSV Handling:
  - Proper quote escaping and unescaping ("" → ")
  - Multi-line quoted fields support
  - Configurable delimiters, quotes, and escape characters
  - CRLF and LF line ending support
  - Immediate or buffered write operations
- Memory Efficient: Arena-based memory management for optimal performance
- Comprehensive Testing: Extensive test suite with 100% pass rate (12/12 tests passing)
Classes
FastCSVReader
High-performance CSV reader with advanced navigation capabilities.
    // Create reader with file path
    $reader = new FastCSVReader('/path/to/file.csv');

    // Or with configuration object (setters as documented under FastCSVConfig)
    $config = new FastCSVConfig();
    $config->setDelimiter(';');
    $config->setEnclosure('"');
    $reader = new FastCSVReader($config);
    $reader->open('/path/to/file.csv');

    // Read records
    while ($reader->hasNext()) {
        $record = $reader->nextRecord();
        print_r($record);
    }

    // Navigation methods
    $reader->rewind();
    $reader->seek(100);
    $position = $reader->getPosition();
    $count = $reader->getRecordCount();
    $headers = $reader->getHeaders();
FastCSVWriter
High-performance CSV writer with proper quoting and escaping.
    // Create writer with headers
    $config = new FastCSVConfig();
    $config->setPath('/path/to/output.csv');
    $writer = new FastCSVWriter($config, ['Name', 'Age', 'City']);

    // Write records (auto-flushed by default)
    $writer->writeRecord(['John Doe', '30', 'New York']);
    $writer->writeRecord(['Jane Smith', '25', 'Los Angeles']);
    $writer->close(); // Close before reusing the same config and path

    // For high-performance scenarios, disable auto-flush
    $config->setAutoFlush(false);
    $writer = new FastCSVWriter($config, ['ID', 'Data']);
    for ($i = 0; $i < 100000; $i++) {
        $writer->writeRecord([$i, "Data$i"]);
        // Manual flush after every 1000 records for optimal performance
        if (($i + 1) % 1000 === 0) {
            $writer->flush();
        }
    }
    $writer->close(); // Final flush on close
FastCSVConfig
Configuration class for customizing CSV parsing and writing behavior.
    $config = new FastCSVConfig();
    $config->setDelimiter(';');        // Field delimiter (default: ',')
    $config->setEnclosure('"');        // Quote character (default: '"')
    $config->setEscape('\\');          // Escape character (default: '\\')
    $config->setHasHeader(true);       // First row contains headers (default: true)
    $config->setAutoFlush(true);       // Auto-flush after each write (default: true)
    $config->setStrictMode(false);     // Strict quoting mode (default: false)
    $config->setSkipEmptyLines(false); // Skip empty lines (default: false)
    $config->setTrimFields(false);     // Trim whitespace from fields (default: false)
    $config->setWriteBOM(false);       // Write BOM for Unicode files (default: false)
Installation
Recommended: Using PIE (PHP Installer for Extensions)
The easiest way to install FastCSV is using PIE:
    # Install the latest version
    pie install csvtoolkit/fastcsv

    # Install a specific version
    pie install csvtoolkit/fastcsv:^0.0.1
For detailed PIE installation instructions, see PIE_INSTALLATION.md.
Manual Installation
Prerequisites
- PHP 8.2 or higher
- GCC or compatible C compiler
- PHP development headers (php-dev package)
From Source
    # Clone the repository
    git clone <repository-url>
    cd fastcsv-extension

    # Initialize submodules (for the lib directory)
    git submodule update --init --recursive

    # Build the extension
    phpize
    ./configure
    make
    make install
Enable Extension
Add to your php.ini:

    extension=fastcsv

Or load dynamically:

    dl('fastcsv.so');  // Linux/macOS
    dl('fastcsv.dll'); // Windows
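Because `dl()` is only available in some SAPIs (such as the CLI), it can help to confirm that the extension actually loaded. The sketch below assumes the extension registers itself under the name `fastcsv`, matching the `extension=fastcsv` line above; the function name `describeFastCsvStatus` is hypothetical.

```php
<?php
/**
 * Report whether the extension is available, without relying on dl().
 * The extension name 'fastcsv' is an assumption based on the php.ini line.
 */
function describeFastCsvStatus(): string
{
    if (extension_loaded('fastcsv')) {
        return 'fastcsv is loaded';
    }

    // php_ini_loaded_file() returns the php.ini path, or false if none is in use.
    $ini = php_ini_loaded_file();

    return 'fastcsv is not loaded; php.ini in use: '
        . ($ini === false ? '(none)' : $ini);
}

echo describeFastCsvStatus(), PHP_EOL;
```

If the extension is reported as not loaded, the printed php.ini path is the file where the `extension=fastcsv` line belongs.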
Development
Building
The extension uses an external C library located in the lib/ directory (git submodule). The build process automatically includes:
- lib/arena.c - Arena memory management
- lib/csv_parser.c - Core CSV parsing logic
- lib/csv_reader.c - CSV reader implementation
- lib/csv_writer.c - CSV writer implementation
- lib/csv_utils.c - Utility functions
Testing
Run the test suite:
    # Run all tests
    make test

    # Run specific test
    php run-tests.php tests/FastCSVReader_001.phpt

    # Run with verbose output
    php run-tests.php -v tests/
Current Test Status
- Passing: 12/12 tests (100% success rate)
- Comprehensive Coverage: Edge cases, performance, memory management, and RFC 4180 compliance
- Stable: All tests consistently pass across different environments
Technical Details
Architecture
The extension is built on top of a high-performance C library with the following components:
- Arena Memory Management: Efficient memory allocation and cleanup
- Character-by-Character Parsing: Proper handling of quoted fields and line endings
- State Machine Parser: RFC 4180 compliant parsing with quote state tracking
- Configurable Parameters: Flexible delimiter, quote, and escape character support
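To make the state-machine idea concrete, here is a minimal sketch of character-by-character parsing with quote state tracking. It is written in PHP for readability and is not the extension's C parser; the function name `parseCsvSketch` and the four state names are assumptions made for this example.

```php
<?php
/**
 * Parse CSV text with a small state machine (illustrative only).
 *
 * @return array<int, array<int, string>>
 */
function parseCsvSketch(string $input, string $delimiter = ',', string $quote = '"'): array
{
    $FIELD_START = 0; // at the beginning of a field
    $UNQUOTED    = 1; // inside an unquoted field
    $QUOTED      = 2; // inside a quoted field
    $QUOTE_SEEN  = 3; // quote seen inside a quoted field: closing quote or first half of ""

    $records = [];
    $record  = [];
    $field   = '';
    $state   = $FIELD_START;
    $length  = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $ch = $input[$i];

        if ($state === $QUOTED) {
            // Delimiters and line breaks are literal data inside quotes.
            if ($ch === $quote) {
                $state = $QUOTE_SEEN;
            } else {
                $field .= $ch;
            }
            continue;
        }

        if ($state === $QUOTE_SEEN && $ch === $quote) {
            $field .= $quote; // "" unescapes to a single "
            $state = $QUOTED;
            continue;
        }

        if ($ch === $quote && $state === $FIELD_START) {
            $state = $QUOTED;
        } elseif ($ch === $delimiter) {
            $record[] = $field;
            $field = '';
            $state = $FIELD_START;
        } elseif ($ch === "\n" || $ch === "\r") {
            // Treat CRLF as a single line ending.
            if ($ch === "\r" && $i + 1 < $length && $input[$i + 1] === "\n") {
                $i++;
            }
            $record[] = $field;
            $records[] = $record;
            $record = [];
            $field = '';
            $state = $FIELD_START;
        } else {
            $field .= $ch;
            $state = $UNQUOTED;
        }
    }

    // Emit the final record when the input does not end with a line break.
    if ($state !== $FIELD_START || $field !== '' || $record !== []) {
        $record[] = $field;
        $records[] = $record;
    }

    return $records;
}
```

The sketch is deliberately lenient: a quote that appears in the middle of an unquoted field is kept as literal data rather than rejected, which a strict mode would likely handle differently.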
Memory Management
The extension uses Arena-based memory allocation for optimal performance:
- Each reader/writer instance maintains its own arena
- Automatic cleanup when objects are destroyed
- Minimal memory fragmentation
- Efficient bulk allocations
RFC 4180 Compliance
Full compliance with the CSV standard, including:
- Proper quote character handling
- Multi-line field support
- CRLF and LF line ending support
- Quote escaping ("" becomes ")
- Whitespace preservation in quoted fields
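On the writing side, the quoting rules listed above amount to: quote a field only when it contains the delimiter, the quote character, or a line break, and double any quote characters inside it. The following sketch shows that rule in plain PHP; the function names `encodeCsvField` and `encodeCsvRecord` are hypothetical and this is not the extension's writer code.

```php
<?php
/**
 * Encode one field following RFC 4180 quoting rules (illustrative only).
 */
function encodeCsvField(string $field, string $delimiter = ',', string $quote = '"'): string
{
    // Quoting is needed only for delimiters, quotes, CR, or LF.
    $needsQuoting = strpbrk($field, $delimiter . $quote . "\r\n") !== false;
    if (!$needsQuoting) {
        return $field;
    }

    // Escape embedded quotes by doubling them, then wrap the field in quotes.
    return $quote . str_replace($quote, $quote . $quote, $field) . $quote;
}

/**
 * Encode a full record, terminated with CRLF as RFC 4180 specifies.
 *
 * @param array<int, string> $fields
 */
function encodeCsvRecord(array $fields, string $delimiter = ',', string $quote = '"'): string
{
    $encoded = [];
    foreach ($fields as $field) {
        $encoded[] = encodeCsvField($field, $delimiter, $quote);
    }

    return implode($delimiter, $encoded) . "\r\n";
}
```

Leading and trailing spaces are left untouched, since RFC 4180 treats spaces as part of the field.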
Performance
The extension is optimized for high-performance CSV processing with significant improvements over PHP's native SplFileObject:
Benchmark Results (PHP 8.4.8, 1GB Memory Limit)
Read Operations Performance
Data Size | Implementation | Median Time (ms) | Throughput (records/sec) | Speed Improvement |
---|---|---|---|---|
Small (1K rows) | FastCSV | 3.67 | 272,410 | 4.1x faster |
Small (1K rows) | SplFileObject | 15.03 | 66,520 | baseline |
Medium (100K rows) | FastCSV | 176.04 | 568,049 | 3.6x faster |
Medium (100K rows) | SplFileObject | 639.51 | 156,370 | baseline |
Large (1M rows) | FastCSV | 1,987.23 | 503,212 | 4.8x faster |
Large (1M rows) | SplFileObject | 9,468.64 | 105,612 | baseline |
Combined Read/Write Operations
Data Size | Implementation | Median Time (ms) | Throughput (records/sec) | Speed Improvement |
---|---|---|---|---|
Small (1K rows) | FastCSV | 22.76 | 87,870 | 1.6x faster |
Small (1K rows) | SplFileObject | 35.5 | 56,341 | baseline |
Medium (100K rows) | FastCSV | 590.78 | 338,535 | 2.5x faster |
Medium (100K rows) | SplFileObject | 1,469.31 | 136,118 | baseline |
Large (1M rows) | FastCSV | 7,088.8 | 282,135 | 2.9x faster |
Large (1M rows) | SplFileObject | 20,513.19 | 97,498 | baseline |
Key Performance Advantages
- Read Operations: 3.6x to 4.8x performance improvement over SplFileObject
- Combined Operations: 1.6x to 2.9x advantage for read/write operations
- Scalability: The advantage is largest on the 1M-row dataset (4.8x for reads, 2.9x combined); for combined operations it grows steadily with data size
- Memory Efficiency: Constant ~2MB memory usage regardless of file size
- Consistency: Lower standard deviation, indicating more predictable performance
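The speed-improvement figures can be reproduced from the median times in the benchmark tables: each one is the SplFileObject median divided by the FastCSV median, rounded to one decimal place. The short script below does that arithmetic; the function name `speedup` and the array layout are choices made for this example.

```php
<?php
/**
 * Ratio of baseline time to candidate time (higher means the candidate is faster).
 */
function speedup(float $baselineMs, float $candidateMs): float
{
    return $baselineMs / $candidateMs;
}

// Median times in milliseconds, copied from the benchmark tables:
// [label, FastCSV median, SplFileObject median]
$medians = [
    ['Read, Small (1K rows)',        3.67,     15.03],
    ['Read, Medium (100K rows)',     176.04,   639.51],
    ['Read, Large (1M rows)',        1987.23,  9468.64],
    ['Combined, Small (1K rows)',    22.76,    35.5],
    ['Combined, Medium (100K rows)', 590.78,   1469.31],
    ['Combined, Large (1M rows)',    7088.8,   20513.19],
];

foreach ($medians as [$label, $fastMs, $splMs]) {
    printf("%-30s %.1fx faster\n", $label, speedup($splMs, $fastMs));
}
```

For example, the small read case works out to 15.03 / 3.67 ≈ 4.1, matching the table.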
Technical Performance Features
- Native C Implementation: Direct memory access and optimized algorithms
- Arena Memory Management: Efficient bulk allocations and minimal fragmentation
- Streaming Operations: Constant memory usage regardless of file size
- RFC 4180 Optimized: Fast parsing with proper quote handling
Contributing
See CONTRIBUTING.md for development guidelines.
License
This project is licensed under the terms specified in the LICENSE file.
Changelog
Recent Updates
- Arena Integration: Updated to use Arena memory management system
- API Improvements: Flexible constructors accepting both strings and config objects
- RFC 4180 Compliance: Complete rewrite of parser for standards compliance
- Multi-line Support: Proper handling of quoted fields spanning multiple lines
- Error Handling: Improved PHP error reporting and exception handling
- Test Coverage: Comprehensive test suite with extensive edge case coverage
For the latest updates and bug reports, please check the project's issue tracker.