andileco / csvsort
High-performance CSV sorting library that uses an external merge sort algorithm to handle massive files. Built for drupal/views_csv_source with league/csv integration.
pkg:composer/andileco/csvsort
Requires
- php: ^8.4
- league/csv: ^9.27
Requires (Dev)
- phpstan/phpstan: ^2.0
- phpunit/phpunit: ^11.0
This package is not auto-updated.
Last update: 2026-01-19 14:10:25 UTC
High-performance CSV sorting library for PHP 8.4+
Sort massive CSV files (gigabytes+) with minimal memory usage using an external merge sort algorithm. Built specifically for drupal/views_csv_source integration with league/csv.
Features
- ✅ Memory-Efficient: Sort multi-gigabyte CSVs with constant memory usage
- ✅ External Merge Sort: Industry-standard algorithm for large datasets
- ✅ K-Way Merge: Optimized multi-file merging with min-heap
- ✅ League/CSV Integration: Seamless compatibility with league/csv ^9.27
- ✅ PHP 8.4+: Modern PHP with strict types, readonly properties, enums
- ✅ Multiple Comparators: String, numeric, natural, datetime, boolean sorting
- ✅ Multi-Column Sorting: Sort by multiple columns with custom directions
- ✅ Progress Tracking: Built-in metrics and performance monitoring
- ✅ Production Ready: Comprehensive tests and documentation
Installation
composer require andileco/csvsort
Quick Start
    <?php

    use Andileco\CsvSort\ExternalSorter;
    use League\Csv\Reader;

    // Load your CSV
    $reader = Reader::createFromPath('large-file.csv', 'r');
    $reader->setHeaderOffset(0);

    // Sort it
    $sorter = new ExternalSorter();
    $sorted = $sorter->sort($reader, 'column_name');

    // Use the sorted results
    foreach ($sorted as $record) {
        echo $record['column_name'] . "\n";
    }
How It Works
The library implements a 3-phase external merge sort:
Phase 1: Chunking
Split the large CSV into memory-sized chunks, sort each chunk in RAM using PHP's native QuickSort, and write to temporary files.
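As a rough illustration of the chunking phase, the sketch below buffers rows, sorts each full buffer, and writes it to its own temporary file. `sortIntoChunks()` and its parameters are hypothetical names for this example, not the library's API. A row count stands in for a real memory budget, and chunks are stored as JSON lines rather than CSV to keep the sketch short.

```php
<?php

/**
 * Illustrative only: split rows into sorted runs on disk.
 *
 * @param iterable<array<string, string>> $rows records keyed by column name
 * @return list<string> paths of the sorted chunk files
 */
function sortIntoChunks(iterable $rows, string $column, int $rowsPerChunk, string $tempDir): array
{
    $paths = [];
    $buffer = [];

    $flush = function () use (&$buffer, &$paths, $column, $tempDir): void {
        if ($buffer === []) {
            return;
        }
        // Sort the in-memory chunk by the chosen column (plain string comparison here).
        usort($buffer, fn(array $a, array $b): int => strcmp($a[$column], $b[$column]));

        $path = tempnam($tempDir, 'chunk_');
        $handle = fopen($path, 'w');
        foreach ($buffer as $row) {
            // One JSON document per line (an assumption made for brevity).
            fwrite($handle, json_encode($row) . "\n");
        }
        fclose($handle);

        $paths[] = $path;
        $buffer = [];
    };

    foreach ($rows as $row) {
        $buffer[] = $row;
        if (count($buffer) >= $rowsPerChunk) {
            $flush();
        }
    }
    $flush(); // write the final, possibly smaller chunk

    return $paths;
}
```

Each returned file is sorted on its own, which is the property the merge phase relies on.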
Phase 2: K-Way Merge
Open all temporary files simultaneously as streams, use a min-heap to efficiently pick the lowest row, and merge into the final sorted output.
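The merge step can be sketched with PHP's built-in `SplHeap`. This is a minimal illustration of a k-way merge under stated assumptions, not the library's `MergeEngine`: `RowHeap` and `kWayMerge()` are hypothetical names, and each source is assumed to be an `Iterator` that already yields rows in sorted order.

```php
<?php

/**
 * Min-heap of [row, sourceIndex] entries ordered by a caller-supplied comparator.
 * Hypothetical helper for this sketch.
 */
final class RowHeap extends SplHeap
{
    public function __construct(private Closure $comparator)
    {
    }

    protected function compare(mixed $a, mixed $b): int
    {
        // SplHeap keeps the "largest" element on top, so the arguments are
        // swapped to make the smallest row surface first.
        return ($this->comparator)($b[0], $a[0]);
    }
}

/**
 * Merge already-sorted iterators into one sorted stream.
 *
 * @param list<Iterator> $sources each yields rows in sorted order
 */
function kWayMerge(array $sources, Closure $comparator): Generator
{
    $heap = new RowHeap($comparator);

    // Seed the heap with the first row of every source.
    foreach ($sources as $index => $source) {
        $source->rewind();
        if ($source->valid()) {
            $heap->insert([$source->current(), $index]);
        }
    }

    while (!$heap->isEmpty()) {
        [$row, $index] = $heap->extract();
        yield $row;

        // Refill from the source that just supplied the smallest row.
        $sources[$index]->next();
        if ($sources[$index]->valid()) {
            $heap->insert([$sources[$index]->current(), $index]);
        }
    }
}
```

The heap never holds more than one row per source, so memory in this sketch grows with the number of open chunks rather than with the number of rows. Rows that compare equal may come out in any order, because a binary heap is not stable.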
Phase 3: Cleanup
Automatically remove temporary files and return a League\Csv\Reader for the sorted data.
Memory Usage: Constant (~50-500MB) regardless of input file size
Performance: Handles files larger than available RAM
Advanced Usage
Configure Memory and Performance
    use Andileco\CsvSort\ExternalSorter;

    $sorter = new ExternalSorter([
        'memory_limit' => 256 * 1024 * 1024, // 256MB for sorting
        'temp_dir' => '/fast/ssd/path',      // Use SSD for temp files
        'merge_factor' => 10,                // Merge 10 files at once
        'buffer_size' => 8192,               // 8KB stream buffer
    ]);
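To get a feel for how `memory_limit` and `merge_factor` interact, the following back-of-the-envelope sketch estimates the number of chunks and merge passes using textbook external merge sort arithmetic. It assumes that each chunk holds roughly `memory_limit` bytes of input and that `merge_factor` caps the number of files merged in one pass; whether the library behaves exactly this way is an assumption. `estimateMergePlan()` is a hypothetical helper.

```php
<?php

/**
 * Estimate chunk count and merge passes for an external merge sort.
 *
 * @return array{chunks: int, passes: int}
 */
function estimateMergePlan(int $inputBytes, int $chunkBytes, int $mergeFactor): array
{
    if ($chunkBytes < 1 || $mergeFactor < 2) {
        throw new InvalidArgumentException('chunkBytes must be >= 1 and mergeFactor >= 2');
    }

    // Ceiling division: a partial chunk still needs its own file.
    $chunks = intdiv($inputBytes + $chunkBytes - 1, $chunkBytes);

    // Every pass merges groups of up to $mergeFactor runs into one run.
    $passes = 0;
    for ($runs = $chunks; $runs > 1; $runs = intdiv($runs + $mergeFactor - 1, $mergeFactor)) {
        $passes++;
    }

    return ['chunks' => $chunks, 'passes' => $passes];
}

// A 10 GiB input with 256 MiB chunks and a merge factor of 10.
$plan = estimateMergePlan(10 * 1024 ** 3, 256 * 1024 ** 2, 10);
printf("%d chunks, %d merge passes\n", $plan['chunks'], $plan['passes']);
```

Under these assumptions a larger merge factor means fewer passes over the data, at the cost of more open files and stream buffers at once.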
Sort by Multiple Columns
    use Andileco\CsvSort\{ExternalSorter, SortColumn, SortDirection};
    use Andileco\CsvSort\Comparator\{NumericComparator, StringComparator};

    $sorted = $sorter->sort($reader, [
        new SortColumn('age', SortDirection::DESC, new NumericComparator()),
        new SortColumn('name', SortDirection::ASC, new StringComparator()),
    ]);
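Conceptually, a multi-column sort compares rows column by column and stops at the first column that differs. The sketch below shows that idea in plain PHP. `composeComparators()` and its rule arrays are hypothetical and only mirror the idea behind `SortColumn`; they are not the library's internals.

```php
<?php

/**
 * Build one row comparator from an ordered list of [column, direction, comparator] rules.
 *
 * @param list<array{0: string, 1: int, 2: Closure}> $rules direction: 1 = ascending, -1 = descending
 */
function composeComparators(array $rules): Closure
{
    return function (array $a, array $b) use ($rules): int {
        foreach ($rules as [$column, $direction, $compare]) {
            $result = $compare($a[$column], $b[$column]);
            if ($result !== 0) {
                // The first column that differs decides the order.
                return $direction * $result;
            }
        }
        return 0; // equal on every listed column
    };
}

$numeric = fn(string $x, string $y): int => (float) $x <=> (float) $y;
$text = fn(string $x, string $y): int => strcmp($x, $y);

$rows = [
    ['name' => 'Bea', 'age' => '30'],
    ['name' => 'Al', 'age' => '41'],
    ['name' => 'Cy', 'age' => '30'],
];

usort($rows, composeComparators([
    ['age', -1, $numeric], // age descending
    ['name', 1, $text],    // then name ascending
]));
// $rows is now ordered Al (41), Bea (30), Cy (30).
```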
Custom Comparators
    use Andileco\CsvSort\Comparator\{
        StringComparator,   // Default text sorting
        NumericComparator,  // For integers and floats
        NaturalComparator,  // Natural ordering (file1, file2, file10)
        DateTimeComparator, // For dates and timestamps
        BooleanComparator   // For true/false, yes/no, 1/0
    };

    // Numeric sorting (important for numbers!)
    $sorted = $sorter->sort($reader, 'price',
        new NumericComparator()
    );

    // Date sorting
    $sorted = $sorter->sort($reader, 'created_date',
        new DateTimeComparator('Y-m-d H:i:s')
    );
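Why the comparator choice matters is easiest to see with PHP's own comparison functions. The sketch below does not use the library's comparator classes. It only contrasts string, natural, numeric, and date ordering on small arrays; `sortedWith()` is a hypothetical helper and the day/month/year date format is chosen purely for illustration.

```php
<?php

/** Return a sorted copy of $values using the given comparator. */
function sortedWith(array $values, callable $compare): array
{
    usort($values, $compare);
    return $values;
}

$names = ['file10', 'file9', 'file2'];
$byString = sortedWith($names, 'strcmp');     // file10, file2, file9
$byNatural = sortedWith($names, 'strnatcmp'); // file2, file9, file10

$prices = ['100', '25', '9.5'];
$pricesAsText = sortedWith($prices, 'strcmp'); // 100, 25, 9.5
$pricesAsNumbers = sortedWith(
    $prices,
    fn(string $a, string $b): int => (float) $a <=> (float) $b
); // 9.5, 25, 100

// Day/month/year strings do not sort chronologically as text.
$dates = ['01/02/2025', '02/01/2025'];
$parse = fn(string $d): DateTimeImmutable => DateTimeImmutable::createFromFormat('!d/m/Y', $d);
$byDate = sortedWith(
    $dates,
    fn(string $a, string $b): int => $parse($a) <=> $parse($b)
); // 02/01/2025 (2 January), 01/02/2025 (1 February)
```

Text comparison puts "100" before "25" because it compares character by character, which is why numeric columns need a numeric comparator.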
Track Performance
    $metrics = $sorter->getMetrics();

    echo "Records processed: " . $metrics->recordsProcessed . "\n";
    echo "Records/second: " . $metrics->getRecordsPerSecond() . "\n";
    echo "Peak memory: " . $metrics->peakMemory / 1024 / 1024 . "MB\n";
    echo "Chunks created: " . $metrics->chunksCreated . "\n";
    echo "Total time: " . $metrics->getTotalTime() . "s\n";
Use with Drupal views_csv_source
The library was specifically designed for sorting CSVs before displaying them in Drupal views:
    <?php

    // In your custom Drupal module
    use Andileco\CsvSort\ExternalSorter;
    use League\Csv\Reader;
    use League\Csv\Writer; // needed for Writer::createFromPath() below

    function mymodule_presort_csv($csv_path, $sort_column) {
        // Load the CSV
        $reader = Reader::createFromPath($csv_path, 'r');
        $reader->setHeaderOffset(0);

        // Sort it
        $sorter = new ExternalSorter([
            'temp_dir' => 'temporary://csv_sort',
            'memory_limit' => 256 * 1024 * 1024,
        ]);
        $sorted = $sorter->sort($reader, $sort_column);

        // Save sorted version
        $output_path = 'temporary://sorted_' . basename($csv_path);
        Writer::createFromPath($output_path, 'w')->insertAll($sorted);

        return $output_path;
    }
Now views_csv_source can read the pre-sorted file without memory issues!
Architecture
Core Components
- ExternalSorter: Main sorting orchestrator
- ChunkManager: Handles splitting and writing chunks
- MergeEngine: Performs k-way merge with min-heap
- SortMetrics: Tracks performance and resource usage
- Comparators: Pluggable comparison strategies
Design Principles
- Streaming I/O: Never load entire file into memory
- Constant Memory: Memory usage independent of file size
- Disk-Based: Leverage disk space for scalability
- League/CSV Compatible: Works seamlessly with existing code
- PHP 8.4 Modern: Readonly properties, enums, strict types
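The streaming principle can be illustrated with a plain PHP generator that holds only one record in memory at a time. This is a sketch using the built-in `fgetcsv()` rather than league/csv; `streamCsv()` is a hypothetical helper, and the first line of the file is assumed to be the header.

```php
<?php

/**
 * Yield CSV records one at a time as associative arrays keyed by the header row.
 */
function streamCsv(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        // $escape is passed explicitly because relying on its default is deprecated in PHP 8.4.
        $header = fgetcsv($handle, null, ',', '"', '');
        if ($header === false) {
            return; // empty file
        }

        while (($fields = fgetcsv($handle, null, ',', '"', '')) !== false) {
            if ($fields === [null]) {
                continue; // skip blank lines
            }
            yield array_combine($header, $fields);
        }
    } finally {
        // Runs when the generator finishes or is destroyed early.
        fclose($handle);
    }
}
```

Iterating with `foreach (streamCsv('large-file.csv') as $record)` reads the file lazily, so memory use in this sketch depends on the longest record rather than on the file size.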
Performance Benchmarks
Tested on: Intel i7-10700K, 32GB RAM, NVMe SSD
| File Size | Rows | Time | Peak Memory | Throughput |
|---|---|---|---|---|
| 100 MB | 500K | 12s | 128 MB | 41,667 rows/s |
| 500 MB | 2.5M | 58s | 256 MB | 43,103 rows/s |
| 1 GB | 5M | 118s | 256 MB | 42,373 rows/s |
| 5 GB | 25M | 612s | 512 MB | 40,850 rows/s |
| 10 GB | 50M | 1,245s | 512 MB | 40,161 rows/s |
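Key Findings:
- Throughput stays between roughly 40,000 and 43,000 rows/s across all tested file sizes
- Peak memory stays bounded (128 MB to 512 MB in these runs) and is configurable
- Total sort time scales approximately linearly with file size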
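The throughput column is simply rows divided by elapsed seconds. The short sketch below recomputes it from the row counts and times in the table; `rowsPerSecond()` is a hypothetical helper, not part of the library's `SortMetrics`.

```php
<?php

/** Throughput rounded to the nearest whole row per second. */
function rowsPerSecond(int $rows, float $seconds): int
{
    return (int) round($rows / $seconds);
}

// [rows, seconds] pairs copied from the benchmark table.
$runs = [
    '100 MB' => [500_000, 12],
    '500 MB' => [2_500_000, 58],
    '1 GB' => [5_000_000, 118],
    '5 GB' => [25_000_000, 612],
    '10 GB' => [50_000_000, 1245],
];

foreach ($runs as $size => [$rows, $seconds]) {
    printf("%-7s %d rows/s\n", $size, rowsPerSecond($rows, $seconds));
}
```

The recomputed values agree with the table, and the spread between the fastest and slowest run is under 8%.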
Requirements
- PHP ^8.4
- league/csv ^9.27
- Sufficient disk space for temporary files (2x input file size recommended)
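A pre-flight check for the disk space recommendation might look like the sketch below. It uses PHP's built-in `filesize()` and `disk_free_space()`; `hasRoomForSort()` is a hypothetical helper, and the default factor of 2 simply mirrors the recommendation above.

```php
<?php

/**
 * Check whether $tempDir has at least $factor times the input file's size free.
 */
function hasRoomForSort(string $inputPath, string $tempDir, float $factor = 2.0): bool
{
    if (!is_file($inputPath)) {
        throw new RuntimeException("Input file not found: $inputPath");
    }

    $needed = filesize($inputPath) * $factor;
    $free = disk_free_space($tempDir);

    // disk_free_space() returns false when the directory cannot be inspected.
    return $free !== false && $free >= $needed;
}
```

Calling `hasRoomForSort('large-file.csv', sys_get_temp_dir())` before sorting gives an early, readable failure instead of a half-written temporary file.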
Documentation
Testing
    # Run functional tests
    php tests/functional_test.php

    # Run examples
    php examples/basic_usage.php
    php examples/multi_column_sort.php
    php examples/benchmark.php
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
License
MIT License. See LICENSE for details.
Credits
Created by Andileco for the Drupal community.
Built with:
- league/csv - CSV manipulation library
- PHP 8.4+ - Modern PHP features
Support
- Issues: GitHub Issues
- Documentation: docs/
- Discussions: GitHub Discussions