kolay/xlsx-stream

High-performance XLSX streaming writer for Laravel with zero disk I/O and direct S3 support

v1.0.1 2025-09-10 11:52 UTC

This package is auto-updated.

Last update: 2025-09-10 11:54:24 UTC


README

Latest Version on Packagist Total Downloads License PHP Version

High-performance XLSX streaming writer for Laravel with zero disk I/O and direct S3 support. Perfect for exporting millions of rows without memory issues.

Why This Package?

The Problem with Existing Solutions

Most PHP Excel libraries (PHPSpreadsheet, Spout, Laravel Excel) have critical limitations:

  • Memory Issues: Load entire documents in RAM (unusable for large files)
  • Disk I/O: Write temporary files then upload to S3 (2x I/O, slow)
  • No True Streaming: Can't stream directly to S3

Our Solution

  • Zero Disk I/O: Direct streaming to S3 using multipart upload
  • Constant Memory: O(1) memory usage - only 32MB buffer regardless of file size
  • Blazing Fast: 2-3x faster than alternatives
  • Production Tested: Successfully exported 4.65 million rows (500MB+ files)

Performance Comparison

Comprehensive Benchmark Results (September 2025)

Rows Local Speed Local Memory Local Time S3 Speed S3 Memory S3 Time File Size
100 47,913 rows/s 2 MB 0.00s 99 rows/s 0 MB 1.01s 0.01 MB
500 161,183 rows/s 0 MB 0.00s 362 rows/s 0 MB 1.38s 0.02 MB
1,000 161,711 rows/s 0 MB 0.01s 913 rows/s 0 MB 1.09s 0.04 MB
5,000 167,101 rows/s 0 MB 0.03s 3,092 rows/s 0 MB 1.62s 0.2 MB
10,000 183,281 rows/s 0 MB 0.05s 5,411 rows/s 0 MB 1.85s 0.4 MB
25,000 187,322 rows/s 0 MB 0.13s 11,482 rows/s 0 MB 2.18s 1 MB
50,000 187,455 rows/s 2 MB 0.27s 5,829 rows/s 2 MB 8.58s 2 MB
100,000 182,167 rows/s 0 MB 0.55s 9,288 rows/s 4 MB 10.77s 4 MB
250,000 185,586 rows/s 0 MB 1.35s 59,744 rows/s 12 MB (±6) 4.18s 10 MB
500,000 184,268 rows/s 0 MB 2.71s 28,553 rows/s 20 MB (±18) 17.51s 20 MB
750,000 182,648 rows/s 0 MB 4.11s 33,504 rows/s 30 MB (±26) 22.39s 30 MB
1,000,000 182,693 rows/s 0 MB 5.47s 43,215 rows/s 40 MB (±38) 23.14s 40 MB
1,500,000 180,578 rows/s 0 MB 8.31s 36,733 rows/s 60 MB (±58) 40.84s 60 MB
2,000,000 177,012 rows/s 0 MB 11.30s 51,323 rows/s 79 MB (±77) 38.97s 79 MB
3,000,000 - - - 39,150 rows/s 117 MB (±117) 76.63s 119 MB
4,000,000 - - - 42,500 rows/s 160 MB (±156) 94.12s 158 MB
4,500,000 - - - 46,462 rows/s 178 MB (±178) 96.85s 178 MB

Note: Tests with 1M+ rows automatically create multiple sheets (Excel limit: 1,048,576 rows per sheet)
Note: ± values in S3 Memory column indicate memory fluctuation during streaming due to periodic part uploads

Understanding Memory Behavior

Local File System

  • True O(1) Memory: Constant memory usage regardless of file size
  • No Growth: Memory stays at 0-2MB even for millions of rows
  • Speed: 180,000+ rows/second consistently

S3 Streaming Memory Fluctuation

The ± values in S3 memory represent normal memory fluctuation during streaming:

  1. Buffer Accumulation Phase (↑ Memory Growth)

    • Data is compressed and buffered until reaching 32MB
    • Memory grows gradually as buffer fills
  2. Part Upload Phase (↓ Memory Drop)

    • When buffer reaches 32MB, it's uploaded to S3
    • After upload, memory drops back to baseline
    • This creates the characteristic sawtooth pattern
  3. Example: 1M Rows Test

    • Average memory: 40MB
    • Fluctuation: ±38MB
    • Pattern: Memory oscillates between ~2MB (after upload) and ~78MB (before upload)
    • This is completely normal and expected behavior

Performance Highlights

  • Local File System: ~180,000 rows/second with true O(1) memory
  • S3 Streaming: 30,000-50,000 rows/second with periodic memory fluctuation
  • Memory Efficiency: Local uses <2MB, S3 averages 40MB per million rows
  • Multi-sheet Support: Automatic sheet creation at Excel's 1,048,576 row limit
  • Production Ready: Successfully tested with 4.5 million rows

Comparison with Other Libraries

Package 1M Rows Time Memory Usage Disk Usage S3 Support
PHPSpreadsheet ❌ Crashes ~8GB Full file Indirect
Spout ~60 sec ~100MB+ Full file Indirect
Laravel Excel ~90 sec ~500MB+ Full file Indirect
Kolay XLSX Stream (Local) 5.5 sec 0 MB Zero N/A
Kolay XLSX Stream (S3) 23 sec 40MB avg Zero Direct

Requirements

  • PHP 8.0+
  • Laravel 9.0+ (also supports Laravel 10 & 11)
  • AWS SDK (only if using S3 streaming)

Installation

composer require kolay/xlsx-stream

Publish Configuration (Optional)

php artisan vendor:publish --tag=xlsx-stream-config

Quick Start

Basic Usage - Local File

use Kolay\XlsxStream\Writers\SinkableXlsxWriter;
use Kolay\XlsxStream\Sinks\FileSink;

// Create writer with file sink
$sink = new FileSink('/path/to/output.xlsx');
$writer = new SinkableXlsxWriter($sink);

// Set headers
$writer->startFile(['Name', 'Email', 'Phone']);

// Write rows
$writer->writeRow(['John Doe', 'john@example.com', '+1234567890']);
$writer->writeRow(['Jane Smith', 'jane@example.com', '+0987654321']);

// Or write multiple rows at once
$writer->writeRows([
    ['Bob Johnson', 'bob@example.com', '+1111111111'],
    ['Alice Brown', 'alice@example.com', '+2222222222'],
]);

// Finish and close file
$stats = $writer->finishFile();

echo "Generated {$stats['rows']} rows in {$stats['sheets']} sheet(s)";

Direct S3 Streaming (Zero Disk I/O)

use Kolay\XlsxStream\Writers\SinkableXlsxWriter;
use Kolay\XlsxStream\Sinks\S3MultipartSink;
use Aws\S3\S3Client;

// Create S3 client
$s3Client = new S3Client([
    'region' => 'us-east-1',
    'version' => 'latest',
    'credentials' => [
        'key' => env('AWS_ACCESS_KEY_ID'),
        'secret' => env('AWS_SECRET_ACCESS_KEY'),
    ],
]);

// Create S3 sink with 32MB parts
$sink = new S3MultipartSink(
    $s3Client,
    'my-bucket',
    'exports/report.xlsx',
    32 * 1024 * 1024 // 32MB parts for optimal performance
);

$writer = new SinkableXlsxWriter($sink);

// Configure for maximum performance
$writer->setCompressionLevel(1)      // Fastest compression
       ->setBufferFlushInterval(10000); // Flush every 10K rows

$writer->startFile(['ID', 'Name', 'Email', 'Status']);

// Stream millions of rows with constant 32MB memory
User::query()
    ->select(['id', 'name', 'email', 'status'])
    ->chunkById(1000, function ($users) use ($writer) {
        foreach ($users as $user) {
            $writer->writeRow([
                $user->id,
                $user->name,
                $user->email,
                $user->status
            ]);
        }
    });

$stats = $writer->finishFile();

Laravel Job Example

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Kolay\XlsxStream\Writers\SinkableXlsxWriter;
use Kolay\XlsxStream\Sinks\S3MultipartSink;
use Aws\S3\S3Client;
use App\Models\User;

class ExportUsersJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle()
    {
        // Create S3 client from Laravel config
        $s3Config = config('filesystems.disks.s3');
        $s3Client = new S3Client([
            'region' => $s3Config['region'],
            'version' => 'latest',
            'credentials' => [
                'key' => $s3Config['key'],
                'secret' => $s3Config['secret'],
            ],
        ]);
        
        // Setup S3 streaming
        $filename = 'exports/users-' . now()->format('Y-m-d-H-i-s') . '.xlsx';
        $sink = new S3MultipartSink(
            $s3Client,
            $s3Config['bucket'],
            $filename,
            32 * 1024 * 1024
        );
        
        $writer = new SinkableXlsxWriter($sink);
        $writer->setCompressionLevel(1)
               ->setBufferFlushInterval(10000);
        
        // Export headers
        $writer->startFile([
            'ID',
            'Name',
            'Email',
            'Created At',
            'Status'
        ]);
        
        // Export data with chunking
        User::query()
            ->orderBy('id')
            ->chunkById(1000, function ($users) use ($writer) {
                foreach ($users as $user) {
                    $writer->writeRow([
                        $user->id,
                        $user->name,
                        $user->email,
                        $user->created_at->format('Y-m-d H:i:s'),
                        $user->status
                    ]);
                }
                
                // Clear Eloquent cache periodically (optional)
                // You can track rows externally if needed
            });
        
        $stats = $writer->finishFile();
        
        \Log::info('Export completed', [
            'filename' => $filename,
            'rows' => $stats['rows'],
            'sheets' => $stats['sheets'],
            'bytes' => $stats['bytes']
        ]);
    }
}

Advanced Features

Multi-Sheet Support (Automatic)

The writer automatically creates new sheets when reaching Excel's row limit (1,048,576 rows):

$writer = new SinkableXlsxWriter($sink);
$writer->startFile(['Column1', 'Column2']);

// Write 2 million rows - will create 2 sheets automatically
for ($i = 1; $i <= 2000000; $i++) {
    $writer->writeRow(["Row $i", "Data $i"]);
}

$stats = $writer->finishFile();
// $stats['sheets'] = 2

Performance Tuning

// Ultra-fast mode for maximum speed
$writer->setCompressionLevel(1)        // Minimal compression
       ->setBufferFlushInterval(50000); // Large buffer

// Balanced mode (default)
$writer->setCompressionLevel(6)        // Balanced compression
       ->setBufferFlushInterval(10000); // Medium buffer

// Maximum compression (slower, smaller files)
$writer->setCompressionLevel(9)        // Maximum compression
       ->setBufferFlushInterval(1000);  // Small buffer for streaming

Custom S3 Parameters

$sink = new S3MultipartSink(
    $s3Client,
    'my-bucket',
    'path/to/file.xlsx',
    32 * 1024 * 1024,
    [
        'ACL' => 'public-read',
        'ContentType' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        'ContentDisposition' => 'attachment; filename="report.xlsx"',
        'Metadata' => [
            'generated-by' => 'kolay-xlsx-stream',
            'timestamp' => time()
        ]
    ]
);

Error Handling

try {
    $writer = new SinkableXlsxWriter($sink);
    $writer->startFile(['Column1', 'Column2']);
    
    // Write data...
    
    $stats = $writer->finishFile();
} catch (\Exception $e) {
    // The sink will automatically abort and cleanup on error
    // S3 multipart uploads are automatically aborted
    // Partial files are deleted
    
    \Log::error('Export failed: ' . $e->getMessage());
}

Configuration

Published configuration file (config/xlsx-stream.php):

return [
    's3' => [
        'part_size' => 32 * 1024 * 1024,  // S3 multipart chunk size
        'retry_attempts' => 3,             // Retry failed uploads
        'retry_delay_ms' => 100,           // Delay between retries
    ],
    
    'writer' => [
        'compression_level' => 1,          // 1-9 (1=fastest, 9=smallest)
        'buffer_flush_interval' => 10000,  // Rows to buffer
        'max_rows_per_sheet' => 1048575,   // Excel limit - 1
    ],
    
    'memory' => [
        'file_buffer_size' => 1024 * 1024, // 1MB file write buffer
    ],
    
    'logging' => [
        'enabled' => true,
        'channel' => null,                  // null = default channel
        'log_progress' => false,            // Log progress updates
        'progress_interval' => 10000,       // Log every N rows
    ],
];

How It Works

The Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Your App  │────▶│ XlsxWriter   │────▶│    Sink     │
└─────────────┘     └──────────────┘     └─────────────┘
                            │                     │
                    ┌───────▼────────┐   ┌───────▼────────┐
                    │ ZIP Generation │   │  Destination   │
                    │  & Compression │   │ • FileSink     │
                    └────────────────┘   │ • S3Sink       │
                                         └────────────────┘

The Magic Behind Zero Disk I/O

  1. Binary XLSX Generation

    • XLSX files are ZIP archives containing XML files
    • We generate ZIP structure directly in memory
    • No intermediate files or DOM tree building
  2. Streaming Compression

    • Data is compressed using PHP's deflate_add() in chunks
    • Each row is immediately compressed and streamed
    • No need to store uncompressed data
  3. Smart Buffering

    • Configurable row buffer (default 10,000 rows)
    • Flushes periodically to maintain streaming
    • Prevents memory accumulation
  4. S3 Multipart Upload

    • Direct streaming to S3 using multipart upload
    • 32MB parts uploaded as they're ready
    • No local file required at any point

Memory Efficiency Explained

Local Files: True O(1) Memory

// Only these stay in memory:
$rowBuffer = '';        // Current batch of rows (flushed every 10K rows)
$deflateContext = ...;  // Compression context (minimal)
$zipMetadata = [];      // File entry metadata (bytes per file)

S3 Streaming: Near O(1) with Controlled Growth

// Memory components:
$buffer = '';           // 32MB max (uploaded when full)
$parts = [];           // Part metadata (ETag + number per part)
// Growth: ~1KB per part × (filesize / 32MB) parts
// Example: 100MB file = 3 parts = ~3KB metadata

Why Memory Fluctuates in S3

The sawtooth memory pattern in S3 streaming is by design:

Memory
  ▲
78MB │     ╱╲      ╱╲      ╱╲
     │    ╱  ╲    ╱  ╲    ╱  ╲
40MB │   ╱    ╲  ╱    ╲  ╱    ╲
     │  ╱      ╲╱      ╲╱      ╲
2MB  │─╯                         
     └────────────────────────────▶ Time
       ↑Upload  ↑Upload  ↑Upload

Each peak represents a full 32MB buffer ready for upload. After upload, memory drops as the buffer is cleared. This is optimal for streaming large files.

Real-World Performance

Production Test Results (4.5 Million Rows)

Our production systems successfully export massive datasets daily:

Dataset: 4.5 million rows (Employee competency evaluation data)
File Size: 178 MB compressed XLSX
Sheets: 5 (automatic splitting at Excel limit)
Total Time: 96.85 seconds
Average Speed: 46,462 rows/second
Memory Usage: 178 MB average with ±178 MB fluctuation
Peak Memory: 356 MB (during S3 part upload)

Key Performance Metrics

Speed

  • Local Files: 180,000+ rows/second sustained
  • S3 Streaming: 30,000-50,000 rows/second
  • Network Impact: S3 speed varies with network latency and bandwidth

Memory Usage

  • Local Files: True O(1) - constant 0-2MB regardless of size
  • S3 Streaming: Averages 40MB per million rows
  • Fluctuation: Normal sawtooth pattern during S3 uploads

Scalability

  • 100 rows: Instant (< 0.01s local, ~1s S3)
  • 1 Million rows: 5.5 seconds local, 23 seconds S3
  • 4.5 Million rows: 97 seconds S3 with automatic multi-sheet
  • Memory stable: No memory leaks, predictable usage
  • Production proven: Running in production since 2025

Use Cases

Perfect for:

  • Large data exports (millions of rows)
  • Memory-constrained environments
  • Kubernetes/Docker with small pods
  • AWS Lambda functions
  • Real-time streaming exports
  • Multi-tenant SaaS applications

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Testing

composer test

License

The MIT License (MIT). Please see License File for more information.

Credits

Support

For issues and questions, please use the GitHub issue tracker.