sqrtspace/spacetime

High-performance PHP library for memory-efficient processing of large datasets with streaming, batching, and time/space complexity optimization

v1.0.0 2025-07-20 19:10 UTC

This package is auto-updated.

Last update: 2025-07-20 19:56:29 UTC


README

Latest Stable Version Total Downloads License PHP Version Require

Memory-efficient algorithms and data structures for PHP using Williams' √n space-time tradeoffs.

Paper Repository: github.com/sqrtspace/sqrtspace-paper

Installation

composer require sqrtspace/spacetime

Core Concepts

SpaceTime implements theoretical computer science results showing that many algorithms can achieve better memory usage by accepting slightly slower runtime. The key insight is using √n memory instead of n memory, where n is the input size.

Key Features

  • External Sorting: Sort large datasets that don't fit in memory
  • External Grouping: Group and aggregate data with minimal memory usage
  • Streaming Operations: Process files and data streams efficiently
  • Memory Pressure Handling: Automatic response to low memory conditions
  • Checkpoint/Resume: Save progress and resume long-running operations
  • Laravel Integration: Deep integration with Laravel collections and queries

Quick Start

use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Algorithms\ExternalSort;

// Handle large arrays with automatic memory management
$array = new SpaceTimeArray();
for ($i = 0; $i < 10000000; $i++) {
    $array[] = random_int(1, 1000000);
}

// Sort large datasets using only √n memory
$sorted = ExternalSort::sort($array);

// Process in optimal chunks
foreach ($array->chunkBySqrtN() as $chunk) {
    processChunk($chunk);
}

Examples

Basic Examples

See examples/comprehensive_example.php for a complete demonstration of all features including:

  • Memory-efficient arrays and dictionaries
  • External sorting and grouping
  • Stream processing
  • CSV import/export
  • Batch processing with checkpoints
  • Memory pressure monitoring

Laravel Application

Check out examples/laravel-app/ for a complete Laravel application demonstrating:

  • Streaming API endpoints
  • Memory-efficient CSV exports
  • Background job processing with checkpoints
  • Real-time analytics with SSE
  • Production-ready configurations

See the Laravel example README for setup instructions and detailed usage.

Features

1. Memory-Efficient Collections

use SqrtSpace\SpaceTime\Collections\SpaceTimeArray;
use SqrtSpace\SpaceTime\Collections\AdaptiveDictionary;

// Adaptive array - automatically switches between memory and disk
$array = new SpaceTimeArray();
$array->setThreshold(10000); // Switch to external storage after 10k items

// Adaptive dictionary with optimal memory usage
$dict = new AdaptiveDictionary();
for ($i = 0; $i < 1000000; $i++) {
    $dict["key_$i"] = "value_$i";
}

2. External Algorithms

use SqrtSpace\SpaceTime\Algorithms\ExternalSort;
use SqrtSpace\SpaceTime\Algorithms\ExternalGroupBy;

// Sort millions of records using minimal memory
$data = getData(); // Large dataset
$sorted = ExternalSort::sort($data, fn($a, $b) => $a['date'] <=> $b['date']);

// Group by with external storage
$grouped = ExternalGroupBy::groupBy($data, fn($item) => $item['category']);

3. Streaming Operations

use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

// Process large files with bounded memory
$stream = SpaceTimeStream::fromFile('large_file.csv')
    ->map(fn($line) => str_getcsv($line))
    ->filter(fn($row) => $row[2] > 100)
    ->chunkBySqrtN()
    ->each(function($chunk) {
        processBatch($chunk);
    });

4. Database Integration

use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

// Process large result sets efficiently
$query = new SpaceTimeQueryBuilder($pdo);
$query->from('orders')
    ->where('status', '=', 'pending')
    ->orderByExternal('created_at', 'desc')
    ->chunkBySqrtN(function($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });

// Stream results for minimal memory usage
$stream = $query->from('logs')
    ->where('level', '=', 'error')
    ->stream();
    
$stream->filter(fn($log) => strpos($log['message'], 'critical') !== false)
    ->each(fn($log) => alertAdmin($log));

5. Laravel Integration

// In AppServiceProvider
use SqrtSpace\SpaceTime\Laravel\SpaceTimeServiceProvider;

public function register()
{
    $this->app->register(SpaceTimeServiceProvider::class);
}

// Collection macros
$collection = collect($largeArray);

// Sort using external memory
$sorted = $collection->sortByExternal('price');

// Group by with external storage
$grouped = $collection->groupByExternal('category');

// Process in √n chunks
$collection->chunkBySqrtN()->each(function ($chunk) {
    processBatch($chunk);
});

// Query builder extensions
DB::table('orders')
    ->chunkBySqrtN(function ($orders) {
        foreach ($orders as $order) {
            processOrder($order);
        }
    });

6. Memory Pressure Handling

use SqrtSpace\SpaceTime\Memory\MemoryPressureMonitor;
use SqrtSpace\SpaceTime\Memory\Handlers\LoggingHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\CacheEvictionHandler;
use SqrtSpace\SpaceTime\Memory\Handlers\GarbageCollectionHandler;

$monitor = new MemoryPressureMonitor('512M');

// Add handlers
$monitor->registerHandler(new LoggingHandler($logger));
$monitor->registerHandler(new CacheEvictionHandler());
$monitor->registerHandler(new GarbageCollectionHandler());

// Check pressure in your operations
if ($monitor->check() === MemoryPressureLevel::HIGH) {
    // Switch to more aggressive memory saving
    $processor->useExternalStorage();
}

7. Checkpointing for Fault Tolerance

use SqrtSpace\SpaceTime\Checkpoint\CheckpointManager;

$checkpoint = new CheckpointManager('import_job_123');

foreach ($largeDataset->chunkBySqrtN() as $chunk) {
    processChunk($chunk);
    
    // Save progress every √n items
    if ($checkpoint->shouldCheckpoint()) {
        $checkpoint->save([
            'processed' => $processedCount,
            'last_id' => $lastId
        ]);
    }
}

Real-World Examples

Processing Large CSV Files

use SqrtSpace\SpaceTime\File\CsvReader;
use SqrtSpace\SpaceTime\Algorithms\ExternalGroupBy;

$reader = new CsvReader('sales_data.csv');

// Get column statistics
$stats = $reader->getColumnStats('amount');
echo "Average order: $" . $stats['avg'];

// Process with type conversion
$totals = $reader->readWithTypes([
    'amount' => 'float',
    'quantity' => 'int',
    'date' => 'date'
])->reduce(function ($totals, $row) {
    $month = $row['date']->format('Y-m');
    $totals[$month] = ($totals[$month] ?? 0) + $row['amount'];
    return $totals;
}, []);

Large Data Export

use SqrtSpace\SpaceTime\File\CsvExporter;
use SqrtSpace\SpaceTime\Database\SpaceTimeQueryBuilder;

$exporter = new CsvExporter('users_export.csv');
$query = new SpaceTimeQueryBuilder($pdo);

// Export with headers
$exporter->writeHeaders(['ID', 'Name', 'Email', 'Created At']);

// Stream data directly to CSV
$query->from('users')
    ->orderBy('created_at', 'desc')
    ->chunkBySqrtN(function($users) use ($exporter) {
        $exporter->writeRows(array_map(function($user) {
            return [
                $user['id'],
                $user['name'],
                $user['email'],
                $user['created_at']
            ];
        }, $users));
    });

echo "Exported " . number_format($exporter->getBytesWritten()) . " bytes\n";

Batch Processing with Memory Limits

use SqrtSpace\SpaceTime\Batch\BatchProcessor;

$processor = new BatchProcessor([
    'memory_threshold' => 0.8,
    'checkpoint_enabled' => true,
    'progress_callback' => function($batch, $size, $result) {
        echo "Processed batch $batch ($size items)\n";
    }
]);

$result = $processor->process($millionItems, function($batch) {
    $processed = [];
    foreach ($batch as $key => $item) {
        $processed[$key] = expensiveOperation($item);
    }
    return $processed;
}, 'job_123');

echo "Success: " . $result->getSuccessCount() . "\n";
echo "Errors: " . $result->getErrorCount() . "\n";
echo "Time: " . $result->getExecutionTime() . "s\n";

Configuration

use SqrtSpace\SpaceTime\SpaceTimeConfig;

// Global configuration
SpaceTimeConfig::configure([
    'memory_limit' => '512M',
    'external_storage_path' => '/tmp/spacetime',
    'chunk_strategy' => 'sqrt_n', // or 'memory_based', 'fixed'
    'enable_checkpointing' => true,
    'compression' => true,
    'compression_level' => 6
]);

// Per-operation configuration
$array = new SpaceTimeArray(10000); // threshold

// Check configuration
echo "Chunk size for 1M items: " . SpaceTimeConfig::calculateSqrtN(1000000) . "\n";
echo "Storage path: " . SpaceTimeConfig::getStoragePath() . "\n";

Advanced Usage

JSON Lines Processing

use SqrtSpace\SpaceTime\File\JsonLinesProcessor;

// Process large JSONL files
JsonLinesProcessor::processInChunks('events.jsonl', function($events) {
    foreach ($events as $event) {
        if ($event['type'] === 'error') {
            logError($event);
        }
    }
});

// Split large file
$files = JsonLinesProcessor::split('huge.jsonl', 100000, 'output/chunk');
echo "Split into " . count($files) . " files\n";

// Merge multiple files
$count = JsonLinesProcessor::merge($files, 'merged.jsonl');
echo "Merged $count records\n";

Streaming Operations

use SqrtSpace\SpaceTime\Streams\SpaceTimeStream;

// Chain operations efficiently
SpaceTimeStream::fromCsv('sales.csv')
    ->filter(fn($row) => $row['region'] === 'US')
    ->map(fn($row) => [
        'product' => $row['product'],
        'revenue' => $row['quantity'] * $row['price']
    ])
    ->chunkBySqrtN()
    ->each(function($chunk) {
        $total = array_sum(array_column($chunk, 'revenue'));
        echo "Chunk revenue: \$$total\n";
    });

Custom Batch Jobs

use SqrtSpace\SpaceTime\Batch\BatchJob;

class ImportJob extends BatchJob
{
    private string $filename;
    
    public function __construct(string $filename)
    {
        parent::__construct();
        $this->filename = $filename;
    }
    
    protected function getItems(): iterable
    {
        return SpaceTimeStream::fromCsv($this->filename);
    }
    
    public function processItem(array $batch): array
    {
        $results = [];
        foreach ($batch as $key => $row) {
            $user = User::create([
                'name' => $row['name'],
                'email' => $row['email']
            ]);
            $results[$key] = $user->id;
        }
        return $results;
    }
    
    protected function getUniqueId(): string
    {
        return md5($this->filename);
    }
}

// Run job with automatic checkpointing
$job = new ImportJob('users.csv');
$result = $job->execute();

// Or resume if interrupted
if ($job->canResume()) {
    $result = $job->resume();
}

Testing

# Run all tests
vendor/bin/phpunit

# Run specific test suite
vendor/bin/phpunit tests/Algorithms

# With coverage
vendor/bin/phpunit --coverage-html coverage

Performance Considerations

  1. Chunk Size: The default √n chunk size is optimal for most cases, but you can tune it:

    SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 5000]);
  2. Compression: Enable for text-heavy data, disable for already compressed data:

    SpaceTimeConfig::configure(['compression' => false]);
  3. Storage Location: Use fast local SSDs for external storage:

    SpaceTimeConfig::configure(['external_storage_path' => '/mnt/fast-ssd/spacetime']);

Framework Integration

Laravel

// config/spacetime.php
return [
    'memory_limit' => env('SPACETIME_MEMORY_LIMIT', '256M'),
    'storage_driver' => env('SPACETIME_STORAGE', 'file'),
    'redis_connection' => env('SPACETIME_REDIS', 'default'),
];

// In controller
public function exportOrders()
{
    return SpaceTimeResponse::stream(function() {
        Order::orderByExternal('created_at')
            ->chunkBySqrtN(function($orders) {
                foreach ($orders as $order) {
                    echo $order->toCsv() . "\n";
                }
            });
    });
}

Symfony

For a complete Symfony integration example, see our Symfony bundle documentation.

# config/bundles.php
return [
    // ...
    SqrtSpace\SpaceTime\Symfony\SpaceTimeBundle::class => ['all' => true],
];
# config/packages/spacetime.yaml
spacetime:
    memory_limit: '%env(SPACETIME_MEMORY_LIMIT)%'
    storage_path: '%kernel.project_dir%/var/spacetime'
    chunk_strategy: 'sqrt_n'
    enable_checkpointing: true
    compression: true
// In controller
use SqrtSpace\SpaceTime\Batch\BatchProcessor;
use SqrtSpace\SpaceTime\File\CsvReader;

#[Route('/import')]
public function import(BatchProcessor $processor): Response
{
    $reader = new CsvReader($this->getParameter('import_file'));
    
    $result = $processor->process(
        $reader->stream(),
        fn($batch) => $this->importBatch($batch)
    );
    
    return $this->json([
        'imported' => $result->getSuccessCount(),
        'errors' => $result->getErrorCount()
    ]);
}
# Console command
php bin/console spacetime:process-file input.csv output.csv --format=csv --checkpoint

Troubleshooting

Out of Memory Errors

  1. Reduce chunk size:

    SpaceTimeConfig::configure(['chunk_strategy' => 'fixed', 'fixed_chunk_size' => 1000]);
  2. Enable more aggressive memory handling:

    $monitor = new MemoryPressureMonitor('128M'); // Lower threshold
  3. Use external storage earlier:

    $array = new SpaceTimeArray(100); // Smaller threshold

Performance Issues

  1. Check disk I/O speed
  2. Enable compression for text data
  3. Use memory-based external storage:
    SpaceTimeConfig::configure(['external_storage_path' => '/dev/shm/spacetime']);

Checkpoint Recovery

$checkpoint = new CheckpointManager('job_id');
if ($checkpoint->exists()) {
    $state = $checkpoint->load();
    echo "Resuming from: " . json_encode($state) . "\n";
}

Requirements

  • PHP 8.1 or higher
  • ext-json
  • ext-mbstring

Optional Extensions

  • ext-apcu for faster caching
  • ext-redis for distributed operations
  • ext-zlib for compression

Contributing

Please see CONTRIBUTING.md for details.

License

The Apache 2.0 License. Please see LICENSE for details.