arraypress/wp-csv-utils

A secure and efficient WordPress library for handling CSV operations with built-in validation and sanitization

Installs: 0

Dependents: 0

Suggesters: 0

Security: 0

Stars: 0

Watchers: 0

Forks: 0

Open Issues: 0

pkg:composer/arraypress/wp-csv-utils

dev-main 2025-10-15 15:15 UTC

This package is auto-updated.

Last update: 2025-10-18 17:05:46 UTC


README

A secure and efficient WordPress library for handling CSV operations with built-in validation and sanitization.

Features

  • ๐Ÿ”’ Security First: Built-in CSV formula injection prevention
  • โœ… WordPress Integration: Native attachment validation and permission checks
  • ๐Ÿš€ Memory Efficient: Batch processing for large files
  • ๐Ÿ›ก๏ธ Input Validation: MIME type, file size, and structure validation
  • ๐Ÿงน Auto Sanitization: Configurable cell and header sanitization
  • ๐Ÿ“Š Flexible API: Read, preview, iterate, and batch process CSV data
  • โšก Performance: Optimized for large file handling

Installation

Install via Composer:

composer require arraypress/wp-csv-utils

Requirements

  • PHP 7.4 or later
  • WordPress 5.0 or later

Quick Start

Basic Usage

use ArrayPress\CSVUtils\CSV;

// Create from WordPress attachment with full validation
$csv = CSV::from_attachment( $attachment_id );

if ( is_wp_error( $csv ) ) {
    // Handle error
    wp_die( $csv->get_error_message() );
}

// Get headers
$headers = $csv->get_headers();

// Preview first 5 rows
$preview = $csv->preview( 5 );

// Get total row count
$total = $csv->count_rows();

Import Processing

// Process entire CSV with callback
$csv->each( function( $row, $index ) {
    // $row is associative array with headers as keys
    wp_insert_post( [
        'post_title'   => $row['Title'],
        'post_content' => $row['Content'],
        'post_status'  => 'publish'
    ] );
}, 100 ); // Batch size of 100 rows

Batch Processing

// Manual batch processing (useful for AJAX/REST)
$offset = 0;
$limit  = 100;

$batch = $csv->get_batch( $offset, $limit, true );

foreach ( $batch as $row ) {
    // Process row
    process_row( $row );
}

Custom Configuration

$csv = CSV::from_attachment( $attachment_id, [
    'max_file_size_mb' => 50,        // Maximum file size
    'delimiter'        => ',',       // CSV delimiter
    'enclosure'        => '"',       // CSV enclosure
    'escape'           => '\\',      // Escape character
    'sanitize_cells'   => true,      // Auto-sanitize cells
    'skip_empty_rows'  => true,      // Skip empty rows
] );

API Reference

CSV Class

Static Methods

from_attachment( int $attachment_id, array $config = [] )

Create CSV instance from WordPress attachment with full validation.

Parameters:

  • $attachment_id - WordPress attachment ID
  • $config - Optional configuration array

Returns: CSV|WP_Error - CSV instance or error

Security Checks:

  • Validates attachment exists
  • Checks user permissions
  • Verifies file is in uploads directory
  • Validates MIME type (text/csv, text/plain)
  • Checks file size limits

Example:

$csv = CSV::from_attachment( 123 );
if ( is_wp_error( $csv ) ) {
    echo $csv->get_error_message();
}

Instance Methods

get_headers()

Get CSV headers (first row).

Returns: array|WP_Error - Array of headers or error

Example:

$headers = $csv->get_headers();
// ['Name', 'Email', 'Phone']
count_rows()

Get total number of rows (excluding header).

Returns: int|WP_Error - Row count or error

Example:

$total = $csv->count_rows();
// 1500
get_batch( int $offset = 0, int $limit = 100, bool $assoc = false )

Get a batch of rows.

Parameters:

  • $offset - Starting row (0-based, after header)
  • $limit - Number of rows to retrieve
  • $assoc - Return associative arrays (headers as keys)

Returns: array|WP_Error - Array of rows or error

Example:

// Get rows 100-199 as indexed arrays
$batch = $csv->get_batch( 100, 100 );

// Get rows as associative arrays
$batch = $csv->get_batch( 0, 100, true );
foreach ( $batch as $row ) {
    echo $row['Email']; // Access by header name
}
preview( int $limit = 5, bool $assoc = false )

Get preview rows (first N rows).

Parameters:

  • $limit - Maximum rows to preview
  • $assoc - Return associative arrays

Returns: array|WP_Error - Preview data or error

Example:

$preview = $csv->preview( 5, true );
each( callable $callback, int $batch_size = 100 )

Iterate through all rows with callback.

Parameters:

  • $callback - Function to call for each row function( $row, $index )
  • $batch_size - Batch size for memory efficiency

Returns: bool|WP_Error - True on success, error on failure

Example:

$csv->each( function( $row, $index ) {
    // Process row
    create_user_from_csv( $row );
    
    // Return false to stop iteration
    if ( $index >= 1000 ) {
        return false;
    }
}, 100 );
get_info()

Get file information.

Returns: array - File metadata

Example:

$info = $csv->get_info();
// [
//     'path'      => '/path/to/file.csv',
//     'size'      => 1024000,
//     'mime_type' => 'text/csv',
//     'rows'      => 1500
// ]

Validator Class

validate_attachment( int $attachment_id )

Validate WordPress attachment for CSV import.

Security Checks:

  • Attachment exists and is valid
  • User has read permissions
  • File exists on filesystem
  • File is in uploads directory (prevents path traversal)
  • MIME type is text/csv or text/plain
  • File size is within limits (default 50MB)

Returns: string|WP_Error - File path or error

Example:

use ArrayPress\CSVUtils\Validator;

$validator = new Validator();
$file_path = $validator->validate_attachment( 123 );

if ( is_wp_error( $file_path ) ) {
    wp_die( $file_path->get_error_message() );
}

validate_structure( string $file_path )

Validate CSV has proper structure (headers and data).

Returns: true|WP_Error - True if valid, error otherwise

Sanitizer Class

Static Methods

cell( string $value )

Sanitize CSV cell to prevent formula injection.

Prevents CSV formula injection by prepending single quote to cells starting with: =, +, -, @, tab, or carriage return

Example:

use ArrayPress\CSVUtils\Sanitizer;

$safe = Sanitizer::cell( '=SUM(A1:A10)' );
// Returns: "'=SUM(A1:A10)"
row( array $row )

Sanitize entire row of cells.

Example:

$safe_row = Sanitizer::row( $row );
headers( array $headers )

Sanitize and normalize CSV headers.

Handles:

  • Empty headers (replaces with "Column N")
  • Whitespace trimming
  • Text field sanitization

Example:

$headers = Sanitizer::headers( [ '', 'Name', '  Email  ' ] );
// Returns: ['Column 1', 'Name', 'Email']

Security Features

Formula Injection Prevention

The library automatically prevents CSV formula injection attacks:

// Malicious input
$row = [ '=cmd|"/c calc"', '+1+1', '-1-1', '@SUM(A1:A10)' ];

// Automatically sanitized (if sanitize_cells is enabled)
$safe_row = $csv->get_batch( 0, 1 );
// Returns: ["'=cmd|'/c calc'", "'+1+1", "'-1-1", "'@SUM(A1:A10)"]

File Validation

All file operations include:

  • โœ… Permission checks (current_user_can)
  • โœ… Path traversal prevention
  • โœ… MIME type validation
  • โœ… File size limits
  • โœ… Upload directory verification

Best Practices

// Always check for errors
$csv = CSV::from_attachment( $attachment_id );
if ( is_wp_error( $csv ) ) {
    return $csv; // Return error to REST API
}

// Validate structure
$validator = new Validator();
$valid = $validator->validate_structure( $file_path );
if ( is_wp_error( $valid ) ) {
    return $valid;
}

// Process with error handling
$result = $csv->each( function( $row, $index ) {
    try {
        process_row( $row );
    } catch ( Exception $e ) {
        error_log( "Error processing row {$index}: " . $e->getMessage() );
    }
} );

Real-World Examples

WordPress User Import

use ArrayPress\CSVUtils\CSV;

function import_users_from_csv( $attachment_id ) {
    $csv = CSV::from_attachment( $attachment_id );
    
    if ( is_wp_error( $csv ) ) {
        return $csv;
    }
    
    $imported = 0;
    $errors   = [];
    
    $csv->each( function( $row, $index ) use ( &$imported, &$errors ) {
        // Validate required fields
        if ( empty( $row['email'] ) ) {
            $errors[] = "Row {$index}: Email is required";
            return;
        }
        
        // Create user
        $user_id = wp_create_user(
            $row['username'],
            wp_generate_password(),
            $row['email']
        );
        
        if ( is_wp_error( $user_id ) ) {
            $errors[] = "Row {$index}: " . $user_id->get_error_message();
            return;
        }
        
        // Update user meta
        update_user_meta( $user_id, 'first_name', $row['first_name'] );
        update_user_meta( $user_id, 'last_name', $row['last_name'] );
        
        $imported++;
    }, 50 );
    
    return [
        'imported' => $imported,
        'errors'   => $errors
    ];
}

REST API Import with Progress

// Start import endpoint
function start_import( WP_REST_Request $request ) {
    $file_id = $request->get_param( 'file_id' );
    
    $csv = CSV::from_attachment( $file_id );
    if ( is_wp_error( $csv ) ) {
        return $csv;
    }
    
    $import_id = wp_generate_uuid4();
    $total = $csv->count_rows();
    
    set_transient( 'import_' . $import_id, [
        'file_id' => $file_id,
        'total'   => $total
    ], HOUR_IN_SECONDS );
    
    return [
        'import_id' => $import_id,
        'total'     => $total,
        'batch_size' => 100
    ];
}

// Process batch endpoint
function process_batch( WP_REST_Request $request ) {
    $import_id = $request->get_param( 'import_id' );
    $batch_num = $request->get_param( 'batch' );
    
    $config = get_transient( 'import_' . $import_id );
    if ( ! $config ) {
        return new WP_Error( 'invalid_import', 'Import expired' );
    }
    
    $csv = CSV::from_attachment( $config['file_id'] );
    
    $offset = $batch_num * 100;
    $batch = $csv->get_batch( $offset, 100, true );
    
    $processed = 0;
    foreach ( $batch as $row ) {
        process_row( $row );
        $processed++;
    }
    
    $is_complete = ( $offset + $processed ) >= $config['total'];
    
    return [
        'processed'   => $processed,
        'is_complete' => $is_complete
    ];
}

Export with Writer (Future Feature)

use ArrayPress\CSVUtils\Writer;

$writer = new Writer( 'export.csv' );
$writer->set_headers( [ 'Name', 'Email', 'Status' ] );

$users = get_users();
foreach ( $users as $user ) {
    $writer->add_row( [
        $user->display_name,
        $user->user_email,
        'Active'
    ] );
}

$file_path = $writer->save();

Configuration Options

[
    // Maximum file size in megabytes
    'max_file_size_mb' => 50,
    
    // CSV delimiter character
    'delimiter' => ',',
    
    // CSV enclosure character
    'enclosure' => '"',
    
    // CSV escape character
    'escape' => '\\',
    
    // Auto-sanitize cells (prevents formula injection)
    'sanitize_cells' => true,
    
    // Skip empty rows during processing
    'skip_empty_rows' => true,
]

Error Handling

The library uses WP_Error for all error conditions:

$csv = CSV::from_attachment( $attachment_id );

if ( is_wp_error( $csv ) ) {
    $error_code = $csv->get_error_code();
    $error_message = $csv->get_error_message();
    $error_data = $csv->get_error_data();
    
    // Common error codes:
    // - invalid_file: File doesn't exist
    // - forbidden: No permission
    // - invalid_format: Not a CSV file
    // - file_too_large: Exceeds size limit
    // - invalid_path: Path traversal attempt
}

Performance Tips

  1. Use batch processing for large files:

    $csv->each( $callback, 100 ); // Process 100 rows at a time
  2. Use associative arrays only when needed:

    // Faster (indexed arrays)
    $batch = $csv->get_batch( 0, 100, false );
    
    // Slower but more convenient (associative)
    $batch = $csv->get_batch( 0, 100, true );
  3. Process in background for very large imports:

    wp_schedule_single_event( time(), 'process_csv_import', [ $file_id ] );

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

GPL-2.0-or-later

Support

Credits

Developed and maintained by ArrayPress.

Changelog

1.0.0 - 2025-01-15

  • Initial release
  • CSV reading with WordPress integration
  • Built-in security features
  • Batch processing support
  • Memory-efficient iteration
  • Comprehensive validation and sanitization