juvo / as-processor
Process huge datasets for import or sync with ease.
Requires
- php: >=8.0.0
- cardinalby/content-disposition: ^1.1
- halaxa/json-machine: ^1.2
- league/csv: ^9.0
- phpoffice/phpspreadsheet: ^2.1
- phpseclib/phpseclib: ^3.0.37
- sabre/dav: ^4.6
- woocommerce/action-scheduler: ^3.7
Requires (Dev)
- phpstan/extension-installer: ^1.1
- phpstan/phpstan: ^1.10.6
- phpunit/phpunit: 10
- szepeviktor/phpstan-wordpress: ^v1.1.7
- wp-coding-standards/wpcs: ^3.1
Versions
- dev-main
- 3.1.2
- 3.1.1
- 3.1.0
- 3.0.5
- 3.0.4
- 3.0.3
- 3.0.2
- 3.0.1
- 3.0.0
- 2.3.10
- 2.3.9
- 2.3.8
- 2.3.7
- 2.3.6
- 2.3.5
- 2.3.4
- 2.3.3
- 2.3.2
- 2.3.1
- 2.3.0
- 2.2.0
- 2.1.4
- 2.1.3
- 2.1.2
- 2.1.1
- 2.1.0
- 2.0.7
- 2.0.6
- 2.0.5
- 2.0.4
- 2.0.3
- 2.0.2
- 2.0.1
- 2.0.0
- 1.0.4
- 1.0.3
- 1.0.2
- 1.0.1
- 1.0.0
- dev-use-db-to-count-chunks
- dev-fix/sequential-sync
- dev-fix/empty-check
- dev-improve-lifecycle-hooks
- dev-fix/chunker-filename
- dev-renovate/configure
This package is auto-updated.
Last update: 2025-02-14 16:41:03 UTC
README
The AS Processor library is a synchronization and data chunking framework designed for WordPress environments. It uses Action Scheduler for asynchronous task management and provides flexible, efficient orchestration of large-scale data processing tasks such as API synchronizations, file-based imports (CSV, Excel, JSON), and chunk-wise data management.
Core Features
- Data Chunking and Processing:
  - The library introduces a consistent chunking mechanism to split large datasets (from files or APIs) into smaller, manageable pieces. Each chunk is processed asynchronously, reducing memory usage and improving load balancing.
  - Multiple data sources are supported, including API endpoints, Excel files, CSV files, and JSON files.
- Asynchronous Task Management:
  - Powered by Action Scheduler, tasks run asynchronously, so they execute without blocking other site processes.
  - Tasks are queued for execution, with efficient handling of task timeouts, retries, and cancellations.
- Data Source Adaptability:
  - The library provides abstract, extensible classes for different data formats:
    - CSV Processor (`CSV_Sync`): Handles CSV imports split into chunks, supporting UTF-8 conversions, custom delimiters, optional headers, and efficient file cleanup.
    - Excel Processor (`Excel`): Manages Excel files with optional headers and the ability to process specific worksheets. Includes support for rich text cell values and row skipping.
    - JSON Processor (`JSON`): Enables chunked processing of JSON files and supports JSON Pointer for extracting specific portions of the data.
    - API Processor (`API`): Works with paginated APIs and automatically handles rate limiting, request intervals, and pagination (by page, offset, or URL).
- Highly Extensible and Customizable:
  - Abstract classes allow developers to implement their own data-fetching or processing methods tailored to their use case.
  - A robust foundation supports advanced features like progressive pagination, deep merging, lock management, and transient-based sync data storage.
- Reliable Sync Management:
  - Built-in sync lifecycle management, including hooks for starting, progressing, and completing synchronizations.
  - Automatic cleanup of completed tasks and retention of synchronization data for specified durations.
- Error Handling and Recovery:
  - Exceptions are handled at every critical stage, so the process fails gracefully and any issues are recorded for diagnosis.
  - Built-in support for handling job timeouts, cancellations, and retries with appropriate sync lifecycle callbacks.
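The chunk-then-queue idea behind these features can be sketched in plain PHP. This is a minimal illustration, not the library's implementation: `chunk_iterable()` and `dispatch_chunks()` are hypothetical helpers, and the dispatcher is an injected callable so the sketch runs outside WordPress. Inside WordPress, that callable could wrap Action Scheduler's `as_enqueue_async_action( $hook, $args, $group )`.

```php
<?php
declare(strict_types=1);

/**
 * Lazily split any iterable into arrays of at most $size items.
 * Hypothetical helper for illustration; not part of AS Processor.
 */
function chunk_iterable(iterable $items, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $chunk = [];
    foreach ($items as $item) {
        $chunk[] = $item;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }

    // Emit the final, possibly smaller chunk.
    if ($chunk !== []) {
        yield $chunk;
    }
}

/**
 * Hand every chunk to a dispatcher and return the number of chunks handed over.
 * The dispatcher signature (int $index, array $chunk) is an assumption of this sketch.
 */
function dispatch_chunks(iterable $items, int $size, callable $dispatch): int
{
    $count = 0;
    foreach (chunk_iterable($items, $size) as $index => $chunk) {
        $dispatch($index, $chunk);
        $count++;
    }

    return $count;
}

// Usage: collect chunks in memory instead of queuing real background jobs.
$queued = [];
$total  = dispatch_chunks(
    range(1, 7),
    3,
    function (int $index, array $chunk) use (&$queued): void {
        $queued[$index] = $chunk;
    }
);
```

Only one chunk is held in memory at a time, because the generator yields each chunk before reading further. A real dispatcher would usually persist the chunk and pass a small reference to the background job rather than the rows themselves.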
Use Cases
- Importing Data: Import and process data from large datasets such as customer lists in Excel, product catalogs in JSON, or user data from CSV files.
- API Data Sync: Synchronize large-scale data from external APIs, with built-in features like pagination handling and rate limiting.
- Scheduled and Batch Processing: Execute complex batch processing tasks (like order processing or generating reports) asynchronously without affecting website performance.
- Custom Data Processing Flows: Build scalable workflows for chunking and processing any large dataset with minimal memory consumption and maximum fault tolerance.
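For the API sync use case, page-based pagination with a simple request interval might look like the sketch below. It assumes a hypothetical `$fetchPage` callable that returns the records for a given page number and an empty array once the data is exhausted. `paginate()` is illustrative only and does not reflect the library's `API` class.

```php
<?php
declare(strict_types=1);

/**
 * Walk a page-numbered API until it returns an empty page.
 *
 * $fetchPage is a hypothetical callable: fn (int $page): array of records.
 * $minInterval is the minimum number of seconds between two requests.
 */
function paginate(callable $fetchPage, int $maxPages = 1000, float $minInterval = 0.0): Generator
{
    for ($page = 1; $page <= $maxPages; $page++) {
        $started = microtime(true);
        $records = $fetchPage($page);

        if ($records === []) {
            return; // No more data.
        }

        yield $page => $records;

        // Crude rate limiting: wait out the remainder of the interval.
        $elapsed = microtime(true) - $started;
        if ($elapsed < $minInterval) {
            usleep((int) round(($minInterval - $elapsed) * 1_000_000));
        }
    }
}

// Usage with a fake in-memory API that serves two records per page.
$fakeApi = fn (int $page): array => array_slice(['a', 'b', 'c', 'd', 'e'], ($page - 1) * 2, 2);
$pages   = iterator_to_array(paginate($fakeApi));
```

The `$maxPages` cap is a safety net against an endpoint that never returns an empty page. Offset-based or URL-based pagination would follow the same shape with a different cursor.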
Technical Highlights
- Core Components:
  - The `Chunker` trait schedules and processes data chunks, leveraging Action Scheduler for asynchronous execution.
  - The `Sync` class serves as an abstract base that manages the entire synchronization process lifecycle.
  - The `Sync_Data` trait provides reliable synchronization data storage using WordPress's transients, enabling flexible data sharing and locking mechanisms.
- Focus on Memory Efficiency:
  - Uses iterative processing (e.g., PHP Generators, chunk-based file reading) to minimize memory usage and ensure scalability for large datasets.
- Modular Design:
  - The clean separation of concerns allows developers to modify or extend specific aspects like chunk processing or API fetching independently.
- Action Scheduler Integration:
  - Uses the Action Scheduler's event-driven architecture to manage background tasks effectively, along with group-level synchronization to maintain context-aware processing.
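The memory-efficiency point can be illustrated with a generator that streams a CSV file row by row instead of loading it whole. This is a sketch under stated assumptions: `read_csv_rows()` is a hypothetical helper built on PHP's native `fgetcsv()`, whereas the library itself depends on league/csv and may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Yield CSV rows one at a time as associative arrays keyed by the header row.
 * Hypothetical helper for illustration; not part of AS Processor.
 */
function read_csv_rows(string $path, string $delimiter = ','): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        $header = fgetcsv($handle, null, $delimiter, '"', '\\');
        if ($header === false || $header === [null]) {
            return; // Empty file: nothing to yield.
        }

        while (($row = fgetcsv($handle, null, $delimiter, '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // Skip blank lines.
            }
            if (count($row) !== count($header)) {
                throw new UnexpectedValueException('Row length does not match the header.');
            }

            yield array_combine($header, $row);
        }
    } finally {
        // Runs when iteration finishes or the generator is destroyed.
        fclose($handle);
    }
}

// Usage: write a small semicolon-delimited file and stream it back.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "sku;price\nA-1;9.99\n\nB-2;4.50\n");
$rows = iterator_to_array(read_csv_rows($path, ';'), false);
unlink($path);
```

Because rows are yielded one at a time, peak memory stays roughly constant regardless of file size, as long as the consumer does not collect all rows the way the short usage example does.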
Ideal For
This library is an excellent choice for WordPress developers and enterprises dealing with:
- High-volume data integration from various sources.
- Automating repetitive and resource-intensive synchronization tasks.
- Optimizing workflows for applications that rely on large datasets or slow APIs.
- Frequent e-commerce product imports.
AS Processor combines the power of modern WordPress development practices, Action Scheduler's asynchronous processing capabilities, and a highly abstracted framework to enable seamless and fault-tolerant data processing at scale. It offers developers a solid foundation for building efficient, scalable synchronization solutions tailored to their applications' unique requirements.