1biot/fiquela

FiQueLa - File Query Language - PHP library for querying files

v2.0.0-beta 2025-01-28 14:50 UTC

This package is auto-updated.

Last update: 2025-01-28 14:52:10 UTC


README

[fi-kju-ela]

Packagist Version GitHub Actions Workflow Status Packagist Dependency Version Packagist License

Static Badge Static Badge Static Badge Static Badge

FiQueLa is a powerful PHP library that brings SQL-inspired querying capabilities to structured data formats like XML, CSV, JSON, YAML and NEON. Designed for simplicity and modularity, it allows you to filter, join, and aggregate data with a familiar and efficient syntax. Whether you're working with large datasets or integrating various sources, FiQueLa provides a seamless way to manipulate and explore your data.

Features:

  • 📂 Supports multiple formats: Work seamlessly with XML, CSV, JSON, YAML, and NEON.
  • 🛠️ SQL-inspired syntax: Perform SELECT, JOIN, WHERE, GROUP BY, ORDER BY and more.
  • ✍️ Flexible Querying: Write SQL-like strings or use the fluent API for maximum flexibility.
  • 📊 Advanced functions: Access features like SUM, COUNT, AVG, GROUP_CONCAT, MD5, UPPER, and many more.
  • 🚀 Efficient with Large Files: Optimized for processing JSON, XML, and CSV files with tens of thousands of rows using stream processing.
  • 🧑‍💻 Developer-Friendly: Map results to DTOs for easier data manipulation.
  • Unified API across all supported formats: Use a consistent API for all your data needs.

Table of Contents:

1. Overview

Why limit SQL to databases when it can be just as effective for querying structured data? FiQueLa (File Query Language) brings the power of SQL to your files. Whether you're working with JSON, XML, CSV, or YAML, FiQueLa enables you to interact with these formats using familiar SQL syntax.

Key highlights:

  • Universal Querying: Use SQL-like queries to filter, sort, join, and aggregate data across multiple file types.
  • Data Formats Support: Seamlessly work with JSON, XML, CSV, YAML, and more.
  • Powerful Features: Access advanced SQL features like GROUP BY, HAVING, and functions for data transformation directly on your file-based datasets.
  • Developer-Friendly: Whether you're a beginner or an experienced developer, FiQueLa offers a simple and consistent API for all your data needs.
  • Flexible Integration: Ideal for scenarios where data lives in files rather than traditional databases.
  • SQL-Like Strings: Write and execute SQL-like string queries directly, providing an alternative to fluent syntax for greater flexibility and familiarity.

Use FiQueLa to:

  • Simplify data extraction and analysis from structured files.
  • Combine data from multiple sources with ease.
  • Create lightweight data processing pipelines without a full-fledged database.

FiQueLa empowers developers to unlock the potential of file-based data with the familiar and expressive language of SQL.

2. Installation

Install via Composer:

composer require 1biot/fiquela

Install packages for optional features:

composer require league/csv halaxa/json-machine symfony/yaml nette/neon tracy/tracy
  • league/csv: Required for CSV file support.
  • halaxa/json-machine: Required for JSON stream support.
  • symfony/yaml: Required for YAML file support.
  • nette/neon: Required for NEON file support.
  • tracy/tracy: Optional for using Debugger

3. Supported Formats

4. Getting Started

Here’s a quick example of how FiQueLa can simplify your data queries:

use FQL\Enum;
use FQL\Query;

$results = Query\Provider::fromFileQuery('(./path/to/file.xml).SHOP.SHOPITEM')
    ->selectAll()
    ->where('EAN', Enum\Operator::EQUAL, '1234567891011')
    ->or('PRICE', Enum\Operator::LESS_THAN_OR_EQUAL, 200)
    ->orderBy('PRICE')->desc()
    ->limit(10)
    ->execute()
    ->fetchAll();

print_r(iterator_to_array($results));

Output:

Array
(
    [0] => Array
        (
            [NAME] => "Product 1"
            [EAN] => "1234567891011"
            [PRICE] => 300.00
        )
    [1] => Array
        (
            [NAME] => "Product 2"
            [EAN] => "1234567891012"
            [PRICE] => 150.00
        )
    [2] => Array
        (
            [NAME] => "Product 3"
            [EAN] => "1234567891013"
            [PRICE] => 150.00
        )
    ...
)

5. Documentation

For more details about FiQueLa and her capabilities, explore the documentation sections.

6. Examples

Check the examples and run them using Composer. All examples uses \FQL\Query\Debugger and methods inspectQuery, inspectSql, inspectStreamSql or benchmarkQuery to show the results.

composer examples
# or
composer example:csv
composer example:join
composer example:json
composer example:neon
composer example:sql
composer example:xml
composer example:yaml

Check step Examples at actions or run composer example:csv and output will look like this:

=========================
### Debugger started: ###
=========================
> Memory usage (MB): 1.1899 (emalloc)
> Memory peak usage (MB): 1.5636 (emalloc)
------------------------------
> Execution time (s): 5.3E-5
> Execution time (ms): 0.053
> Execution time (µs): 53
=========================
### Inspecting query: ###
=========================
==================
### SQL query: ###
==================
> SELECT
>   ean ,
>   defaultCategory ,
>   EXPLODE(" > ", defaultCategory) AS categoryArray ,
>   price ,
>   ROUND(price, 2) AS price_rounded ,
>   MOD(price, 100) AS modulo_100 ,
>   MOD(price, 54) AS modulo_54
> FROM [csv](products-w-1250.csv).*
> GROUP BY defaultCategory
> ORDER BY defaultCategory DESC
================
### Results: ###
================
> Result class: FQL\Results\InMemory
> Results size memory (KB): 3.55
> Result exists: TRUE
> Result count: 15
========================
### Fetch first row: ###
========================
array (7)
   'ean' => 5010232964877
   'defaultCategory' => 'Testování > Drogerie'
   'categoryArray' => array (2)
   |  0 => 'Testování'
   |  1 => 'Drogerie'
   'price' => 121.0
   'price_rounded' => 121.0
   'modulo_100' => 21.0
   'modulo_54' => 13.0

>>> SPLIT TIME <<<
> Memory usage (MB): 2.4509 (emalloc)
> Memory peak usage (MB): 2.5911 (emalloc)
------------------------------
> Execution time (s): 0.024494
> Execution time (ms): 24.494
> Execution time (µs): 24494
========================
### Benchmark Query: ###
========================
> 2 500 iterations
==================
### SQL query: ###
==================
> SELECT
>   ean ,
>   defaultCategory ,
>   EXPLODE(" > ", defaultCategory) AS categoryArray ,
>   price ,
>   ROUND(price, 2) AS price_rounded ,
>   MOD(price, 100) AS modulo_100 ,
>   MOD(price, 54) AS modulo_54
> FROM [csv](products-w-1250.csv).*
> GROUP BY defaultCategory
> ORDER BY defaultCategory DESC
=========================
### STREAM BENCHMARK: ###
=========================
> Size (KB): 2.61
> Count: 15
> Iterated results: 37 500
>>> SPLIT TIME <<<
> Memory usage (MB): 2.4404 (emalloc)
> Memory peak usage (MB): 2.6138 (emalloc)
------------------------------
> Execution time (s): 10.107137
> Execution time (ms): 10107.137
> Execution time (µs): 10107137
============================
### IN_MEMORY BENCHMARK: ###
============================
> Size (KB): 3.55
> Count: 15
> Iterated results: 37 500
>>> SPLIT TIME <<<
> Memory usage (MB): 2.4509 (emalloc)
> Memory peak usage (MB): 2.6138 (emalloc)
------------------------------
> Execution time (s): 0.00673
> Execution time (ms): 6.73
> Execution time (µs): 6730
=======================
### Debugger ended: ###
=======================
> Memory usage (MB): 2.44 (emalloc)
> Memory peak usage (MB): 2.6138 (emalloc)
------------------------------
> Final execution time (s): 10.138434
> Final execution time (ms): 10138.434
> Final execution time (µs): 10138434

7. Knowing issues

  • ⚠️ Functions JOIN, ORDER BY and GROUP BY are not memory efficient, because joining data or sorting data requires to load all data into memory. It may cause memory issues for large datasets. But everything else is like ⚡️.

8. Planning Features

  • Operator BETWEEN: Add operator BETWEEN for filtering data and add support for dates and ranges.
  • Next file formats: Add next file formats like NDJson and MessagePack
  • Documentation: Create detailed guides and examples for advanced use cases.
  • Add explain method: Add method explain() for explaining query execution from actual query debugger and provide more complex information about query.
  • PHPStan 8: Fix all PHPStan 8 errors.
  • Tests: Increase test coverage.
  • Optimize GROUP BY: Optimize GROUP BY for more memory efficient data processing.
  • Hashmap cache: Add hashmap cache (Redis, Memcache) for more memory efficient data processing.
  • DELETE, UPDATE, INSERT: Support for manipulating data in files. - Instead of this, it will comes support for exporting data to files (CSV, NDJson, MessagePack, and more...).

9. Contributions

If you have suggestions or would like to contribute to these features, feel free to open an issue or a pull request!

How to contribute:

  • Fork the repository
  • Create a new branch
  • Make your changes
  • Create a pull request
  • All tests must pass
  • Wait for approval
  • 🚀