mostafax / ai-reporting-engine
AI Reporting Engine β turn natural-language prompts into report queries via a unified Regex + OpenAI pipeline, with prompt normalization, fingerprint caching, and a pluggable DSL builder.
Requires
- php: ^8.1
- illuminate/contracts: ^10.0|^11.0|^12.0|^13.0
- illuminate/http: ^10.0|^11.0|^12.0|^13.0
- illuminate/support: ^10.0|^11.0|^12.0|^13.0
- mostafax/dual-layer-reporting-engine: dev-main
- mostafax/dynamic-hybrid-reporting-engine: dev-main
Requires (Dev)
- orchestra/testbench: ^8.0|^9.0|^10.0
- phpunit/phpunit: ^10.0|^11.0
Suggests
- mongodb/laravel-mongodb: Required for the MongoDB-backed mapping and prompt-cache stores (^4.0|^5.0)
This package is auto-updated.
Last update: 2026-07-05 13:04:27 UTC
README
π€ AI Reporting Engine
Turn natural-language prompts into report queries β securely, cheaply, and deterministically.
Natural language β normalized prompt β cache β regex β OpenAI β unified intent β validated DSL β execution. The AI never touches your database; it only proposes an intent.
π Documentation Β· Ψ§ΩΨΉΨ±Ψ¨ΩΨ©
Table of Contents
- Overview
- The Hybrid Intent-to-DSL Pattern
- Feature Summary
- Requirements
- Quick Install
- Detailed Setup
- The Pipeline
- Usage
- The Unified Intent Object
- Parsers
- Prompt Normalization
- Cache Storage
- DSL Builder
- Configuration
- Extending (SOLID)
- Data Stores (MongoDB)
- Logging
- Error Handling
- Testing
- Architecture Reference
- License
π Overview
AI Reporting Engine converts a natural-language question (Arabic or English) into an executable reporting DSL through a layered, cache-first pipeline. Unlike naive "prompt β SQL" approaches, the AI is only ever allowed to produce a structured intent β never a query, never SQL, never DSL. A dedicated, centralized DslBuilder is the single component that turns that validated intent into an executable DSL.
The engine produces a DSL only. Execution is delegated to
mostafax/dynamic-hybrid-reporting-engine,
fed by data replicated through
mostafax/dual-layer-reporting-engine.
Both are hard requirements and are pulled in automatically.
Why not just prompt β SQL? Because it burns tokens on every request, reprocesses identical questions, lets the model hallucinate invalid or unsafe queries, and couples business rules to the AI. This package fixes all five.
π§ The Hybrid Intent-to-DSL Pattern
This package is a concrete implementation of the Hybrid Intent-to-DSL Reporting Pattern by Mostafa Elbayyar.
Core principles
- AI never generates executable queries β it produces a validated intent only.
- Intent before execution β parsing and execution are fully separated.
- Cache first β identical questions never re-hit the AI.
- Rules before AI β a deterministic regex engine runs before any token is spent.
- Validation before execution β no invalid intent or DSL ever reaches the database.
Data flow
Prompt β Normalized Prompt β Intent β Validated Intent β DSL β Validated DSL β Execution
π― Feature Summary
| Feature | Description |
|---|---|
| π§Ή Prompt Normalization | Trim, lowercase, Arabic letter unification, tashkeel & punctuation removal β then a stable sha256 fingerprint |
| π Fingerprint Cache | Equivalent prompts share one fingerprint; a repeated question skips parsing entirely and costs zero tokens |
| βοΈ Regex Parser | Deterministic, token-free. Matches editable MongoDB mappings & column concepts β a unified intent |
| π§ OpenAI Parser | Runs only when regex misses. Returns strict JSON β never DSL, never SQL. Tracks latency + token usage |
| π― Unified Intent Object | Both parsers emit the identical IntermediateQuery; the rest of the system is source-agnostic |
| ποΈ Central DSL Builder | The single place that builds an executable DSL. No parser ever builds DSL directly |
| π Method Switch | auto (regex β OpenAI), regex (no tokens), or openai β runtime-configurable |
| π MongoDB Stores | Editable regex mappings, column concepts, and the prompt/DSL cache β all runtime-editable |
| π§© Pluggable Resolvers | Resolve entity names ("class 1-A") to ids for filtering, via a swappable EntityResolverInterface |
| π SOLID / DI | Every collaborator sits behind an interface β swap any piece via a container binding |
| π Arabic-aware | First-class Arabic normalization so "ΨΉΨ―Ψ― Ψ§ΩΨ·ΩΨ§Ψ¨" and "ΨΉΨ―Ψ― Ψ§ΩΨ·ΩΩΩΨ§Ψ¨" hit the same cache entry |
| π Execution Logging | Parser used, cache hit, execution ms, OpenAI latency, token usage, and the generated DSL |
π Requirements
Required
| Dependency | Version |
|---|---|
| PHP | 8.1+ |
| Laravel | 10 / 11 / 12 / 13 |
mostafax/dynamic-hybrid-reporting-engine |
dev-main (executes the DSL) |
mostafax/dual-layer-reporting-engine |
dev-main (replicates data for reporting) |
Suggested
| Package | Purpose |
|---|---|
mongodb/laravel-mongodb |
MongoDB-backed mapping & prompt-cache stores (^4.0 | ^5.0) |
β‘ Quick Install
# 1. Install the package (both reporting engines are pulled in automatically) composer require mostafax/ai-reporting-engine # 2. Publish the config php artisan vendor:publish --tag=ai-reporting-config
Done. Inject ReportQueryPipeline, call process($prompt), and hand the returned DSL to the reporting engine.
π§ Detailed Setup
Step 1 β Install
composer require mostafax/ai-reporting-engine
The service provider is auto-discovered (AiReportingEngineServiceProvider) and binds every interface to a sensible default.
Step 2 β Publish config
php artisan vendor:publish --tag=ai-reporting-config
# β config/ai-reporting.php
Step 3 β Environment variables
AI_REPORTING_ENABLED=true AI_REPORTING_METHOD=auto # auto | regex | openai OPENAI_API_KEY=sk-proj-... OPENAI_MODEL=gpt-4o-mini OPENAI_TIMEOUT=30 AI_REPORTING_MONGO_CONNECTION=mongodb
Step 4 β Seed the regex mappings (host-defined)
The regex parser reads section mappings and column concepts from MongoDB
(ai_report_mappings, ai_column_concepts). Seed them once from your domain
schema, then edit them at runtime without a deploy.
Step 5 β (Optional) Bind host implementations
To enable entity lookups (e.g. class-name β class_id) or a DB-backed AI config,
bind your own implementations β see Extending.
π The Pipeline
Every prompt passes through the same 7-step pipeline before a single DB call:
| # | Step | Component | Responsibility |
|---|---|---|---|
| 01 | Normalize | PromptNormalizer |
Canonicalize text (Arabic, case, punctuation) β fingerprint |
| 02 | Cache Lookup | MongoPromptCache |
Hit β return stored DSL, skip regex + OpenAI |
| 03 | Regex | RegexParser |
Token-free match β IntermediateQuery |
| 04 | OpenAI | OpenAiParser |
On regex miss β strict JSON β IntermediateQuery |
| 05 | Build DSL | DslBuilder |
Validate + map + build the executable DSL |
| 06 | Cache Store | MongoPromptCache |
Persist intent + DSL (never rows) |
| 07 | Execute | ReportEngine |
Run the DSL β the unchanged reporting engine |
ReportQueryPipeline orchestrates steps 1β6 and returns the DSL; step 7 is your call.
π Usage
use Mostafax\AiReportingEngine\ReportQueryPipeline; use Mostafax\ReportingEngine\Core\Engine\ReportEngine; public function search(Request $request, ReportQueryPipeline $pipeline, ReportEngine $engine) { $result = $pipeline->process($request->string('question')); // [ // 'source' => 'regex' | 'openai' | 'cache', // 'origin' => the engine that first produced it, // 'collection' => 'students', // 'query' => [...unified intent...], // 'dsl' => [...executable DSL...], // 'cached' => bool, // ] $rows = $engine->run($result['dsl']); // execute via the reporting engine return response()->json([ 'data' => $rows->data, 'source' => $result['source'], ]); }
Unrecognized prompts throw ReportParsingException β catch it and return a helpful message.
π― The Unified Intent Object
Both parsers return the identical IntermediateQuery. The DSL builder and the rest
of the app never know whether the source was regex or AI β only the parser sets _source.
{
"module": "students",
"operation": "list",
"metrics": [],
"dimensions": [],
"filters": [ { "field": "class_id", "operator": "=", "value": 1 } ],
"groupBy": [],
"sort": [],
"limit": 20,
"dateRange": null,
"projection": ["first_name_ar", "status"],
"_source": "regex"
}
βοΈ Parsers
RegexParser (deterministic, token-free)
- Matches section patterns and column concepts stored in MongoDB.
- Prefers the primary entity (students/teachers/parents) over a container mentioned only as a filter β so "students in class 1-A" resolves to
studentsfiltered byclass_id, not the classes report. - Extracts status filters (accepted, unpaidβ¦), relative dates (today, this monthβ¦), free-text titles ("book X"), and explicit column requests ("names only").
- Resolves entity names to ids via a pluggable
EntityResolverInterface.
OpenAiParser (fallback only)
- Runs only when regex cannot match β keeping token cost near zero.
- Returns strict JSON matching the intent shape; never DSL, never SQL.
- Exposes
lastLatencyMs()andlastTokens()for the execution log. - The system/schema prompt is host-provided via config, so the package stays domain-agnostic.
Parse method
| Method | Behavior |
|---|---|
auto |
Regex first, OpenAI on miss β recommended |
regex |
Local regex only β no tokens, no network |
openai |
OpenAI only |
π§Ή Prompt Normalization
PromptNormalizer canonicalizes prompts so equivalent phrasings share one cache entry:
| Step | Effect |
|---|---|
| Trim + collapse spaces | " ΨΉΨ―Ψ― Ψ§ΩΨ·ΩΨ§Ψ¨ " β "ΨΉΨ―Ψ― Ψ§ΩΨ·ΩΨ§Ψ¨" |
| Arabic letter unify | Ψ£ Ψ₯ Ψ’ β Ψ§, Ω β Ω, Ψ© β Ω, Ψ€ β Ω, Ψ¦ β Ω |
| Strip tashkeel & tatweel | "Ψ§ΩΨ·ΩΩΩΩΩΨ§Ψ¨" β "Ψ§ΩΨ·ΩΨ§Ψ¨" |
| Lowercase | Latin case-folding |
| Strip punctuation | keeps letters, digits, spaces |
| Fingerprint | sha256(normalized) β the cache key |
πΎ Cache Storage
Each cache entry keeps the parsed intent only β never result rows (data changes over time; rows are always fetched fresh).
| Field | Description |
|---|---|
question |
Original prompt |
normalized_prompt |
Canonicalized prompt |
question_hash |
sha256 fingerprint (the key) |
collection |
Resolved module |
source |
regex | openai | cache |
query |
The unified intent |
dsl |
The generated DSL |
created_at |
First seen |
last_used_at |
Last hit |
usage_count |
Hit counter |
Default TTL is 24 hours (configurable).
ποΈ DSL Builder
The single, centralized place that turns a validated IntermediateQuery into a
ReportingEngine DSL. No parser builds DSL directly.
- Rejects an invalid intent with
ReportParsingException. - Keeps only flat (non-relation) columns for MongoDB documents.
- Wraps filters in a
FilterGroup({operator, conditions}) so the DSL is directly executable by the reporting engine. - Wraps
likevalues in%β¦%.
[ 'source' => 'mongodb', 'connection' => 'mongodb', 'table' => 'invoices', 'pagination' => ['page' => 1, 'per_page' => 20], 'fields' => ['invoice_number', 'status'], 'filters' => ['operator' => 'AND', 'conditions' => [ ['column' => 'status', 'operator' => '!=', 'value' => 'paid'], ]], ]
π§ Configuration
config/ai-reporting.php:
return [ 'enabled' => env('AI_REPORTING_ENABLED', true), 'parse_method' => env('AI_REPORTING_METHOD', 'auto'), // auto | regex | openai 'openai' => [ 'api_key' => env('OPENAI_API_KEY', ''), 'model' => env('OPENAI_MODEL', 'gpt-4o-mini'), 'timeout' => (int) env('OPENAI_TIMEOUT', 30), 'schema_prompt' => 'β¦your collections + columnsβ¦', 'endpoint' => 'https://api.openai.com/v1/chat/completions', ], 'storage' => [ 'connection' => env('AI_REPORTING_MONGO_CONNECTION', 'mongodb'), 'cache_collection' => 'ai_queries', 'mappings_collection' => 'ai_report_mappings', 'concepts_collection' => 'ai_column_concepts', 'cache_ttl_hours' => 24, ], 'dsl' => [ 'source' => 'mongodb', 'connection' => 'mongodb', ], ];
π Extending (SOLID)
Every collaborator is an interface β bind your own in a service provider:
| Interface | Default | Bind to override |
|---|---|---|
PromptNormalizerInterface |
PromptNormalizer |
custom normalization |
AiConfigInterface |
ConfigAiConfig (config) |
DB-backed settings |
PromptCacheInterface |
MongoPromptCache |
Redis / SQL cache |
MappingRepositoryInterface |
MongoMappingRepository |
your mapping store |
EntityResolverInterface |
(none) | resolve class name β id, etc. |
PromptParserInterface |
RegexParser, OpenAiParser |
add a custom parser |
// Enable class-name β class_id lookups $this->app->bind(EntityResolverInterface::class, MyClassResolver::class); // Read AI settings from your own store $this->app->singleton(AiConfigInterface::class, SystemSettingAiConfig::class);
π Data Stores (MongoDB)
Three collections back the engine (all runtime-editable):
| Collection | Purpose |
|---|---|
ai_report_mappings |
Section patterns + default projection + text-search column per module |
ai_column_concepts |
Keyword β columns-per-module (e.g. "name" β first_name_ar, family_name_ar) |
ai_queries |
Prompt/DSL cache (intent only) |
Section mapping
{
"collection": "students",
"patterns": "Ψ·Ψ§ΩΨ¨|Ψ·ΩΨ§Ψ¨|student|students",
"projection": ["student_code", "first_name_ar", "status"],
"text_column": "first_name_ar",
"priority": 150,
"is_active": true
}
Column concept
{
"label": "name",
"patterns": "Ψ§Ψ³Ω
|Ψ£Ψ³Ω
Ψ§Ψ‘|name",
"columns_per_collection": {
"students": ["first_name_ar", "family_name_ar"],
"teachers": ["name_ar"]
},
"is_active": true
}
π Logging
Every generated query is logged with:
parser Β· cache_hit Β· execution_ms Β· openai_ms Β· openai_tokens Β· dsl
Use it to monitor token spend, cache-hit ratio, and slow prompts.
π‘οΈ Error Handling
- No match (regex miss + AI unavailable/disabled) β
ReportParsingException. - Invalid intent / DSL β
ReportParsingExceptionfromDslBuilder. - OpenAI failure / invalid JSON β logged, returns
null, pipeline falls back per method. - Cache/store failure β logged, never breaks generation.
An invalid report is never executed.
π§ͺ Testing
composer install vendor/bin/phpunit
Unit tests cover the DSL builder (filter groups, like wrapping, validation) and
can be extended with your own parser and normalization cases.
π Architecture Reference
User Prompt
β
βΌ
Prompt Normalization (PromptNormalizer)
β
βΌ
Intent Cache Lookup (MongoPromptCache)
ββββββββββββββ΄βββββββββββββ
Cache Hit Cache Miss
β β
βΌ βΌ
Return Stored DSL Regex Rule Engine (RegexParser)
β
ββββββββββββββ΄βββββββββββββ
Match Found No Match
β β
βΌ βΌ
Intent Object OpenAI Intent Parser (OpenAiParser)
ββββββββββββββ¬βββββββββββββ
βΌ
DSL Builder (DslBuilder β validates)
βΌ
Cache Storage (MongoPromptCache)
βΌ
Report Execution (ReportEngine β unchanged)
π License
MIT Β© Mostafa Elbayyar