insightbase / invoice-parser-nette
Nette package for parsing invoice/accounting documents from PDF (including scanned PDFs) using Azure Document Intelligence, LLM normalization and Czech-specific validation.
Package info
github.com/insightbase/InvoiceParser-nette
pkg:composer/insightbase/invoice-parser-nette
Requires
- php: >=8.1
- ext-json: *
- guzzlehttp/guzzle: ^7.9
- nette/di: ^3.1
- nette/schema: ^1.3
- nette/utils: ^4.0
Requires (Dev)
- phpunit/phpunit: ^10.5 || ^11.0
Suggests
- contributte/rabbitmq: For asynchronous invoice processing workers.
This package is auto-updated.
Last update: 2026-05-16 08:24:23 UTC
README
Nette package for extracting data from invoices and accounting documents in PDF (including scans) using:
- Azure Document Intelligence (OCR + structured extraction)
- LLM normalization (Azure OpenAI)
- Czech regex fallbacks (VS, DUZP, ICO, DIC)
- a validation layer and an asynchronous worker pattern
Installation
composer require insightbase/invoice-parser-nette
Azure setup (API key + endpoint + deployment)
The recommended procedure follows. It reflects the state of the Azure Portal as of 15 March 2026.
1) Azure account and subscription
- Create an Azure account or use an existing one.
- Verify that you have an active subscription and at least the Contributor role on the resource group.
2) Azure Document Intelligence (azureDi)
- In the Azure Portal, create a resource of type Document Intelligence (formerly Form Recognizer).
- Choose the region where you want to run the service.
- After the resource is created, open Keys and Endpoint.
- Copy:
  - Endpoint -> use as AZURE_DI_ENDPOINT
  - Key 1 or Key 2 -> use as AZURE_DI_KEY
3) Azure OpenAI (llm)
- In the Azure Portal, create an Azure OpenAI resource.
- In the resource, open Keys and Endpoint.
- Copy:
  - Endpoint -> AZURE_OPENAI_ENDPOINT
  - Key 1 or Key 2 -> AZURE_OPENAI_KEY
- Open Azure AI Foundry (the model deployment panel) for this resource.
- Create a model deployment (for example a GPT model) and note its deployment name -> AZURE_OPENAI_DEPLOYMENT.
Note:
- If the Azure OpenAI resource or deployment cannot be created, the cause is usually missing quota or permissions in the tenant or region. In that case, ask your Azure admin to enable it.
4) Environment variables
Set at least:
AZURE_DI_ENDPOINT=https://<your-di-resource>.cognitiveservices.azure.com
AZURE_DI_KEY=<your-di-key>
AZURE_OPENAI_ENDPOINT=https://<your-openai-resource>.openai.azure.com
AZURE_OPENAI_KEY=<your-openai-key>
AZURE_OPENAI_DEPLOYMENT=<your-deployment-name>
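A missing variable tends to surface late, as an opaque HTTP error from Azure. The sketch below shows one way to check the five variables at application bootstrap. The function name missingEnvVars and the constant REQUIRED_AZURE_VARS are hypothetical and not part of this package.

```php
<?php

declare(strict_types=1);

// Hypothetical list; it mirrors the variables named in this README.
const REQUIRED_AZURE_VARS = [
    'AZURE_DI_ENDPOINT',
    'AZURE_DI_KEY',
    'AZURE_OPENAI_ENDPOINT',
    'AZURE_OPENAI_KEY',
    'AZURE_OPENAI_DEPLOYMENT',
];

/**
 * Returns the names of required variables that are unset or blank.
 *
 * @param list<string>               $required
 * @param array<string, string>|null $env Injected map (useful in tests); null reads the process environment.
 * @return list<string>
 */
function missingEnvVars(array $required, ?array $env = null): array
{
    $missing = [];
    foreach ($required as $name) {
        // getenv() returns false when the variable is not set.
        $value = $env === null ? getenv($name) : ($env[$name] ?? false);
        if ($value === false || trim((string) $value) === '') {
            $missing[] = $name;
        }
    }

    return $missing;
}
```

Calling missingEnvVars(REQUIRED_AZURE_VARS) before the DI container is built and throwing when the result is non-empty gives a clear message that names the variables to fix.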
5) Configuring the extension in Nette
extensions:
    invoiceParser: InsightBase\InvoiceParserNette\DI\InvoiceParserExtension

invoiceParser:
    azureDi:
        endpoint: %env(AZURE_DI_ENDPOINT)%
        apiKey: %env(AZURE_DI_KEY)%
        model: prebuilt-invoice
        apiVersion: 2023-07-31
        maxPollAttempts: 25
        pollIntervalMs: 1000
    llm:
        enabled: true
        endpoint: %env(AZURE_OPENAI_ENDPOINT)%
        deployment: %env(AZURE_OPENAI_DEPLOYMENT)%
        apiKey: %env(AZURE_OPENAI_KEY)%
        apiVersion: 2024-10-21
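The azureDi options map onto the asynchronous analyze flow of the Document Intelligence REST API: a POST starts the analysis, and the client then polls an operation URL until the status is succeeded or failed. The sketch below illustrates how model, apiVersion, maxPollAttempts and pollIntervalMs could be used. It is not the package's actual implementation. The functions buildAnalyzeUrl and pollUntilDone are hypothetical, and the $fetchStatus callback stands in for an HTTP GET of the operation URL.

```php
<?php

declare(strict_types=1);

/**
 * Builds the analyze URL. The "/formrecognizer/documentModels/" path layout
 * matches the 2023-07-31 API version; newer versions use a different prefix.
 */
function buildAnalyzeUrl(string $endpoint, string $model, string $apiVersion): string
{
    return sprintf(
        '%s/formrecognizer/documentModels/%s:analyze?api-version=%s',
        rtrim($endpoint, '/'),
        rawurlencode($model),
        rawurlencode($apiVersion),
    );
}

/**
 * Polls until the analysis finishes or the attempt budget is spent.
 *
 * @param callable(): array $fetchStatus Hypothetical callback returning the decoded operation body.
 * @return array The final body with status "succeeded".
 */
function pollUntilDone(callable $fetchStatus, int $maxPollAttempts, int $pollIntervalMs): array
{
    for ($attempt = 1; $attempt <= $maxPollAttempts; $attempt++) {
        $body = $fetchStatus();
        $status = $body['status'] ?? null;

        if ($status === 'succeeded') {
            return $body;
        }
        if ($status === 'failed') {
            throw new RuntimeException('Document analysis failed.');
        }

        // Still "notStarted" or "running": wait before the next attempt.
        if ($attempt < $maxPollAttempts) {
            usleep($pollIntervalMs * 1000);
        }
    }

    throw new RuntimeException("Analysis did not finish after {$maxPollAttempts} attempts.");
}
```

Under this sketch, 25 attempts at 1000 ms bound the wait at roughly 24 seconds plus request time, so large scanned documents may need a higher maxPollAttempts.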
6) Links to official documentation
- Azure Document Intelligence quickstart: https://learn.microsoft.com/azure/ai-services/document-intelligence/quickstarts/get-started-sdks-rest-api
- Azure OpenAI chat completions quickstart: https://learn.microsoft.com/azure/ai-services/openai/chatgpt-quickstart
- Azure OpenAI role-based access control: https://learn.microsoft.com/azure/ai-services/openai/how-to/role-based-access-control
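Usage
<?php

declare(strict_types=1);

use InsightBase\InvoiceParserNette\Parser\InvoiceParser;

final class InvoiceService
{
    public function __construct(
        private InvoiceParser $invoiceParser,
    ) {
    }

    public function parse(string $pdfPath): array
    {
        // file_get_contents() returns false on failure; fail loudly
        // instead of sending an empty string to the parser.
        $pdfContent = file_get_contents($pdfPath);
        if ($pdfContent === false) {
            throw new \RuntimeException("Cannot read PDF file: {$pdfPath}");
        }

        $result = $this->invoiceParser->parsePdf($pdfContent);

        return $result->invoice->toArray();
    }
}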
Asynchronous worker (Contributte RabbitMQ)
The library includes a worker service, InvoiceParseWorker::process(array $message).
Example message payload:
{
"pdfPath": "/data/invoices/invoice-2026-001.pdf"
}
Or:
{
"pdfBase64": "JVBERi0xLjQKJ..."
}
A sample integration is in examples/rabbitmq.neon and examples/InvoiceConsumer.php.
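The two payload shapes above both have to end up as raw PDF bytes before parsing. The sketch below shows one way a consumer could resolve them. It does not describe how InvoiceParseWorker handles messages internally, the helper name resolvePdfBytes is hypothetical, and the "%PDF-" header check is an added assumption.

```php
<?php

declare(strict_types=1);

/**
 * Resolves raw PDF bytes from a message with either "pdfBase64" or "pdfPath".
 *
 * @param array<string, mixed> $message
 */
function resolvePdfBytes(array $message): string
{
    if (isset($message['pdfBase64']) && is_string($message['pdfBase64'])) {
        // Strict mode rejects characters outside the base64 alphabet.
        $bytes = base64_decode($message['pdfBase64'], true);
        if ($bytes === false) {
            throw new InvalidArgumentException('"pdfBase64" is not valid base64.');
        }
    } elseif (isset($message['pdfPath']) && is_string($message['pdfPath'])) {
        if (!is_readable($message['pdfPath'])) {
            throw new InvalidArgumentException('Cannot read file: ' . $message['pdfPath']);
        }
        $bytes = file_get_contents($message['pdfPath']);
        if ($bytes === false) {
            throw new InvalidArgumentException('Cannot read file: ' . $message['pdfPath']);
        }
    } else {
        throw new InvalidArgumentException('Message must contain "pdfPath" or "pdfBase64".');
    }

    // Assumption: reject anything that does not start with the PDF header.
    if (!str_starts_with($bytes, '%PDF-')) {
        throw new InvalidArgumentException('Payload does not look like a PDF.');
    }

    return $bytes;
}
```

Rejecting malformed messages before any Azure call keeps bad payloads from consuming API quota. A consumer would typically reject such a message without requeueing it.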
Notes
- For scanned PDFs, OCR is handled on the Azure Document Intelligence side.
- The regex fallback is a supplement for cases where DI/LLM returns incomplete data.
- The validator checks basic consistency of amounts and dates.
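To illustrate the regex fallback idea from the notes above, the sketch below looks for a Czech company ID (ICO) in OCR text and accepts it only if its check digit is valid. The regular expression and the function names isValidIco and findIco are hypothetical and are not taken from the package. The check-digit rule is the standard weighted modulo-11 scheme used for 8-digit ICO numbers.

```php
<?php

declare(strict_types=1);

/** Validates the weighted modulo-11 check digit of an 8-digit ICO. */
function isValidIco(string $ico): bool
{
    if (preg_match('/^\d{8}$/', $ico) !== 1) {
        return false;
    }

    // The first seven digits are weighted 8, 7, 6, 5, 4, 3, 2.
    $sum = 0;
    for ($i = 0; $i < 7; $i++) {
        $sum += (int) $ico[$i] * (8 - $i);
    }

    $expected = (11 - ($sum % 11)) % 10;

    return (int) $ico[7] === $expected;
}

/**
 * Returns the first labeled ICO candidate with a valid check digit, or null.
 * Assumed label forms: "ICO", "IČO", "IC", "IČ", with an optional colon.
 */
function findIco(string $text): ?string
{
    if (!preg_match_all('/\bI[ČC]O?\s*:?\s*(\d{8})\b/iu', $text, $matches)) {
        return null;
    }

    foreach ($matches[1] as $candidate) {
        if (isValidIco($candidate)) {
            return $candidate;
        }
    }

    return null;
}
```

Pairing the pattern with the checksum matters because OCR output often contains other 8-digit numbers, and a fallback that accepts any match could overwrite a correct but incomplete DI/LLM result with noise.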