nystronsolar / electric-bill-extractor
Extract the data from an Electric Bill
Requires
- smalot/pdfparser: ^2.3
- thedevick/precise-money: ^2.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.21
- phpunit/phpunit: ^10.2
- symfony/var-dumper: ^6.2
- vimeo/psalm: ^5.13
README
Logo made with Canva
Getting Started
First, install the package with Composer:
composer require nystronsolar/electric-bill-extractor
Next, extract the data from an electric bill:
<?php

use NystronSolar\ElectricBillExtractor\ExtractorFactory;

require_once __DIR__.'/vendor/autoload.php';

$bill = ExtractorFactory::extractFromFile('bill.pdf');
How does the library work?
The Extractors
Each type of bill needs its own extractor. Every extractor extends the Extractor abstract class, which is instantiated with the parsed content of a bill. You can then extract that content by calling the extract() method, which returns a bill object, or false if an error occurs.
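The contract described above can be illustrated with a minimal, self-contained sketch. Every name in it (SketchExtractor, KeyValueExtractor, the "Key: value" bill layout) is hypothetical and is not the library's actual code; in particular, the real extractors return a bill object, while this sketch uses a plain array to stay short.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's abstract Extractor class.
abstract class SketchExtractor
{
    public function __construct(protected string $content)
    {
    }

    /** @return array<string, string>|false Extracted data, or false on error. */
    abstract public function extract(): array|false;
}

// Hypothetical extractor for an invented "Key: value" bill layout.
final class KeyValueExtractor extends SketchExtractor
{
    public function extract(): array|false
    {
        $data = [];
        foreach (preg_split('/\R/', trim($this->content)) as $line) {
            if (!str_contains($line, ':')) {
                return false; // Unexpected line: signal an extraction error.
            }
            [$key, $value] = explode(':', $line, 2);
            $data[trim($key)] = trim($value);
        }

        return $data;
    }
}

$bill = (new KeyValueExtractor("Client: Jane Doe\nTotal: 123.45"))->extract();

// Compare strictly against false, since an empty result is also falsy.
if (false === $bill) {
    exit('Could not extract the bill.');
}

echo $bill['Client'], PHP_EOL;
```

The strict `false === $bill` check is the part that carries over to real usage: any method documented as returning "a bill object or false" should be checked this way before the result is used.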
List of all Extractors
The Extractor Factory
The Extractor Factory is a class that helps you create extractors.
With the Extractor Factory, you can:
- Identify the extractor class from string content or a file path with the identifyExtractorClassFromContent() or identifyExtractorClassFromFile() methods
- Instantiate the extractor from string content or a file path with the instantiateExtractorFromContent() or instantiateExtractorFromFile() methods
- Extract the bill from string content or a file path with the extractFromContent() or extractFromFile() methods
<?php

use NystronSolar\ElectricBillExtractor\ExtractorFactory;

require_once __DIR__.'/vendor/autoload.php';

// Returns the class-string of an extractor that can be used, or false
ExtractorFactory::identifyExtractorClassFromFile('bill.pdf');
ExtractorFactory::identifyExtractorClassFromContent('A parsed bill content');

// Returns a new extractor instance, or false
ExtractorFactory::instantiateExtractorFromFile('bill.pdf');
ExtractorFactory::instantiateExtractorFromContent('A parsed bill content');

// Extracts a bill: returns a bill object, or false
ExtractorFactory::extractFromFile('bill.pdf');
ExtractorFactory::extractFromContent('A parsed bill content');
Tests
The tests for the Electric Bill Extractor are pretty!
Content
To test a file extractor library, we need many files to extract and check the results against. That is why the tests/Content/bills/ folder contains many .txt files that represent fake bills, following the pattern for each extractor.
To compare the extractor's results with the expected values, there are .json files in tests/Content/expected/. When the tests run, the data extracted from each bill (.txt) is compared with its expected result (.json).
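The comparison described above can be sketched roughly as follows. This is not the library's test code: the findMismatchedBills() helper, the "Key: value" stand-in extractor, the throwaway folder, and the assumption that each .txt bill is paired with a .json file of the same base name are all made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: runs $extract on every .txt bill in $billsDir and
 * returns the names of the bills whose result differs from the .json file
 * with the same base name in $expectedDir (pairing by name is an assumption).
 *
 * @param callable(string): (array|false) $extract
 *
 * @return list<string>
 */
function findMismatchedBills(string $billsDir, string $expectedDir, callable $extract): array
{
    $mismatched = [];
    foreach (glob($billsDir.'/*.txt') ?: [] as $billPath) {
        $name = basename($billPath, '.txt');
        $expected = json_decode(
            (string) file_get_contents($expectedDir.'/'.$name.'.json'),
            true,
            512,
            JSON_THROW_ON_ERROR
        );
        $actual = $extract((string) file_get_contents($billPath));
        if ($actual !== $expected) { // Strict: same keys, order, values, and types.
            $mismatched[] = $name;
        }
    }

    return $mismatched;
}

// Demo with a throwaway folder and an invented "Key: value" bill format.
$root = sys_get_temp_dir().'/bill-sketch-'.uniqid();
mkdir($root.'/bills', 0777, true);
mkdir($root.'/expected', 0777, true);
file_put_contents($root.'/bills/example.txt', "Client: Jane Doe\nTotal: 123.45");
file_put_contents($root.'/expected/example.json', '{"Client": "Jane Doe", "Total": "123.45"}');

// Stand-in extractor: splits each line on the first colon, false on a bad line.
$extract = static function (string $content): array|false {
    $data = [];
    foreach (explode("\n", $content) as $line) {
        $parts = explode(':', $line, 2);
        if (2 !== count($parts)) {
            return false;
        }
        $data[trim($parts[0])] = trim($parts[1]);
    }

    return $data;
};

$mismatched = findMismatchedBills($root.'/bills', $root.'/expected', $extract);
echo [] === $mismatched ? 'All bills match.' : 'Mismatched: '.implode(', ', $mismatched), PHP_EOL;
```

Working from text fixtures like this keeps the comparison fast and deterministic, since no PDF has to be parsed to produce the input.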
Why not use PDF files?
In most real-world cases, projects extract the data from a PDF file. In that case, the Extractor Factory parses the PDF file into text using smalot/pdfparser (https://github.com/smalot/pdfparser).
After parsing, the library uses the text returned by the parser to extract all the data for the bill.
In tests, however, we can skip the parsing step entirely, which is why we use .txt files instead of .pdf files.
The Test Case
Since the extractors are asserted in this unusual way, we built the Extractor Test Case, which is a custom PHPUnit test case. This test case has an assertByContentFolder method, which takes the folder name and the extractor class.
Uploading a new Test Bill
The bills under tests/Content/bills are usually real bills with some values changed. If you find an error with your bill and want to upload it to the repository, it is extremely important to change identifying values such as the real name, CPF, and installation code. This project is open source, so anyone can see the bills.
You can use the composer upload-bill command to help you add a new bill, but you still have to remove all the secrets and fill in the expected JSON file.
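As a small aid when removing secrets by hand, the sketch below masks anything that looks like a formatted CPF (000.000.000-00) in a bill's text. The maskCpfs() helper is hypothetical and not part of the library or of the upload-bill command. It only catches CPFs written with the usual punctuation, so every other value still has to be reviewed and changed manually.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): replaces every formatted
 * CPF (three groups of three digits and two check digits) with a placeholder.
 */
function maskCpfs(string $billText, string $placeholder = '000.000.000-00'): string
{
    // preg_replace() returns null on error; fall back to the original text.
    return preg_replace('/\b\d{3}\.\d{3}\.\d{3}-\d{2}\b/', $placeholder, $billText) ?? $billText;
}

echo maskCpfs('CPF: 123.456.789-09'), PHP_EOL;
```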
Below are all the values that must be changed in every bill, listed by bill type.
RGE V3
Values are named in Portuguese, in order.
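- Nome (name)
- Rua e Número (street and number)
- Bairro (neighborhood)
- Cep, Cidade e Estado (postal code, city and state)
- Lote (lot)
- Endereço de Leitura (reading address)
- N° do Medidor (meter number)
- CPF
- Código de Instalação (installation code)
- Nota Fiscal (invoice)
- Chave de Acesso Nota Fiscal QRCode (invoice access key, QR code)
- Protocolo de Autorização Nota Fiscal (invoice authorization protocol)
- Valores secretos dentro da caixa "Aviso Importante" (secret values inside the "Aviso Importante" box)
- Medidor (meter)
- Conta de Energia Elétrica (Número ID) (electricity bill, ID number)
- Código de Débito Automático do Banco (bank automatic debit code)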
RGE V2
Values are named in Portuguese, in order.
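- Nome (name)
- Rua e Número (street and number)
- Bairro (neighborhood)
- Cep, Cidade e Estado (postal code, city and state)
- Nota Fiscal - Ato Declaratório (invoice, declaratory act)
- Conta de Energia Elétrica (Número ID) e Série (electricity bill, ID number and series)
- Conta Contrato (contract account)
- Lote (lot)
- Roteiro de Leitura (reading route)
- N° do Medidor (meter number)
- PN
- Valores secretos dentro da caixa "PREZADO(A) CLIENTE" (secret values inside the "PREZADO(A) CLIENTE" box)
- RG
- Código de Instalação (installation code)
- Código (code)
- Descrição da Operação (operation description)
- CPF
- Código de Instalação (installation code)
- Nota Fiscal (invoice)
- Chave de Acesso Nota Fiscal QRCode (invoice access key, QR code)
- Protocolo de Autorização Nota Fiscal (invoice authorization protocol)
- Valores secretos dentro da caixa "Aviso Importante" (secret values inside the "Aviso Importante" box)
- Medidor (meter)
- Código de Débito Automático do Banco (bank automatic debit code)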