nystronsolar/electric-bill-extractor

Extract the data from an Electric Bill

2.2.2 2023-09-01 00:29 UTC

README

NystronSolarBadge PHPUnitBadge PHPBadge PsalmBadge SemanticReleaseBadge

Electric Bill Extractor

Logo made with Canva

Getting Started

First, instal the package with composer:

composer require nystronsolar/electric-bill-extractor

Next, extract the data from an electric bill:

<?php

use NystronSolar\ElectricBillExtractor\ExtractorFactory;

require_once __DIR__.'/vendor/autoload.php';

$bill = ExtractorFactory::extractFromFile('bill.pdf');

How the library works?

The Extractors

Each type of bill needs to have a custom extractor. Each extractor extends the Extractor Abstract Class, which when instantiating, you need to provide the parsed content of an bill. After it, you can extract its content by using the extract() method, that it will returns a bill object or false in case of error.

List of all Extractors

The Extractor Factory

NystronSolar\ElectricBillExtractor\ExtractorFactory

The Extractor Factory is a class that helps creating extractors

With the Extractor Factory, you can:

  • Identify the extractor class from an string content or file path with the identifyExtractorClassFromContent() or identifyExtractorClassFromFile() methods
  • Instantiate the extractor class from an string content or file path with the instantiateExtractorClassFromFile() and instantiateExtractorClassFromFile() methods
  • Extract the bill from an string content or file path with the extractFromFile() and extractFromFile() methods
<?php

use NystronSolar\ElectricBillExtractor\ExtractorFactory;

require_once __DIR__.'/vendor/autoload.php';

// Returns an class-string of an extractor that can be used or false
ExtractorFactory::identifyExtractorClassFromFile('bill.pdf');
ExtractorFactory::identifyExtractorClassFromContent('An parsed bill content');

// Return an new extractor instance or false
ExtractorFactory::instantiateExtractorFromFile('bill.pdf');
ExtractorFactory::instantiateExtractorFromContent('An parsed bill content');

// Extract an Bill - Return an bill object or false
ExtractorFactory::extractFromFile('bill.pdf');
ExtractorFactory::extractFromContent('An parsed bill content');

Tests

The tests for the Electric Bill Extractor are pretty!

Content

To test an File Extractor library, we need to have many files to extract and check the results. And that's because, in the Electric Bill Extractor, under the tests/Content/bills/ folder, you can find many .txt files that represents many fake bills, following the pattern for each extractor.

And to compare the results by the extractor and the actual bills, you can find .json files in tests/Content/expected/ that when the tests run, it will compare the expected (json) with the actual bills (txt)

Why not use PDF files?

In most real-cases, projects will extract the data from an PDF file. In that approach, the Electric Bill Extractor Factory will parse the PDF file into text, using the https://github.com/smalot/pdfparser.

After the parsing, the library will use the text returned from the Parser and extract all the data for the bill.

But in tests, we can skip all the parsing. And that's because we use .txt instead of .pdf

The Test Case

Since the extractor have this "strange" way to assert the classes, we built the Extractor Test Case, which is a Custom PHPUnit Test Case. This test case have an assertByContentFolder method, which you need to provide the folder name and the extractor class.

Uploading a new Test Bill

The bills under tests/Content/bills are usually real bills, but with some values changed. If you found an error with your bill and wan't to upload it to the repository, it's extremely important to change some values, such as real name, CPF, installation code, etc, since this project is Open Source, so anyone can see the bills.

You can use the command composer upload-bill to help you adding a new bill, but you still have to remove all the secrets and fill the JSON expected file.

Here you can see all the values that must have to be changed in all bills.

RGE V3

Values are named in Portuguese, in order.

  • Nome
  • Rua e Número
  • Bairro
  • Cep, Cidade e Estado
  • Lote
  • Endereço de Leitura
  • N° do Medidor
  • CPF
  • Código de Instalação
  • Nota Fiscal
  • Chave de Acesso Nota Fiscal QRCode
  • Protocolo de Autorização Nota Fiscal
  • Valores secretos dentro da caixa "Aviso Importante"
  • Medidor
  • Conta de Energia Elétrica (Número ID)
  • Código de Débito Automático do Banco

RGE V2

Values are named in Portuguese, in order.

  • Nome

  • Rua e Número

  • Bairro

  • Cep, Cidade e Estado

  • Nota Fiscal - Ato Declaratório

  • Conta de Energia Elétrica (Número ID) e Série

  • Conta Contrato

  • Lote

  • Roteiro de Leitura

  • N° do Medidor

  • PN

  • Valores secretos dentro da caixa "PREZADO(A) CLIENTE"

  • RG

  • Código de Instalação

  • Código

  • Descrição da Operação

  • CPF

  • Código de Instalação

  • Nota Fiscal

  • Chave de Acesso Nota Fiscal QRCode

  • Protocolo de Autorização Nota Fiscal

  • Valores secretos dentro da caixa "Aviso Importante"

  • Medidor

  • Código de Débito Automático do Banco