kaleu62/pi-reader

A Simple PDF and Image parser

v0.9 2023-01-09 08:44 UTC

This package is auto-updated.

Last update: 2024-04-09 11:44:38 UTC


README


A PDF and image reader built on spatie/pdf-to-text and the OCR API from https://ocr.space/

How To Use?

$pireader = new PIReader(
    [
        'apiKey' => 'xxxxxxxxx', // ocr.space API Key
        'production' => false
    ]
);

Because ocr.space limits the number of requests, the 'apiKey' parameter is mandatory, but the key is only used when the 'production' parameter is set to true.
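
One way to avoid hard-coding these two values is to build the options array from the environment. This is only a sketch: the variable names OCR_SPACE_API_KEY and APP_ENV are assumptions made for illustration and are not read by the package itself.

```php
<?php
// Hypothetical environment variables; the package does not define them.
$envKey = getenv('OCR_SPACE_API_KEY');
$envName = getenv('APP_ENV');

$config = [
    // Fall back to a placeholder when no key is configured.
    'apiKey' => $envKey !== false && $envKey !== '' ? $envKey : 'xxxxxxxxx',
    // True only when the environment is explicitly "production".
    'production' => $envName === 'production',
];

var_dump($config);
```

The resulting array has the same two keys as the constructor argument shown above.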

The package currently provides a few basic functions:

  • Return the OCR-parsed text
  • Verify the existence of a text in the document
  • Count the number of occurrences of a text in the document
  • Perform a search in the text through a regular expression
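
To make the last three operations concrete, here is a minimal sketch of what they amount to on text that has already been extracted. It uses only PHP built-ins on a made-up sample string; it is not the package's implementation, which may behave differently (for example regarding case sensitivity).

```php
<?php
// Stand-in for text already extracted from a PDF or image (made-up sample).
$text = "Invoice for John Doe. Contact: John Doe, phone 555-0100.";

// Verify the existence of a text (str_contains requires PHP 8.0+).
$exists = str_contains($text, 'John Doe');

// Count the non-overlapping occurrences of a text.
$count = substr_count($text, 'John Doe');

// Search the text with a regular expression; all matches land in $matches[0].
preg_match_all('/\d{3}-\d{4}/', $text, $matches);

var_dump($exists, $count, $matches[0]);
```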

getArchive($filePath)

This function returns an array with the parsed text of the file (PDF or image) at the given path.

$pireader->getArchive("http://my_fake_pdf_path/file.pdf");

existsInFile($filePath, $string)

This function returns a boolean indicating whether the text is present in the file at the given path. If the file cannot be opened, it returns null instead.

$pireader->existsInFile("http://my_fake_pdf_path/file.pdf", "John Doe");
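
Since the result can be true, false or null, a loose check such as `if (!$result)` would treat "text not found" and "file could not be opened" alike. A small sketch of a strict check follows; describeSearchResult is a hypothetical helper written for this example and is not part of the package.

```php
<?php
// Hypothetical helper: map the three possible results to a message.
function describeSearchResult(?bool $result): string
{
    // Strict comparison keeps null (unreadable file) apart from false (no match).
    if ($result === null) {
        return 'file could not be opened';
    }

    return $result ? 'text found' : 'text not found';
}

echo describeSearchResult(true), PHP_EOL;
echo describeSearchResult(false), PHP_EOL;
echo describeSearchResult(null), PHP_EOL;
```

In real use the argument would be the value returned by existsInFile.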

countOccurrences($filePath, $string)

This function counts how many times the text occurs in the file at the given path.

$pireader->countOccurrences("http://my_fake_pdf_path/file.pdf", "John Doe");

regexFind($filePath, $regex)

This function searches the parsed text of the file at the given path using a regular expression.

$pireader->regexFind("http://my_fake_pdf_path/file.pdf", "[\d{5}\.\d{5} \d{5}\.\d{6} \d{5}\.\d{6} \d{1} \d{14}]");
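
The pattern above can be tried on its own with PHP's preg_match. The sketch below assumes the string reaches the PCRE functions unchanged, in which case the outer square brackets act as pattern delimiters rather than a character class. The sample text is made up.

```php
<?php
// Same pattern as above, single-quoted so the backslashes stay literal.
$pattern = '[\d{5}\.\d{5} \d{5}\.\d{6} \d{5}\.\d{6} \d{1} \d{14}]';

// Made-up text containing one digit sequence in the expected layout.
$text = 'Payment line: 23793.38128 60007.827136 95000.063305 9 84660000012345 (sample)';

// preg_match returns 1 on a match and stores the matched text in $m[0].
$matched = preg_match($pattern, $text, $m);

var_dump($matched, $m[0] ?? null);
```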