kaleu62 / pi-reader
A Simple PDF and Image parser
v0.9
2023-01-09 08:44 UTC
Requires
- php: >=7.1
- guzzlehttp/guzzle: >=6.3
- spatie/pdf-to-text: ^1.1
Requires (Dev)
- phpunit/phpunit: >=7.0
Suggests
- spatie/pdf-to-text: Requires the pdftotext binary. On Debian: 'apt-get install -y xpdf'; on CentOS: 'yum install poppler-utils'
README
pi-reader is a PDF and image reader built on spatie/pdf-to-text and the OCR API from https://ocr.space/
How To Use?
$pireader = new PIReader(
    [
        'apiKey' => 'xxxxxxxxx', // ocr.space API key
        'production' => false, // the API key is only used when this is true
    ]
);
Because ocr.space limits the number of requests, the 'apiKey' parameter is mandatory. However, the key is only used when the 'production' parameter is set to true.
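One way to keep the key out of source control is to read it from the environment and enable production mode only when a key is actually set. This is a minimal sketch: the helper name piReaderOptions and the environment variable OCR_SPACE_API_KEY are hypothetical, and only the 'apiKey' and 'production' options come from the usage shown above.

```php
<?php
// Hypothetical helper: builds the options array passed to the PIReader constructor.
// OCR_SPACE_API_KEY is an assumed environment variable name, not part of the library.
function piReaderOptions(): array
{
    $apiKey = getenv('OCR_SPACE_API_KEY');
    $hasKey = is_string($apiKey) && $apiKey !== '';

    return [
        // 'apiKey' is mandatory, so fall back to a placeholder when none is set.
        'apiKey' => $hasKey ? $apiKey : 'xxxxxxxxx',
        // The key is only used when 'production' is true.
        'production' => $hasKey,
    ];
}
```

The result can then be passed straight to the constructor: new PIReader(piReaderOptions()).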
The library currently provides a few basic functions:
- Return the OCR-parsed text
- Verify whether a text exists in the document
- Count the number of occurrences of a text in the document
- Search the text with a regular expression
getArchive($filePath)
This function returns an array containing the text parsed from the file (PDF or image) at the given path.
$pireader->getArchive("http://my_fake_pdf_path/file.pdf");
existsInFile($filePath, $string)
This function returns a boolean indicating whether the text is present in the file at the given path. If the file cannot be opened, it returns null instead.
$pireader->existsInFile("http://my_fake_pdf_path/file.pdf", "John Doe");
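Because the result can be true, false, or null, a loose check such as if (!$result) would treat "text not found" and "file could not be opened" the same way. The sketch below shows one way to keep the three cases apart with strict comparisons; describeLookup is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: turns the tri-state result of existsInFile() into a message.
function describeLookup(?bool $found): string
{
    // null signals that the file could not be opened, so check it first.
    if ($found === null) {
        return 'file could not be read';
    }

    return $found ? 'text found' : 'text not found';
}
```

A call such as describeLookup($pireader->existsInFile($path, "John Doe")) would then report the failure case separately from a plain miss.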
countOccurrences($filePath, $string)
This function returns the number of times the text occurs in the file at the given path.
$pireader->countOccurrences("http://my_fake_pdf_path/file.pdf", "John Doe");
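To illustrate what counting occurrences means on already-extracted text, here is a minimal sketch using PHP's built-in substr_count(). The function countInText and the sample text are hypothetical, and the library's own implementation may differ (for example in case handling).

```php
<?php
// Hypothetical stand-in for the counting step; the library's own implementation may differ.
function countInText(string $text, string $needle): int
{
    if ($needle === '') {
        return 0; // substr_count() rejects an empty needle
    }

    // Counts non-overlapping, case-sensitive occurrences.
    return substr_count($text, $needle);
}

$text = "Signed by John Doe. Witness: John Doe.";
echo countInText($text, "John Doe"), PHP_EOL; // prints 2
```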
regexFind($filePath, $regex)
This function searches the text of the file at the given path using a regular expression.
$pireader->regexFind("http://my_fake_pdf_path/file.pdf", "[\d{5}\.\d{5} \d{5}\.\d{6} \d{5}\.\d{6} \d{1} \d{14}]");
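The pattern in the call above can be tried out on plain text first. The sketch below assumes the pattern is handed to PHP's preg_* functions unchanged, in which case the outer square brackets act as PCRE delimiters rather than as a character class. The sample text is made up for illustration.

```php
<?php
// Sample text is made up for illustration.
$text = 'Pay to: 23793.38128 60007.827136 95000.063305 9 84660000012345';

// The outer [ and ] are PCRE delimiters here, not a character class.
$pattern = '[\d{5}\.\d{5} \d{5}\.\d{6} \d{5}\.\d{6} \d{1} \d{14}]';

// $matches[0] receives every full match found in $text.
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Testing the pattern locally this way avoids spending OCR requests on a regular expression that does not match what is expected.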