tiefan / google-pdf-scraper
A PHP library to filter PDF documents in Google Drive for Daniel Fischl
v0.17
2019-01-13 15:22 UTC
Requires
- php: ^7.2
- ext-json: *
- ext-mysqli: *
- google/apiclient: ^2.0
- smalot/pdfparser: ^0.13.2
README
This is a PHP library for filtering PDF documents in Google Drive, written for Daniel Fischl.
To add it to your project, use Composer:
composer require tiefan/google-pdf-scraper
Extract text from a PDF document
$text = PdfScraper::textFromDriveId(string $fileId);
$text = PdfScraper::textFromDriveUrl(string $url);
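The two methods differ only in how the file is identified: by Drive file ID or by Drive URL. How the library maps a URL to an ID is not documented here. As a rough illustration, the sketch below pulls a file ID out of two common Drive link shapes. The function name `driveIdFromUrl` and the URL patterns it handles are assumptions made for this example; they are not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of tiefan/google-pdf-scraper):
 * extracts a file ID from common Google Drive link shapes.
 * Returns null when no ID can be found.
 */
function driveIdFromUrl(string $url): ?string
{
    // Shape 1: https://drive.google.com/file/d/<ID>/view
    if (preg_match('#/file/d/([A-Za-z0-9_-]+)#', $url, $m) === 1) {
        return $m[1];
    }

    // Shape 2: https://drive.google.com/open?id=<ID> (also .../uc?id=<ID>)
    $query = parse_url($url, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params['id']) && is_string($params['id']) && $params['id'] !== '') {
            return $params['id'];
        }
    }

    // No recognizable ID in this URL.
    return null;
}
```

If a helper like this returns an ID, that ID could be passed to `PdfScraper::textFromDriveId`; otherwise the URL is probably not a Drive file link.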
Check a document against "begin" and "end" keywords
$isThatDocument = PdfScraper::checkKeywordsFromDriveId(string $fileId, string $begin, string $end = null);
$isThatDocument = PdfScraper::checkKeywordsFromDriveUrl(string $url, string $begin, string $end = null);
$scraper = new PdfScraper($doc, $isURL = true); // $isURL: true for url, false for id
$isThatDocument = $scraper->checkKeywords(string $begin, string $end = null);
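The README does not spell out what a keyword check means. One plausible reading is that a document matches when the `$begin` keyword occurs in its text and, if `$end` is given, `$end` occurs somewhere after it. The sketch below implements that reading on a plain string. The function name `hasKeywordsInOrder` and the matching rule are assumptions for illustration; this is not the library's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration, not the library's implementation.
 * Assumes a document "matches" when $begin occurs in the text and,
 * if $end is given, $end occurs somewhere after $begin.
 */
function hasKeywordsInOrder(string $text, string $begin, ?string $end = null): bool
{
    $beginPos = strpos($text, $begin);
    if ($beginPos === false) {
        return false; // $begin keyword not present at all
    }
    if ($end === null) {
        return true; // only the $begin keyword was requested
    }
    // Search for $end only after the end of the $begin match.
    return strpos($text, $end, $beginPos + strlen($begin)) !== false;
}
```

Under this assumption, text extracted with `textFromDriveId` could be screened locally, for example `hasKeywordsInOrder($text, 'Invoice', 'Total')`, where both keywords are placeholder values.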
Using MySQL or MariaDB to process documents in one batch
The following code uses the database schema in Sample\db_pdf_scraper.sql:
$pdfDB = new PdfDB($host, $username, $password, $database);
$processed_count = $pdfDB->checkPdfs();