mishagp / ocrmypdf
A simple PHP wrapper for OCRmyPDF
Requires
- php: ^8.2
- psr/log: ^1.0 || ^2.0 || ^3.0
Requires (Dev)
- phpstan/extension-installer: ^1.4
- phpstan/phpstan: ^2.0
- phpstan/phpstan-strict-rules: ^2.0
- phpunit/php-code-coverage: ^11.0.0
- phpunit/phpunit: ^11.0
Last update: 2026-06-27 03:02:08 UTC
README
Installation
Via Composer:
$ composer require mishagp/ocrmypdf
This library depends on OCRmyPDF. See the OCRmyPDF GitHub repository for instructions on installing OCRmyPDF on your platform.
Usage
Basic example
use mishahawthorn\OCRmyPDF\OCRmyPDF;

// Return file path of outputted, OCRed PDF
echo OCRmyPDF::make('document.pdf')->run();

// Return file contents of outputted, OCRed PDF
echo OCRmyPDF::make('scannedImage.png')->setOutputPDFPath(null)->run();
API
setParam
Define invocation parameters for ocrmypdf. See ocrmypdf --help for a list of available parameters.
Important
Parameters configured via setParam override any parameters or configuration set by other means.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

// Passing a single parameter with a value
OCRmyPDF::make('document_zh-CN.pdf')
    ->setParam('-l', 'chi_sim')
    ->run();

// Passing a single parameter without a value
OCRmyPDF::make('document_withBackground.pdf')
    ->setParam('--remove-background')
    ->run();

// Passing multiple parameters
OCRmyPDF::make('document_withoutAttribution.pdf')
    ->setParam('--title', 'Lorem Ipsum')
    ->setParam('--keywords', 'Lorem,Ipsum,dolor,sit,amet')
    ->run();
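To make the link between setParam calls and the resulting command line more concrete, here is a minimal sketch of how flag/value pairs could be assembled into a shell-escaped command. The buildOcrCommand helper is hypothetical and is not part of this library; the wrapper's actual implementation may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): turns flag => value pairs
 * into a shell-escaped ocrmypdf command line.
 * A null value means the flag takes no argument.
 *
 * @param array<string, string|null> $params
 */
function buildOcrCommand(string $executable, array $params, string $input, string $output): string
{
    $parts = [escapeshellarg($executable)];

    foreach ($params as $flag => $value) {
        // Flags are assumed to be trusted literals; only values are escaped.
        $parts[] = $flag;
        if ($value !== null) {
            $parts[] = escapeshellarg($value);
        }
    }

    // Input and output paths come last, after all options.
    $parts[] = escapeshellarg($input);
    $parts[] = escapeshellarg($output);

    return implode(' ', $parts);
}

echo buildOcrCommand(
    'ocrmypdf',
    ['-l' => 'chi_sim', '--remove-background' => null],
    'document.pdf',
    'out.pdf'
), PHP_EOL;
```

Escaping every value with escapeshellarg keeps titles or keywords containing spaces and quotes from breaking the command.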
setInputData
Pass image/PDF data loaded in memory into ocrmypdf directly via stdin.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

// Using Imagick
$data = $img->getImageBlob();
$size = $img->getImageLength();

// Using GD
ob_start();
imagepng($img, null, 0);
$size = ob_get_length();
$data = ob_get_clean();

OCRmyPDF::make()
    ->setInputData($data, $size)
    ->run();
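For readers curious about what passing data via stdin involves, the following is a rough sketch using PHP's proc_open. The runWithStdin helper is hypothetical and is not taken from this library. The demo uses the PHP binary as a stand-in child process so it runs without ocrmypdf installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's code): starts $command, feeds $data
 * to its stdin, and returns [exit code, stdout].
 *
 * @param list<string> $command
 * @return array{0: int, 1: string}
 */
function runWithStdin(array $command, string $data): array
{
    // stdout/stderr go to temp files so the child never blocks on a full pipe
    // while we are still writing to its stdin.
    $stdout = tmpfile();
    $stderr = tmpfile();
    if ($stdout === false || $stderr === false) {
        throw new RuntimeException('Unable to create temporary files');
    }

    $process = proc_open($command, [0 => ['pipe', 'r'], 1 => $stdout, 2 => $stderr], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Unable to start process');
    }

    fwrite($pipes[0], $data);
    fclose($pipes[0]); // EOF tells the child that the input is complete

    $exitCode = proc_close($process);
    rewind($stdout);

    return [$exitCode, (string) stream_get_contents($stdout)];
}

// Demo: the child simply reports how many bytes it received on stdin.
// With ocrmypdf, the command would name "-" as the input file instead
// (assumption: the installed ocrmypdf accepts "-" for standard input).
[$exitCode, $output] = runWithStdin(
    [PHP_BINARY, '-r', 'echo strlen(file_get_contents("php://stdin"));'],
    str_repeat('x', 200000)
);
echo $output, PHP_EOL;
```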
setOutputPDFPath
Specify a writable path where ocrmypdf should write the output PDF.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setOutputPDFPath('/outputDir/ocr_document.pdf')
    ->run();
setExecutable
Define a custom location for the ocrmypdf executable if, for any reason, it is not on the $PATH.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setExecutable('/path/to/ocrmypdf')
    ->run();
extractText / getText
Write the recognized plaintext to a sidecar file and read it back after run(). Call extractText() with no
argument to use a temporary file that is read and discarded, or pass a path to keep the text file.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

$ocr = OCRmyPDF::make('document.pdf')
    ->language('eng')
    ->extractText();

$pdfPath = $ocr->run();
$text = $ocr->getText(); // Recognized plaintext
Convenience setters
Thin, type-safe wrappers over the most common ocrmypdf parameters. Each is equivalent to the matching
setParam call.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->language('eng', 'deu')   // -l eng+deu
    ->deskew()                 // --deskew
    ->rotatePages()            // --rotate-pages
    ->clean()                  // --clean
    ->removeBackground()       // --remove-background
    ->optimize(3)              // --optimize 3 (0-3)
    ->skipText()               // --skip-text (or ->forceOcr() / ->redoOcr())
    ->setThreadLimit(4)        // --jobs 4
    ->setTempDir('/var/tmp')
    ->run();
Boolean setters accept false to remove the flag, e.g. ->deskew(false).
setTimeout
Terminate the ocrmypdf process if it runs longer than the given number of seconds. A
ProcessTimeoutException is thrown when the limit is exceeded.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setTimeout(120) // seconds
    ->run();
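As an illustration of the kind of timeout handling described above, here is a small self-contained sketch that polls a child process and terminates it once a deadline passes. The runWithTimeout helper is hypothetical, and it throws a plain RuntimeException rather than the library's ProcessTimeoutException; the library's own mechanism may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's code): runs $command and terminates
 * it if it is still running after $timeout seconds.
 *
 * @param list<string> $command
 * @return int the child's exit code
 */
function runWithTimeout(array $command, float $timeout): int
{
    // Discard the child's I/O so it can never block on a full pipe.
    $null = PHP_OS_FAMILY === 'Windows' ? 'NUL' : '/dev/null';
    $process = proc_open($command, [
        0 => ['file', $null, 'r'],
        1 => ['file', $null, 'w'],
        2 => ['file', $null, 'w'],
    ], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Unable to start process');
    }

    $deadline = microtime(true) + $timeout;
    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            proc_close($process);
            return $status['exitcode'];
        }
        if (microtime(true) >= $deadline) {
            proc_terminate($process); // sends SIGTERM on POSIX systems
            proc_close($process);
            throw new RuntimeException("Process exceeded {$timeout}s");
        }
        usleep(50_000); // poll every 50 ms
    }
}

// Demo with the PHP binary standing in for a long-running ocrmypdf job.
try {
    runWithTimeout([PHP_BINARY, '-r', 'sleep(10);'], 0.3);
} catch (RuntimeException $e) {
    echo 'Timed out: ', $e->getMessage(), PHP_EOL;
}
```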
setLogger
Attach any PSR-3 logger to receive the generated command, stderr output and any failures.
use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setLogger($psr3Logger)
    ->run();
License
ocrmypdf-php is released under the MIT License.
Credits
Development of ocrmypdf-php is based on the tesseract-ocr-for-php PHP wrapper library for tesseract
developed by thiagoalessio and associated contributors.