mishagp/ocrmypdf

This package is abandoned and no longer maintained. The author suggests using the mishahawthorn/ocrmypdf package instead.

A simple PHP wrapper for OCRmyPDF

github.com/mishahawthorn/ocrmypdf-php

pkg:composer/mishagp/ocrmypdf

v2.0.0 2026-06-27 02:57 UTC

Installation

Via Composer:

$ composer require mishahawthorn/ocrmypdf

This library depends on OCRmyPDF, which must be installed separately. See the OCRmyPDF GitHub repository for installation instructions for your platform.

Usage

Basic example

use mishahawthorn\OCRmyPDF\OCRmyPDF;

//Returns the file path of the OCRed output PDF
echo OCRmyPDF::make('document.pdf')->run();

//Returns the file contents of the OCRed output PDF
echo OCRmyPDF::make('scannedImage.png')->setOutputPDFPath(null)->run();

API

setParam

Define invocation parameters for ocrmypdf. See ocrmypdf --help for a list of available parameters.

Important

Parameters set via setParam override any conflicting parameters or configuration set through other methods.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

//Passing a single parameter with a value
OCRmyPDF::make('document_zh-CN.pdf')
    ->setParam('-l', 'chi_sim')
    ->run();

//Passing a single parameter without a value
OCRmyPDF::make('document_withBackground.pdf')
    ->setParam('--remove-background')
    ->run();

//Passing multiple parameters
OCRmyPDF::make('document_withoutAttribution.pdf')
    ->setParam('--title', 'Lorem Ipsum')
    ->setParam('--keywords', 'Lorem,Ipsum,dolor,sit,amet')
    ->run();
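
To make the effect of setParam more concrete, here is a minimal sketch of how a map of parameters could be turned into a shell-safe command line. The buildOcrCommand function and its array shape are hypothetical and are not taken from the library's internals; the only assumptions about ocrmypdf itself are that options precede the input and output paths.

```php
<?php
/**
 * Hypothetical helper: turn a parameter map into a shell-safe ocrmypdf command.
 * A null value means the parameter is a bare flag with no value.
 */
function buildOcrCommand(string $executable, array $params, string $input, string $output): string
{
    $parts = [escapeshellarg($executable)];
    foreach ($params as $name => $value) {
        // Escape every token so spaces and quotes cannot break the command.
        $parts[] = escapeshellarg((string) $name);
        if ($value !== null) {
            $parts[] = escapeshellarg((string) $value);
        }
    }
    // Assumed layout: ocrmypdf [options] input output
    $parts[] = escapeshellarg($input);
    $parts[] = escapeshellarg($output);

    return implode(' ', $parts);
}

$command = buildOcrCommand(
    'ocrmypdf',
    ['--title' => 'Lorem Ipsum', '--remove-background' => null],
    'in.pdf',
    'out.pdf'
);
```

Keying the map by parameter name means a later call with the same name simply replaces the earlier value, which is one plausible way to get the override behaviour described above.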

setInputData

Pass image/PDF data loaded in memory into ocrmypdf directly via stdin.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

//Using Imagick ($img is an Imagick instance)
$data = $img->getImageBlob();
$size = $img->getImageLength();

//Alternatively, using GD ($img is a GD image)
ob_start();
imagepng($img, null, 0);
$size = ob_get_length();
$data = ob_get_clean();

OCRmyPDF::make()
    ->setInputData($data, $size)
    ->run();
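
For readers curious what passing data via stdin involves, the following is a rough sketch using PHP's proc_open. The runWithStdin function is hypothetical and is not the library's implementation; it only illustrates writing an in-memory buffer to a child process and collecting its output.

```php
<?php
/**
 * Hypothetical helper: start a command, feed $data to its stdin,
 * and return [exit code, stdout, stderr].
 * Requires PHP 7.4+ for the array form of proc_open.
 */
function runWithStdin(array $command, string $data): array
{
    $descriptors = [
        0 => ['pipe', 'r'], // child reads its stdin from this pipe
        1 => ['pipe', 'w'], // child writes its stdout to this pipe
        2 => ['pipe', 'w'], // child writes its stderr to this pipe
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start process');
    }

    // Send the whole buffer, then close stdin so the child sees end-of-file.
    // Note: a child that emits a lot of output before it has consumed its
    // input could stall this simple sequence; a production version would
    // multiplex the pipes with stream_select().
    fwrite($pipes[0], $data);
    fclose($pipes[0]);

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    $exitCode = proc_close($process);

    return [$exitCode, $stdout, $stderr];
}
```

With ocrmypdf, the command would presumably use "-" in place of the input path to signal that the document arrives on stdin; that detail is an assumption here and should be checked against ocrmypdf --help.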

setOutputPDFPath

Specify a writable path where ocrmypdf should write the output PDF.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setOutputPDFPath('/outputDir/ocr_document.pdf')
    ->run();

setExecutable

Define a custom location for the ocrmypdf executable if, for any reason, it is not on the $PATH.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setExecutable('/path/to/ocrmypdf')
    ->run();
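
If it is unclear whether ocrmypdf is on the $PATH, a small lookup can decide when a custom location is needed. The findOnPath helper below is a hypothetical sketch, not part of the library, and it does not handle Windows executable extensions such as .exe.

```php
<?php
/**
 * Hypothetical helper: search the directories in $PATH for an executable.
 * Returns the full path, or null when nothing executable is found.
 */
function findOnPath(string $name, ?string $pathEnv = null): ?string
{
    // Use the PATH environment variable unless a search path is supplied.
    $pathEnv = $pathEnv ?? (getenv('PATH') ?: '');

    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Fall back to a placeholder custom location when ocrmypdf is not on the $PATH.
$executable = findOnPath('ocrmypdf') ?? '/path/to/ocrmypdf';
```

The resulting $executable value could then be handed to setExecutable.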

extractText / getText

Write the recognized plaintext to a sidecar file and read it back after run(). Call extractText() with no argument to use a temporary file that is read and discarded, or pass a path to keep the text file.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

$ocr = OCRmyPDF::make('document.pdf')
    ->language('eng')
    ->extractText();

$pdfPath = $ocr->run();
$text = $ocr->getText(); //Recognized plaintext
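
The "temporary file that is read and discarded" behaviour can be pictured with the sketch below. The readViaTemporaryFile helper and its callback are hypothetical; in the real wrapper the file would be written by ocrmypdf rather than by a PHP closure.

```php
<?php
/**
 * Hypothetical helper: hand a temporary file path to $producer, then return
 * whatever was written there and delete the file afterwards.
 */
function readViaTemporaryFile(callable $producer): string
{
    $path = tempnam(sys_get_temp_dir(), 'ocr_');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }

    try {
        // The producer stands in for the external process writing the text.
        $producer($path);

        $text = file_get_contents($path);
        if ($text === false) {
            throw new RuntimeException('Could not read the temporary file');
        }

        return $text;
    } finally {
        // Always discard the temporary file, even if the producer failed.
        if (is_file($path)) {
            unlink($path);
        }
    }
}
```

Passing an explicit path to extractText() corresponds to skipping the cleanup step so the text file is kept.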

Convenience setters

Thin, type-safe wrappers over the most common ocrmypdf parameters. Each is equivalent to the matching setParam call.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->language('eng', 'deu') // -l eng+deu
    ->deskew()               // --deskew
    ->rotatePages()          // --rotate-pages
    ->clean()                // --clean
    ->removeBackground()     // --remove-background
    ->optimize(3)            // --optimize 3 (0-3)
    ->skipText()             // --skip-text  (or ->forceOcr() / ->redoOcr())
    ->setThreadLimit(4)      // --jobs 4
    ->setTempDir('/var/tmp')
    ->run();

Boolean setters accept false to remove the flag, e.g. ->deskew(false).
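
As an illustration of how such setters might map onto parameters, here is a minimal sketch. The OptionSketch class is a hypothetical stand-in, not the library's class; it shows only the joining of language codes with "+" and the removal of a flag when false is passed.

```php
<?php
/** Hypothetical stand-in for the wrapper's option storage. */
final class OptionSketch
{
    /** @var array<string, string|null> parameter name => value (null for bare flags) */
    private array $params = [];

    public function language(string ...$codes): self
    {
        // Multiple languages are joined with "+", e.g. eng+deu.
        $this->params['-l'] = implode('+', $codes);
        return $this;
    }

    public function deskew(bool $enabled = true): self
    {
        return $this->toggle('--deskew', $enabled);
    }

    /** Add a bare flag when enabled, remove it again when disabled. */
    private function toggle(string $flag, bool $enabled): self
    {
        if ($enabled) {
            $this->params[$flag] = null;
        } else {
            unset($this->params[$flag]);
        }
        return $this;
    }

    public function params(): array
    {
        return $this->params;
    }
}

$options = (new OptionSketch())->language('eng', 'deu')->deskew();
```

Calling ->deskew(false) on $options afterwards would drop the '--deskew' entry and leave the language setting untouched.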

setTimeout

Terminate the ocrmypdf process if it runs longer than the given number of seconds. A ProcessTimeoutException is thrown when the limit is exceeded.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setTimeout(120) //seconds
    ->run();
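
The idea behind a process timeout can be sketched with plain proc_open polling. The runWithTimeout helper below is hypothetical and throws a generic RuntimeException instead of the ProcessTimeoutException mentioned above; it is not how the library implements setTimeout.

```php
<?php
/**
 * Hypothetical helper: run a command and terminate it if it exceeds a time limit.
 * Returns the exit code; throws RuntimeException on timeout.
 * Requires PHP 7.4+ for the array form of proc_open.
 */
function runWithTimeout(array $command, float $timeoutSeconds): int
{
    // An empty descriptor list lets the child inherit this script's stdio.
    $process = proc_open($command, [], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start process');
    }

    $deadline = microtime(true) + $timeoutSeconds;

    while (true) {
        $status = proc_get_status($process);
        if (!$status['running']) {
            // Read the exit code from the first status that reports completion.
            proc_close($process);
            return $status['exitcode'];
        }

        if (microtime(true) >= $deadline) {
            proc_terminate($process); // sends SIGTERM on POSIX systems
            proc_close($process);
            throw new RuntimeException("Process exceeded {$timeoutSeconds} seconds");
        }

        usleep(50000); // poll every 50 ms
    }
}
```

Polling keeps the sketch short; the trade-off is that a timeout is detected up to one polling interval late.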

setLogger

Attach any PSR-3 logger to receive the generated command, stderr output and any failures.

use mishahawthorn\OCRmyPDF\OCRmyPDF;

OCRmyPDF::make('document.pdf')
    ->setLogger($psr3Logger)
    ->run();

License

ocrmypdf-php is released under the MIT License.

Credits

ocrmypdf-php is based on tesseract-ocr-for-php, a PHP wrapper library for tesseract developed by thiagoalessio and associated contributors.