A wrapper to work with TesseractOCR inside PHP

0.2.1 2015-04-02 13:59 UTC


A wrapper to work with TesseractOCR inside your PHP scripts.


Via Composer:

    {
        "require": {
            "thiagoalessio/tesseract_ocr": ">= 0.2.0"
        }
    }

Or just clone the repository and put it somewhere inside your project folder.

$ cd myapp/vendor
$ git clone git://


IMPORTANT: Make sure that the tesseract binary is on your $PATH. If you're running PHP on a webserver, the user running it may not be you, but _www or similar. If you need to, you can always modify your $PATH from PHP:

$path = getenv('PATH');
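
The line above only reads the current value. As a minimal sketch, assuming tesseract lives in /usr/local/bin (an example location, not something this README specifies), the directory could be appended for the current PHP process like this. The helper name appendToPath is hypothetical.

```php
<?php
// Hypothetical helper: append a directory to PATH for the current PHP process.
function appendToPath(string $dir): string
{
    $path = getenv('PATH');
    // getenv() returns false when the variable is not set.
    $parts = ($path === false || $path === '') ? [] : explode(PATH_SEPARATOR, $path);

    // Avoid adding the same directory twice.
    if (!in_array($dir, $parts, true)) {
        $parts[] = $dir;
    }

    $newPath = implode(PATH_SEPARATOR, $parts);
    putenv("PATH=$newPath");

    return $newPath;
}

// '/usr/local/bin' is only an example; use the directory where tesseract is installed.
appendToPath('/usr/local/bin');
```

The change only lasts for the current request or script run, so it has to happen before the first call to recognize().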

Windows users

I received several messages from people trying to get this library running under Windows, so I decided to write a short tutorial that can be found here.


Basic usage

require_once '/path/to/TesseractOCR/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer

$tesseract = new TesseractOCR('images/some-words.jpg');
echo $tesseract->recognize();

Defining language

Tesseract has training data for several languages, and choosing the right one can noticeably improve the accuracy of the recognition.

require_once '/path/to/TesseractOCR/TesseractOCR.php';
//or require_once 'vendor/autoload.php' if you are using composer

$tesseract = new TesseractOCR('images/sind-sie-deutsch.jpg');
$tesseract->setLanguage('deu'); //same 3-letter code as the tesseract training data packages
echo $tesseract->recognize();
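
If you are not sure which training data is installed, the tesseract binary can list it with `tesseract --list-langs`. The sketch below is not part of this library: parseLanguageList is a hypothetical helper, and the header line it skips is an assumption about the command's output, which may differ between tesseract versions.

```php
<?php
// Hypothetical helper: extract language codes from the text printed by
// `tesseract --list-langs`. The header format is an assumption.
function parseLanguageList(string $output): array
{
    $languages = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        $line = trim($line);
        // Skip blank lines and the "List of available languages ..." header.
        if ($line === '' || stripos($line, 'List of available languages') === 0) {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}

// Sample text standing in for real command output.
$sample = "List of available languages (3):\ndeu\neng\nosd\n";
$installed = parseLanguageList($sample);

if (!in_array('deu', $installed, true)) {
    echo "German training data does not appear to be installed\n";
}
```

In a real script the sample string could be replaced by the result of shell_exec('tesseract --list-langs 2>&1'), provided the binary is on your $PATH.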

Inducing recognition

Sometimes tesseract confuses similar-looking chars, such as:

0 - O
1 - l
j - ,
etc ...

But you can improve recognition accuracy by specifying what kind of chars you're sending, for example:

$tesseract = new TesseractOCR('my-image.jpg');
$tesseract->setWhitelist(range('a','z')); //tesseract will treat everything as lowercase letters
echo $tesseract->recognize();

$tesseract = new TesseractOCR('my-image.jpg');
$tesseract->setWhitelist(range('A','Z'), range(0,9), '_-@.'); //you can pass as many ranges as you need
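
To make it clearer what a mixed list of ranges and strings amounts to, here is a small sketch of how such arguments could be flattened into one string of allowed chars. buildWhitelist is a hypothetical helper written for illustration; it is not a claim about how setWhitelist is implemented internally.

```php
<?php
// Hypothetical helper, not part of the library: flatten any mix of arrays
// and strings into one string of allowed characters.
function buildWhitelist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        // (array) turns a plain string into a one-element array.
        $chars .= implode('', (array) $part);
    }
    return $chars;
}

echo buildWhitelist(range('A', 'Z'), range(0, 9), '_-@.'), "\n";
// prints "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-@."
```

A string like this is the kind of value tesseract itself accepts for its tessedit_char_whitelist config variable.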

You can even do cool stuff like this one:

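$tesseract = new TesseractOCR('617.jpg');
$tesseract->setWhitelist(range('A','Z')); //only uppercase letters allowed, so digits are read as letters
echo $tesseract->recognize(); //will return "GIT"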


Warnings like Permission denied or No such file or directory

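These warnings usually mean that the user running PHP cannot write to the directory used for temp files. To solve this issue you can specify a custom directory for temp files: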

$tesseract = new TesseractOCR('my-image.jpg');
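
Whichever directory you choose has to exist and be writable by the user PHP runs as. A minimal sketch of such a check follows; ensureWritableDir and the 'tesseract-tmp' directory name are hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: make sure a directory exists and is writable by the
// user PHP runs as, and return its path.
function ensureWritableDir(string $dir): string
{
    // Create the directory (and any missing parents) if it is not there yet.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Directory is not writable: $dir");
    }
    return $dir;
}

// The directory name is only an example.
$tempDir = ensureWritableDir(sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'tesseract-tmp');
```

The resulting path can then be handed to the wrapper as its temp directory.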