bangpound / tika-rest-client
Web services client for Apache Tika
Requires
- php: >=5.3.0
- guzzle/guzzle: ~3.7.2
Requires (Dev)
- apache/tika: ~1.4
- monolog/monolog: 1.*
- phpunit/phpunit: 3.7.*
- psr/log: 1.0.*
Last update: 2024-11-12 04:08:53 UTC
README
This PHP client talks to the Tika REST server to extract content and metadata from a [wide variety of document file types][types]. There are [alternative PHP libraries][alternatives] that use the Tika command line client (see "Using Tika as a command line utility" in the Tika documentation), but starting a new JVM for every operation is slow and costly.
This client is built on Guzzle 3 (`guzzle/guzzle` ~3.7.2).

[types]: http://tika.apache.org/1.4/formats.html
[alternatives]: https://packagist.org/search/?q=tika
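The client wraps an HTTP exchange with a long-running Tika server. The sketch below builds that request using only PHP's built-in HTTP stream wrapper. It is not this library's code. It assumes the server accepts the raw file as the body of a `PUT` to `/tika` and returns plain text when sent `Accept: text/plain`. The function name `tika_put_options` is hypothetical.

```php
<?php
// Hypothetical helper (not part of bangpound/tika-rest-client): builds
// stream-context options for PUTting raw document bytes to a Tika server.
// Assumes the server's /tika resource accepts the file as the request body.
function tika_put_options($body, $accept)
{
    return array(
        'http' => array(
            'method'        => 'PUT',
            'header'        => "Accept: {$accept}\r\n"
                             . "Content-Type: application/octet-stream\r\n",
            'content'       => $body,
            // Return the response body even when the server answers 4xx/5xx.
            'ignore_errors' => true,
        ),
    );
}

// Placeholder bytes; in practice you might use file_get_contents('TestPDF.pdf').
$options = tika_put_options('raw document bytes', 'text/plain');

// With a server listening on localhost:9998, one extraction is one request:
// $text = file_get_contents('http://localhost:9998/tika', false, stream_context_create($options));
```

The JVM starts once, when the server is launched, so each document costs a single HTTP round trip instead of a JVM start-up.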
Project Setup
This project is installed with Composer.
From the shell, run:
```shell
composer require bangpound/tika-rest-client
```
Or edit your `composer.json` file to include this requirement:

```json
{
    "require": {
        "bangpound/tika-rest-client": "^1.0"
    }
}
```
Usage
```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

$client = new Bangpound\Tika\Client('http://localhost:9998');

$response = $client->tika(array(
    'file' => 'TestPDF.pdf',
));

// Metadata varies by file and file type, so refer to the Apache Tika docs for details.
$all_metadata = $response->metadata;

// If you know the metadata element you want to retrieve, pass its name
// to the response's metadata() method.
$author = $response->metadata('author');

// Extracted content can be retrieved as a SimpleXMLElement or as a string of XML.
$content_xml = $response->getBody();
$page_2 = $content_xml->children()->div[1]; // div[1] is the second div (zero-based)
$content_text = $response->getBody(true);
```
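The `div[1]` lookup above depends on the structure of Tika's XHTML output. The self-contained sketch below does not use this client. It collects the text of every page from an XHTML string with PHP's DOM extension. It assumes pages appear as `<div class="page">` elements in the XHTML namespace, which is how Tika's PDF parser commonly marks them. The sample document and the name `tika_pages_from_xhtml` are hypothetical.

```php
<?php
// Hypothetical helper: returns the text of each page in Tika-style XHTML.
// Assumes pages are <div class="page"> children of the XHTML <body>.
function tika_pages_from_xhtml($xhtml)
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml);

    $xpath = new DOMXPath($doc);
    // The elements are in the XHTML namespace, so XPath needs a bound prefix.
    $xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');

    $pages = array();
    foreach ($xpath->query('//h:body/h:div[@class="page"]') as $div) {
        $pages[] = trim($div->textContent);
    }
    return $pages;
}

// Hypothetical sample shaped like Tika's output for a two-page PDF.
$sample = '<html xmlns="http://www.w3.org/1999/xhtml">'
    . '<head><title>TestPDF</title></head><body>'
    . '<div class="page"><p>First page</p></div>'
    . '<div class="page"><p>Second page</p></div>'
    . '</body></html>';

$pages = tika_pages_from_xhtml($sample);
echo $pages[1], "\n"; // prints "Second page"
```

With this client, the XML string returned by `$response->getBody(true)` would be the natural input.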
Testing
The Tika REST Client has an incomplete test suite. Install the dev dependencies with `composer install`, then run the tests with PHPUnit:

```shell
composer install
vendor/bin/phpunit
```
License
This code is released under the MIT license.