README

PHP adapter for use with Stanford CoreNLP

Features

Connect to the Stanford CoreNLP online API
Connect to a Stanford CoreNLP 3.7.0 server
Annotators available: tokenize, ssplit, pos, parse, depparse, ner, regexner, lemma, mention, natlog, coref, openie, kbp
Creates part-of-speech trees in which each node has a depth, a parent ID and child IDs
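The following sketch shows what a flat tree with depth, parent ID and child IDs can look like. It flattens a Penn Treebank bracketed parse string into an array keyed by node ID. This is the string format produced by CoreNLP's "parse" annotator. The function name flattenParseTree and the node keys (id, parent, depth, tag, word, children) are hypothetical. They do not necessarily match how this package stores its trees internally.

```php
<?php
// Hypothetical sketch: flatten a bracketed parse such as
// "(ROOT (S (NP (NNP Joseph))))" into a flat array keyed by node ID.
// The node shape below is an assumption for illustration only.
function flattenParseTree($parse)
{
    // Split the string into "(", ")" and bare tokens (labels or words).
    preg_match_all('/\(|\)|[^\s()]+/', $parse, $matches);
    $tokens = $matches[0];

    $nodes  = [];
    $stack  = [];   // IDs of the nodes that are currently open
    $nextId = 1;
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if ($token === '(') {
            // The token right after "(" is the node label, e.g. "NP".
            $tag    = $tokens[++$i];
            $parent = empty($stack) ? 0 : end($stack);   // 0 = no parent
            $nodes[$nextId] = [
                'id'       => $nextId,
                'parent'   => $parent,
                'depth'    => count($stack),
                'tag'      => $tag,
                'word'     => null,
                'children' => [],
            ];
            if ($parent !== 0) {
                $nodes[$parent]['children'][] = $nextId;
            }
            $stack[] = $nextId;
            $nextId++;
        } elseif ($token === ')') {
            // Close the innermost open node.
            array_pop($stack);
        } else {
            // A bare token inside a part-of-speech node is the word itself.
            $nodes[end($stack)]['word'] = $token;
        }
    }
    return $nodes;
}
```

IDs are assigned in the order the opening brackets appear. Walking the nodes by increasing ID therefore visits the words from left to right.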

Requirements

PHP 5.5 or higher (PHP 7 is also supported)
64-bit Windows or Linux; 8 GB of memory or more recommended
Either the Guzzle HTTP client (installed by default) or plain cURL
Composer for PHP

    https://getcomposer.org/
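You can check the PHP side of these requirements with a small script that uses only PHP built-ins. This is a sketch. The 5.5.0 minimum is taken from the list above, and the function name checkRequirements is hypothetical. The Java, operating system and memory requirements are not checked here.

```php
<?php
// Minimal sketch: check the PHP-side requirements from this README.
// Uses only built-in functions; the function name is hypothetical.
function checkRequirements()
{
    return [
        'php_version_ok' => version_compare(PHP_VERSION, '5.5.0', '>='),
        'curl_loaded'    => extension_loaded('curl'),
    ];
}

foreach (checkRequirements() as $check => $ok) {
    echo str_pad($check, 16) . ($ok ? 'OK' : 'MISSING') . PHP_EOL;
}
```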

Update 24th February 2018

PHP 7 type hinting was removed because it caused issues for some users.

Update 28th January 2019

Fixed an issue with PHP 7.1 and higher.

Installation using ZIP files

Install the Stanford CoreNLP server (see the installation walkthrough below).
Download and unpack the files from this package.
Copy the files to your webserver directory, usually "htdocs" or "var/www".
Run "composer update".

Installation using Composer

Add the following line to the "require" section of your "composer.json" file:

    {
        "require": {
            "dennis-de-swart/php-stanford-corenlp-adapter": "*"
        }
    }

Run "composer update".

Using the Stanford CoreNLP online API service

By default, the adapter uses Stanford's online API service, which should work right after the Composer update. Because the online API is a public service, install the Java server version if you want to analyze large volumes of text or sensitive data.

OpenIE

OpenIE creates "subject-relation-object" tuples. These are similar to (but not the same as) the "subject-verb-object" concept of the English language.

Notes:

OpenIE is only available with the offline Java server version, not in "online" mode. See the installation walkthrough below.
OpenIE data is not always available: for some texts the result array is empty. This is not an error.

http://nlp.stanford.edu/software/openie.html
https://en.wikipedia.org/wiki/Subject-verb-object
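If you work with the raw JSON from a local CoreNLP server, you can collect the tuples with a sketch like the one below. It assumes the server was called with the "openie" annotator and outputFormat=json. In that output, each sentence carries an "openie" list whose entries have "subject", "relation" and "object" fields. The function name extractTriples is hypothetical and is not part of this adapter.

```php
<?php
// Minimal sketch, assuming raw CoreNLP server JSON produced with the
// "openie" annotator. Returns a list of [subject, relation, object] arrays.
// The function name is hypothetical, not part of this adapter.
function extractTriples($json)
{
    $data    = json_decode($json, true);
    $triples = [];
    if (!is_array($data) || !isset($data['sentences'])) {
        return $triples;   // invalid JSON or no sentences: nothing to extract
    }
    foreach ($data['sentences'] as $sentence) {
        // A missing or empty "openie" list is normal, not an error.
        if (empty($sentence['openie'])) {
            continue;
        }
        foreach ($sentence['openie'] as $t) {
            $triples[] = [$t['subject'], $t['relation'], $t['object']];
        }
    }
    return $triples;
}
```

Sentences without OpenIE data are simply skipped, so an empty result array is a normal outcome. This matches the note above.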

Installation / Walkthrough for Java server version

Step 1: Install Java

https://java.com/en/download/help/index_installing.xml?os=All+Platforms&j=8&n=20

Step 2: Install the Stanford CoreNLP 3.7.0 server

http://stanfordnlp.github.io/CoreNLP/index.html#download

Step 3: Choose the server port

The default port for the Java server is 9000. If port 9000 is not available, start the server on another port and change the port in the "bootstrap.php" file to match. Example:

define('CURLURL' , 'http://localhost:9000/');

Step 4: Start the CoreNLP server from the command line

Go to the directory where you unpacked CoreNLP, then enter the following command:

java -mx8g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000

Important note: the Stanford manual says "-mx4g"; however, I found that this can lead to a Java OutOfMemoryError. It is also important to use a 64-bit operating system with enough memory (8 GB or more recommended).

Step 5: Test that the server has started by browsing to its URL

http://localhost:9000/

When you open this URL in a browser, you should see the CoreNLP web interface. If you have problems with the installation, check the manual:

http://stanfordnlp.github.io/CoreNLP/corenlp-server.html
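You can also run the same check from PHP. This is useful on a headless webserver. The minimal sketch below only tests whether something accepts connections on the port; it does not confirm that the listener is CoreNLP. The function name isCoreNlpServerUp and its default host, port and timeout are assumptions. Use the values from your "bootstrap.php".

```php
<?php
// Minimal sketch: is anything listening on the CoreNLP host/port?
// Host, port and timeout defaults are assumptions; the function name is hypothetical.
function isCoreNlpServerUp($host = 'localhost', $port = 9000, $timeout = 2)
{
    $errno  = 0;
    $errstr = '';
    // Suppress the warning PHP emits when the connection is refused.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo isCoreNlpServerUp() ? 'Port 9000 is accepting connections' : 'Nothing is listening on port 9000';
echo PHP_EOL;
```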

Step 6: Set ONLINE_API to FALSE

In "bootstrap.php", set define('ONLINE_API' , FALSE). This tells the adapter to use the local Java server instead of the online API.

Usage examples

Instantiate the adapter:

$coreNLP = new CorenlpAdapter();

To process a text, call the "getOutput" method:

 $text = 'The Golden Gate Bridge was designed by Joseph Strauss.'; 
 $coreNLP->getOutput($text);

Note that the first time you process a text, the server needs about 20 to 30 extra seconds to load its models. Subsequent calls are much faster; small texts are usually processed within seconds.
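To see what a request to the local Java server looks like without the adapter (for example, when debugging a server problem), here is a minimal sketch using PHP's cURL extension. It follows the CoreNLP server's interface: the text is sent as the POST body and the annotators go in a JSON "properties" URL parameter. This is not the adapter's own code. The function names, the text/plain content type and the 60-second timeout are assumptions. The timeout leaves room for the slow first call described above.

```php
<?php
// Minimal sketch of a raw call to a local CoreNLP server (not the adapter's code).
// Function names, content type and timeout are assumptions.
function buildCoreNlpUrl($baseUrl, array $annotators)
{
    $properties = json_encode([
        'annotators'   => implode(',', $annotators),
        'outputFormat' => 'json',
    ]);
    return rtrim($baseUrl, '/') . '/?properties=' . rawurlencode($properties);
}

function annotateText($baseUrl, $text, array $annotators)
{
    $ch = curl_init(buildCoreNlpUrl($baseUrl, $annotators));
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $text);   // raw text as the request body
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: text/plain; charset=utf-8']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);         // the first call loads models and can be slow
    $response = curl_exec($ch);
    curl_close($ch);
    // null on a transport error; otherwise the decoded JSON array
    return $response === false ? null : json_decode($response, true);
}
```

An example call is annotateText('http://localhost:9000/', $text, ['tokenize', 'ssplit', 'pos']). It returns the decoded server output as an array, or null if the request fails.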

The results

If processing succeeds, the following properties are available:

 $coreNLP->serverMemory;      //contains all of the server output
 $coreNLP->trees;             //contains processed flat trees. Each part of the tree is assigned an ID key
 
 $coreNLP->getWordValues($coreNLP->trees[1]);  // get just the words from a tree
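To give an idea of what "get just the words" means on a flat tree, here is a hypothetical sketch. It keeps the nodes that carry a word, in ID order. The node keys ('tag', 'word') and the assumption that IDs follow sentence order are illustrative only. They are not the documented format of $coreNLP->trees, and wordsFromFlatTree is not a function of this package.

```php
<?php
// Hypothetical sketch: collect the words from a flat tree keyed by node ID.
// Assumes each node is an array with an optional 'word' key and that IDs
// increase from left to right through the sentence.
function wordsFromFlatTree(array $tree)
{
    ksort($tree);   // sort a local copy by node ID
    $words = [];
    foreach ($tree as $node) {
        if (isset($node['word'])) {   // phrase nodes have no word
            $words[] = $node['word'];
        }
    }
    return $words;
}
```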

Diagram A: Tree With Tokens

Diagram B: The ServerMemory contains all the server data

Any questions?

Please let me know.

Credits

Some functions are forked from this "Stanford parser" package:

 https://github.com/agentile/PHP-Stanford-NLP

dennis-de-swart / php-stanford-corenlp-adapter