ppp/wikibase-entity-store

A small library to store Wikibase entities

0.3.1 2016-07-08 11:29 UTC

README

Build Status Code Coverage Scrutinizer Code Quality Dependency Status

On Packagist: Latest Stable Version Download count

WikibaseEntityStore is a small library that provides an unified interface to interact with Wikibase entities.

It currently has two backends:

  • An API based backend, slow but very easy to deploy
  • A MongoDB based backend, faster but requires to maintain a local copy of Wikibase content

Installation

Use one of the below methods:

1 - Use composer to install the library and all its dependencies using the master branch:

composer require "ppp/wikibase-entity-store":dev-master"

2 - Create a composer.json file that just defines a dependency on version 1.0 of this package, and run 'composer install' in the directory:

{
    "require": {
        "ppp/wikibase-entity-store": "~1.0"
    }
}

Usage

Features

The entity storage system is based on the abstract class EntityStore that provides different services to manipulate entities.

These services are:

$storeBuilder = new EntityStoreFromConfigurationBuilder();
$store = $storeBuilder->buildEntityStore( 'MY_CONFIG_FILE.json' ); //See backend section for examples of configuration file

//Retrieves the item Q1
try {
    $item = $store->getItemLookup()->getItemForId( new ItemId( 'Q1' ) );
} catch( ItemNotFoundException $e ) {
    //Item not found
}

//Retrieves the property P1
try {
    $item = $store->getPropertyLookup()->getPropertyForId( new PropertyId( 'P1' ) );
} catch( PropertyNotFoundException $e ) {
    //Property not found
}

//Retrieves the item Q1 as EntityDocument
try {
    $item = $store->getEntityLookup()->getEntityDocumentForId( new ItemId( 'Q1' ) );
} catch( EntityNotFoundException $e ) {
    //Property not found
}

//Retrieves the item Q1 and the property P1 as EntityDocuments
$entities = $store->getEntityLookup()->getEntityDocumentsForIds( array( new ItemId( 'Q1' ), new PropertyId( 'P1' ) ) );

//Retrieves the ids of the items that have as label or alias the term "Nyan Cat" in English (with a case insensitive compare)
$itemIds = $store->getItemIdForTermLookup()->getItemIdsForTerm( new Term( 'en', 'Nyan Cat' ) );

//Retrieves the ids of the properties that have as label or alias the term "foo" in French (with a case insensitive compare)
$propertyIds = $store->getPropertyIdForTermLookup()->getPropertyIdsForTerm( new Term( 'fr', 'Foo' ) );

//Do a query on items using the Ask query language: retrieves the first 10 items with P1: Q1
$itemIds = $store->getItemIdForQueryLookup()->getItemIdsForQuery(
    new Query(
        new SomeProperty(
	        new EntityIdValue( new PropertyId( 'P1' ) ),
			new ValueDescription( new EntityIdValue( new ItemId( 'Q1' ) ) )
		),
		array(),
		new QueryOptions( 10, 0 )
	)
);

Backends

API Backend

The API backend is the most easy to use one. It uses the API of a Wikibase instance and WikidataQuery if you use this EntityStore as a backend for Wikidata data and you want query support.

The configuration file looks like:

{
    "backend": "api",
    "api": {
        "url": "http://www.wikidata.org/w/api.php",
        "wikidataquery-url": "http://wdq.wmflabs.org/api"
    }
}

Replace http://www.wikidata.org/w/api.php with the URL of your WediaWiki API if you want to use your store with an other Wikibase instance than Wikidata.

The parameter wikidataquery-url is optional and may be unset if you don't want query support using Wikidata content.

Without configuration file:

$store = new \Wikibase\EntityStore\Api\ApiEntityStore(
    new \Mediawiki\Api\MediawikiApi( 'http://www.wikidata.org/w/api.php' ),
    new \WikidataQueryApi\WikidataQueryApi( 'http://wdq.wmflabs.org/api' )
);

MongoDB Backend

The MongoDB backend uses a MongoDB database. Requires doctrine/mongodb.

The configuration file looks like:

{
    "backend": "mongodb",
    "mongodb": {
        "server": SERVER,
        "database": DATABASE
    }
}

server should be a MongoDB server connection string and database the name of the database to use.

Without configuration file:

//Connect to MongoDB
$connection = new \Doctrine\MongoDB\Connection( MY_CONNECTION_STRING );
if( !$connection->connect() ) {
    throw new RuntimeException( 'Fail to connect to the database' );
}

//Gets the database where entities are stored
$database = $connection
    ->selectDatabase( 'wikibase' );

$store = new \Wikibase\EntityStore\MongoDB\MongDBEntityStore( $database );

You can fill the MongoDB database from Wikidata JSON dumps using this script:

 php entitystore import-json-dump MY_JSON_DUMP MY_CONFIGURATION_FILE

Or from incremental XML dumps using this script:

php entitystore import-incremental-xml-dump MY_XML_DUMP MY_CONFIGURATION_FILE

InMemory backend

Backend based on an array of EntityDocuments. Useful for tests.

$store = new \Wikibase\EntityStore\InMemory\InMemoryEntityStore( array(
    new Item( new ItemId( 'Q42' ) )
) );

Options

The different backends support a shared set of options. These options are:

  • string[] languages: Allows to filter the set of internationalized values. Default value: null (a.k.a. all languages).
  • bool languagefallback: Apply language fallback system to languages defined using languages option. Default value: false.

They can be injected in the configuration:

{
  "options": {
    "languages": ["en", "fr"],
    "languagefallback": true
  }
}

They can be also passed as last parameter of EntityStore constructors:

$options = new \Wikibase\EntityStore\EntityStoreOptions( array(
	EntityStore::OPTION_LANGUAGES => array( 'en', 'fr' ),
	EntityStore::OPTION_LANGUAGE_FALLBACK => true
) );

$store = new \Wikibase\EntityStore\Api\ApiEntityStore(
    new \Mediawiki\Api\MediawikiApi( 'http://www.wikidata.org/w/api.php' ),
    null,
    $options
);

Cache support

It is possible, in order to get far better performances, to add a cache layer on top of EntityStore:

Adds to the configuration file a cache section.

Example with a two layers cache. The first one is a PHP array and the second one a Memcached instance on localhost:11211.

{
    "cache": {
        "array": true,
        "memcached": {
            "host": "localhost",
            "port": 11211
        }
    }
}

Without configuration file:

$store = MY_ENTITY_STORE;

$cache = new \Doctrine\Common\Cache\ArrayCache(); //A very simple cache
$cacheLifeTime = 100000; //Life time of cache in seconds

$cachedStore = new \Wikibase\EntityStore\Cache\CachedEntityStore( $store, $cache, $cacheLifeTime );

Release notes

0.3.1 (2016-01-11)

  • Adds support of WikibaseDataModel 5.0 and 6.0 and Number 0.7

0.3.0 (2016-01-11)

  • Update to WikibaseDataModel 4.0, WikibaseDataModelServices 3.2 and MediaWikiApiBase 2.0
  • Change encoding of label indexes in MongoDB

0.1.0 (2015-04-21)

Initial release with these features:

  • Retrieve entities from ids or terms
  • Import Wikidata JSON full dumps and incremental XML dumps
  • Beginning of support of simple queries
  • API and MongoDB backends
  • Basic cache system