acdh-oeaw/repo-php-util

Set of classes for working with our repository stack

0.9.1 2017-04-21 18:52 UTC

README

Set of classes for working with the ACDH repository stack.

Installation

  • Obtain Composer (https://getcomposer.org/)
  • Prepare a composer.json file containing:
    {
      "require": {
          "acdh-oeaw/repo-php-util": "*"
      }
    }
    
  • Run composer install
  • Copy and adjust the following files:
    • vendor/acdh-oeaw/repo-php-util/config.ini.sample, saved as config.ini (service URLs, credentials, metadata schema fundamentals)
    • vendor/acdh-oeaw/repo-php-util/property_mappings.json (Redmine issue property mappings)

Initialization

  • Load the Composer autoloader
  • Read the configuration from config.ini
  • Create an acdhOeaw\fedora\Fedora object
  • If you want to use the Indexer and/or the Redmine classes, call their static init() methods
require_once 'vendor/autoload.php';

$conf = new zozlak\util\Config('config.ini');

$fedora = new acdhOeaw\fedora\Fedora($conf);
acdhOeaw\redmine\Redmine::init($conf, $fedora); // only needed for the Redmine classes
acdhOeaw\storage\Indexer::init($conf);          // only needed for the Indexer class

Documentation

You can find detailed API documentation in the docs folder.

To read it online, go to https://rawgit.com/acdh-oeaw/repo-php-util/master/docs/index.html

Usage

(It is assumed that you have already run the initialization code; in particular, that the $fedora and $conf objects have been created.)

Working with Fedora resources

A Fedora resource is represented by the acdhOeaw\fedora\FedoraResource class.

This class provides basic methods for manipulating both the resource's metadata and its binary content (see examples below).

In general, you should not create FedoraResource objects directly; always use the appropriate Fedora class method instead (see examples below).
If you want to know more, please read the Fedora class documentation, especially the parts on transaction handling.

The metadata are represented by an EasyRdf\Resource object.

Updating metadata in RDF can be tricky, so please read the examples on this topic provided below.

All resource modifications must be done within a Fedora transaction, so the $fedora->begin() and $fedora->commit() calls in the code examples are required.

Creating a new Fedora resource

Prepare the resource metadata and (optionally) its binary content, then call the createResource() method of the Fedora class.

$graph = new EasyRdf\Graph();
$metadata = $graph->resource('.'); // the URI given here is irrelevant and can be any non-empty string (an EasyRdf library limitation)
$metadata->addLiteral('http://my.data/#property', 'myDataPropertyValue');
$metadata->addResource('http://my.object/#property', 'http://my.Object/Property/Value');

$fedora->begin();
$resource1 = $fedora->createResource($metadata, 'pathToFile'); // binary data read from a file
$resource2 = $fedora->createResource($metadata, 'myResourceData (...)'); // binary data passed as a string
$resource3 = $fedora->createResource($metadata); // no binary data
$fedora->commit();

Finding already existing Fedora resources

If you know the resource's ACDH ID, you can use the getResourceById() method.

$resource = $fedora->getResourceById('https://id.acdh.oeaw.ac.at/ba83b0d6-86cd-4340-bfd7-ab5a2edb345a');
echo $resource->__getSparqlTriples();

If you know a value of one of the resource's metadata properties, you can search for all resources having that value with the getResourcesByProperty() method, which returns an array of resources.

$resources = $fedora->getResourcesByProperty('http://www.w3.org/2000/01/rdf-schema#seeAlso', 'https://redmine.acdh.oeaw.ac.at/issues/5488');
echo count($resources);
echo $resources[0]->__getSparqlTriples();

If you know the resource's Fedora URI, you can use it as well with the getResourceByUri() method.

$resource = $fedora->getResourceByUri('http://fedora.apollo.arz.oeaw.ac.at/rest/92/35/a8/40/9235a840-5f0e-4f24-971d-c0c557f43d9e');
echo $resource->__getSparqlTriples();

Updating resource metadata

Updating RDF metadata is a little tricky. The main problem is that an update of a metadata property value is not well defined and therefore cannot be done automatically for you.

Let's assume we have an existing metadata triple <ourResource> <ourProperty> "currentValue" and a new triple <ourResource> <ourProperty> "newValue".
There is no way to automatically decide whether the new triple should replace the old one or be added next to it.
This is because an RDF triple is uniquely identified by all its components (subject, property and object), and a change in any of them (including the object) alters this identifier, so the new triple cannot be matched with the previous one.

This means the only way to avoid multiplying triples is to always delete all previous metadata and add all current values.
The library does this automatically, but it means you must always provide the full metadata set when calling the setMetadata() method if you do not want to lose any metadata triples.

Remember:

  • Always use the current resource metadata as the basis.
    • The only exception is when you are sure the new triples do not exist in the current metadata and do not interfere in any way with it.
      In such a case, remember to use updateMetadata('ADD').
  • Delete all existing values of a property before adding the current ones (remember, there is no update, just delete and add).
    • If a property can have multiple values, make sure you delete it only once (do not repeat the deletion for every new value you encounter).
  • Think twice when dealing with rdfs:identifier and rdf:isPartOf properties (these two are very important).
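To make the "delete once, then add" rule concrete, here is a minimal, library-independent sketch. It models a single resource's metadata as a plain PHP array of [property, value] pairs; this is an assumption for illustration only (the library itself works on EasyRdf\Resource objects), and the helper names setPropertyValues() and setPropertyValuesWrong() are hypothetical.

```php
<?php
// Hypothetical model: metadata of one resource as a list of [property, value]
// pairs (the subject is implicit). Not the library's data structure.

// Removes every value of $property from $metadata.
function deleteProperty(array $metadata, string $property): array
{
    return array_values(array_filter($metadata, function (array $t) use ($property) {
        return $t[0] !== $property;
    }));
}

// Correct: delete all values of the property ONCE, then add all new values.
function setPropertyValues(array $metadata, string $property, array $values): array
{
    $metadata = deleteProperty($metadata, $property);
    foreach ($values as $v) {
        $metadata[] = [$property, $v];
    }
    return $metadata;
}

// Wrong (mirrors "Bad example 3"): deleting inside the loop also removes
// the values added in the previous iterations.
function setPropertyValuesWrong(array $metadata, string $property, array $values): array
{
    foreach ($values as $v) {
        $metadata = deleteProperty($metadata, $property);
        $metadata[] = [$property, $v];
    }
    return $metadata;
}

$current = [
    ['http://my.other/#property', 'keepMe'],
    ['http://my.new/#property', 'oldValue'],
];
$good = setPropertyValues($current, 'http://my.new/#property', ['value1', 'value2']);
$bad  = setPropertyValuesWrong($current, 'http://my.new/#property', ['value1', 'value2']);
// $good keeps 'keepMe' and holds both new values; $bad keeps only 'value2'
```

Note that both variants start from the current metadata, so the unrelated 'keepMe' triple survives either way; only the placement of the deletion differs.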

Good example.

$myProperty = 'http://my.new/#property';

$fedora->begin();

$resource = $fedora->getResourcesByProperty($conf->get('redmineIdProp'), 'https://redmine.acdh.oeaw.ac.at/issues/5488')[0];
$metadata = $resource->getMetadata();
$metadata->delete($myProperty); // delete all old values once, before the loop
foreach(array('value1', 'value2') as $i){
    $metadata->addLiteral($myProperty, $i);
}
$resource->setMetadata($metadata);
$resource->updateMetadata();

$fedora->commit();

Bad example 1. You will end up with a resource having only your new triples. All other metadata will be lost.

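$myProperty = 'http://my.new/#property';

$fedora->begin();

$resource = $fedora->getResourcesByProperty($conf->get('redmineIdProp'), 'https://redmine.acdh.oeaw.ac.at/issues/5488')[0];
$graph = new EasyRdf\Graph();
$metadata = $graph->resource('.'); // a fresh, empty metadata set: this is the mistake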
foreach(array('value1', 'value2') as $i){
    $metadata->addLiteral($myProperty, $i);
}
$resource->setMetadata($metadata);
$resource->updateMetadata();

$fedora->commit();

Bad example 2. You will end up with both old and new values of your property.

$myExistingProperty = 'http://my.existing/#property';

$fedora->begin();

$resource = $fedora->getResourcesByProperty($conf->get('redmineIdProp'), 'https://redmine.acdh.oeaw.ac.at/issues/5488')[0];
$metadata = $resource->getMetadata();
foreach(array('value1', 'value2') as $i){
    $metadata->addLiteral($myExistingProperty, $i);
}
$resource->setMetadata($metadata);
$resource->updateMetadata();

$fedora->commit();

Bad example 3. You will end up with only the last added value of your property.

$myMultivalueProperty = 'http://my.existing/#property';

$fedora->begin();

$resource = $fedora->getResourcesByProperty($conf->get('redmineIdProp'), 'https://redmine.acdh.oeaw.ac.at/issues/5488')[0];
$metadata = $resource->getMetadata();
foreach(array('value1', 'value2') as $i){
    $metadata->delete($myMultivalueProperty);
    $metadata->addLiteral($myMultivalueProperty, $i);
}
$resource->setMetadata($metadata);
$resource->updateMetadata();

$fedora->commit();

Updating resource binary data

Updating resource binary data is easy. Just obtain an acdhOeaw\fedora\FedoraResource object (see above) and call its updateContent() method.

$fedora->begin();

$resource = $fedora->getResourceById('https://id.acdh.oeaw.ac.at/myResource');
$resource->updateContent('pathToFile'); // with data in file
$resource->updateContent('new content of the resource'); // with data passed directly

$fedora->commit();

Synchronizing Redmine with Fedora

There is a set of classes for synchronizing various Redmine objects (projects, users and issues) with Fedora: acdhOeaw\redmine\Project, acdhOeaw\redmine\User and acdhOeaw\redmine\Issue.

Using them is very simple: the static fetchAll() method creates PHP objects representing Redmine objects of a given kind, which can then be saved/updated in Fedora by calling their updateRms() method.

Additionally, the Issue class fetchAll() method allows you to specify any filters accepted by the Redmine REST API.

For example, all Redmine issues with tracker_id equal to 5 can be synchronized as follows:

$fedora->begin();

$issues = acdhOeaw\redmine\Redmine::fetchAllIssues(true, ['tracker_id' => 5]);
foreach ($issues as $i) {
    $i->updateRms();
}

$fedora->commit();

Indexing files in the filesystem

The library provides the acdhOeaw\storage\Indexer class, which automates ingesting/updating binary content in Fedora.

The Indexer class operates on an acdhOeaw\fedora\FedoraResource object (the parent of the indexed files), which means you must obtain a FedoraResource object first.

The Indexer class is highly configurable - see the class documentation for all the details.

Below, we index all XML files in a given directory and its direct subdirectories, placing them as direct children of the FedoraResource (meaning no Fedora collection resources will be created for subdirectories found in the file system). All files smaller than 100 MB will be ingested into the repository; for bigger files, metadata-only Fedora resources will be created.

$fedora->begin();

$resource = $fedora->getResourcesByProperty($conf->get('redmineIdProp'), 'https://redmine.acdh.oeaw.ac.at/issues/5488')[0];
$ind = new acdhOeaw\storage\Indexer($resource);
$ind->setFilter('|[.]xml$|i');
$ind->setPaths(array('directoryToIndex')); // see the next section
$ind->setUploadSizeLimit(100000000); // 100 MB, in bytes
$ind->setDepth(1);
$ind->setFlatStructure(true);
$ind->index(true);

$fedora->commit();
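The per-file decision made in this example can be approximated by the standalone sketch below. It is not the Indexer's code: the function indexingAction() and its return values are hypothetical, and two details are assumptions rather than documented behaviour, namely that the filter regex is applied to the file name and that "smaller than 100 MB" is taken literally (a file of exactly 100000000 bytes gets a metadata-only resource).

```php
<?php
// Hypothetical approximation of the Indexer's per-file decision; NOT its actual code.
const UPLOAD_SIZE_LIMIT = 100000000;  // same value as passed to setUploadSizeLimit()
const FILE_FILTER = '|[.]xml$|i';     // same regex as passed to setFilter()

// Returns 'skip', 'upload' (binary content ingested) or 'metadata' (metadata-only resource).
function indexingAction(string $fileName, int $sizeInBytes): string
{
    if (preg_match(FILE_FILTER, $fileName) !== 1) {
        return 'skip';      // name does not end with .xml (case-insensitive)
    }
    if ($sizeInBytes < UPLOAD_SIZE_LIMIT) {
        return 'upload';    // assumption: strictly smaller than the limit
    }
    return 'metadata';      // too big: only a metadata resource is created
}
```

For instance, indexingAction('LETTER.XML', 2048) returns 'upload' thanks to the i flag, while indexingAction('scan.tif', 2048) returns 'skip'.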

How files are matched with repository resources

A file is matched with a repository resource if both of the following conditions are met:

  • the file and the resource have the same parent resource
  • the relative file path equals the value of the resource's fedoraLocProperty metadata property
    • to make your life easier, before the comparison the Indexer class switches all \ to / and makes sure the character encoding is UTF-8
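The two matching conditions can be modelled with plain PHP arrays. This is a simplified, hypothetical sketch, not the Indexer's implementation: resources are arrays with made-up keys 'uri', 'parent' and 'locPath' (the latter standing for the fedoraLocProperty value), findMatchingResource() is an illustrative name, and the UTF-8 normalization step is left out.

```php
<?php
// Simplified model of the matching rule; NOT the Indexer's implementation.
// Each resource is an array with hypothetical keys:
//   'uri'     - the resource's Fedora URI
//   'parent'  - URI of its parent resource
//   'locPath' - its fedoraLocProperty metadata value
function findMatchingResource(array $resources, string $parentUri, string $relativeFilePath)
{
    // compare paths with all \ switched to /
    $needle = str_replace('\\', '/', $relativeFilePath);
    foreach ($resources as $res) {
        $sameParent = $res['parent'] === $parentUri;
        $samePath = str_replace('\\', '/', $res['locPath']) === $needle;
        if ($sameParent && $samePath) {
            return $res;
        }
    }
    return null; // no resource satisfies both conditions
}

$resources = [
    ['uri' => 'http://fedora.example/rest/a', 'parent' => 'http://fedora.example/rest/p1', 'locPath' => 'myProject/myFile'],
    ['uri' => 'http://fedora.example/rest/b', 'parent' => 'http://fedora.example/rest/p2', 'locPath' => 'myProject/myFile'],
];

// Same path as both resources, but only 'b' has the requested parent.
$match = findMatchingResource($resources, 'http://fedora.example/rest/p2', 'myProject\\myFile');
```

The example shows why both conditions matter: two resources share the same path, and only the parent decides which one matches.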

The relative file path is the full path to the file with the containerDir configuration property value stripped from its beginning.

For example, if your containerDir is /mnt/acdh_resources/container/ and the full file path is /mnt/acdh_resources/container/myProject/myFile, the relative file path is myProject/myFile.
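The derivation of the relative path can be sketched as follows. relativeFilePath() is a hypothetical helper, not part of the library; it switches \ to / before comparing (as the Indexer does), and, as a design choice echoing the warning in this section, it refuses an empty containerDir instead of silently returning the full path.

```php
<?php
// Hypothetical helper (not part of the library): derives the relative file path
// by stripping containerDir from the beginning of the full file path.
function relativeFilePath(string $containerDir, string $fullPath): string
{
    // compare using forward slashes only
    $containerDir = str_replace('\\', '/', $containerDir);
    $fullPath = str_replace('\\', '/', $fullPath);
    if ($containerDir === '' || strpos($fullPath, $containerDir) !== 0) {
        throw new InvalidArgumentException("'$fullPath' is not inside containerDir '$containerDir'");
    }
    return substr($fullPath, strlen($containerDir));
}

$rel = relativeFilePath('/mnt/acdh_resources/container/', '/mnt/acdh_resources/container/myProject/myFile');
// $rel is 'myProject/myFile'
```

Note the trailing slash in containerDir: without it the result would start with /, which would not match a fedoraLocProperty value stored without one.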

It is extremely important to make sure your relative file paths are correct. If they are not, you risk duplicating data on the next import.

It is clearly wrong to leave the containerDir configuration property empty and pass full paths to Indexer::setPaths().

Importing a set of RDF data

If you have a bunch of data in the form of an RDF graph, you can ingest it easily with the MetadataCollection class.