acdh-oeaw/arche-metadata-crawler

Script and library for checking and generating ARCHE metadata in ACDH schema

0.8.1 2024-04-09 18:14 UTC

This package is auto-updated.

Last update: 2024-04-23 09:55:53 UTC


README

Functionality

A set of scripts:

  • Merging metadata of a collection from inputs in various formats
  • Validating the merged metadata
  • Generating XLSX metadata templates based on the current ontology (see the horizontal metadata files in metadata formats description)

used for the metadata curation during ARCHE ingestions.

Installation

Locally

  • Install PHP 8 and composer
  • Run:
    composer require acdh-oeaw/arche-metadata-crawler

As a docker image

  • Install docker.
  • Run the acdhch/repo-file-checker image mounting your data directory into it:
    docker run --rm -ti --entrypoint bash -u `id -u`:`id -g` \
               -v pathToYourDataDir:/data \
               acdhch/repo-file-checker
  • Run the scripts, e.g.
    /opt/vendor/bin/arche-create-metadata-template /data al
    and
    /opt/vendor/bin/arche-crawl-meta \
      /data/metadata \
      /data/merged.ttl \
      /ARCHE/staging/GlaserDiaries_16674/data \
      https://id.acdh.oeaw.ac.at/glaserdiaries
    
    • if you need the file-checker, it is available under /opt/vendor/bin/arche-filechecker

On ACDH Cluster

Nothing to be done. It is installed there already.

Usage

(For a full walk-trough using repo-ingestion@hephaistos and the Wollmilchsau test collection please look here)

On ACDH Cluster

First, get the arche-ingestion workload console by:

  • Opening this link (if you are redirected to the login page, open the link once again after you log in)
  • Clicking on the bluish button with three vertical dots in the top-right corner of the screen and and choosing > Execute Shell

Then:

  • Generate and validate the metadata:
    • Open a screen session (the shell disconnects after one minute of inactivity) with
      screen
      • If you need to reconnect to the screen session because it was disconnected, run
        screen -rd
    • Run the arche-crawl-meta script:
      /ARCHE/vendor/bin/arche-crawl-meta \
        <pathToMetadataDirectory> \
       --filecheckerReportDir <pathToTheFileCheckerReportDirectory> \
        <outputTtlPath> \
        <basePathOfTheCollection> \
        <idPrefix> \
        2>&1 | tee <pathToLogFile>
      e.g.
      /ARCHE/vendor/bin/arche-crawl-meta \
        /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
        --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
        /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
        /ARCHE/staging/GustavMahlerArchiv_22334/data \
        https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
        2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
      • If you are want to skip the checks (which speeds up the process significantly), add the --noCheck parameter, e.g.
        /ARCHE/vendor/bin/arche-crawl-meta \
          /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
          --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
          /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
          /ARCHE/staging/GustavMahlerArchiv_22334/data \
          https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
          --noCheck \
          2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
        
  • Create metadata templates:
    /ARCHE/vendor/bin/arche-create-metadata-template \
      <pathToDirectoryWhereTemplateShouldBeCreated> \
      all
    e.g. to create templates in the current directory
    /ARCHE/vendor/bin/arche-create-metadata-template . all

Locally

  • Generating and validaing the metadata:
    vendor/bin/arche-crawl-meta \
      --filecheckerOutput <pathTo_fileList.json_generatedBy_repo-filechecker> \
      <pathToCollectionData> \
      <pathToTargetMetadataFile>
    e.g.
    vendor/bin/arche-crawl-meta \
      metaDir \
      metadata.ttl
      `pwd`/data
      https://id.acdh.oeaw.ac.at/myCollection
  • Creating metadata templates:
    vendor/bin/arche-create-metadata-template \
      <pathToDirectoryWhereTemplateShouldBeCreated> \
      all
    e.g. to create templates in the current directory
    bin/arche-create-metadata-template . all

Remarks:

  • To get a list of all available parameters run:
    vendor/bin/arche-crawl-meta --help
    vendor/bin/arche-create-metadata-template --help

As a docker container

  • Creating metadata templates: Run a container mounting directory where templates should be created under /mnt inside the container:
    docker run \
      --rm -u `id -u`:`id -g`\
      -v <pathToDirectoryWhereTemplateShouldBeCreated:/mnt \
      acdhch/repo-file-checker createTemplate all
    e.g. to create the templates in the current directory
    docker run \
      --rm -u `id -u`:`id -g` -v `pwd`:/mnt acdhch/repo-file-checker createTemplate all