steelwagstaff/pressbooks-thoth

Pressbooks–Thoth Open Metadata integration

Package info

github.com/SteelWagstaff/pressbooks-thoth

Type:wordpress-plugin

pkg:composer/steelwagstaff/pressbooks-thoth

dev-dev 2026-05-29 21:42 UTC

Last update: 2026-05-29 21:45:48 UTC


README

A WordPress multisite plugin that integrates Pressbooks with Thoth Open Metadata, enabling publishers to register books with Thoth and keep metadata in sync.

Features

  • Automatic sync: Registers books with Thoth on Book Info save and keeps metadata up to date (one-way: Pressbooks → Thoth)
  • Metadata mapping: Maps Pressbooks pb_* fields to Thoth Work, Contributor, Language, Subject, and Publication models
  • Bulk export: Download metadata for all books under a publisher in ONIX 3.0/2.1, MARC21, BibTeX, KBART, or CSV format — files download directly to your browser
  • Per-book export: Download a single work's metadata directly from the Book Info screen
  • Network settings: Configure your Thoth personal access token and default imprint from the network admin
  • Environment-aware: Configure custom GraphQL API endpoints for testing with dev/staging environments while using production export services

Requirements

  • PHP 8.3+
  • WordPress multisite 6.5+
  • Pressbooks plugin (network-activated)
  • A Thoth account with at least one publisher and imprint

Installation

  1. Clone or download this repository into your WordPress plugins directory.
  2. Run composer install to install PHP dependencies.
  3. Network-activate the plugin in WordPress.
  4. Go to Network Admin → Thoth → Settings, enter your personal access token, and click Refresh Publishers & Imprints.
  5. Select the default imprint and save.

Development

# Install dependencies
composer install

# Set up the WordPress test library (first time)
bin/install-wp-tests.sh wordpress_test root '' localhost latest

# Run tests
composer test

# Fix code style (PSR-12 via pint)
composer fix

# Check code style without fixing
composer standards

Architecture

src/
  Bootstrap.php      Entry point — wires WordPress hooks
  Settings.php       Network admin UI and option accessors
  Mapper.php         Pure data transformation: pb_* → Thoth models
  SyncService.php    Create/update lifecycle against Thoth GraphQL API
  ExportService.php  Bulk/single-work metadata downloads via Thoth REST API
  helpers.php        Namespaced constants (option keys, default URLs)
templates/
  network-settings.php  Network admin settings page
  network-export.php    Network admin bulk export page
  book-metabox.php      Per-book Thoth metabox on Book Info screen
tests/
  MapperTest.php     Unit tests for Mapper (34 assertions)

Export Workflow

When generating exports:

  1. Backend (PHP), in Bootstrap::ajaxGenerateExports():

    • Queries Thoth GraphQL API for all works under the selected publisher
    • Calls Thoth REST API /publisher/{id} endpoint for each format
    • The API returns the raw file content (e.g., XML for ONIX, plain text for CSV, KBART, and BibTeX)
    • File content is base64-encoded and returned to frontend with metadata (format, filename, MIME type)
  2. Frontend (JavaScript), in network-export.php:

    • Receives base64-encoded file content
    • Decodes each file and creates a Blob with appropriate MIME type
    • Creates temporary download links and triggers browser downloads
    • Displays dismissible success notice with proper WordPress styling
  3. Thoth API:

    • Export generation always uses the production endpoint (export.thoth.pub), regardless of the configured GraphQL environment
    • This allows testing with dev/staging GraphQL data while using production export services
    • No local file storage needed — exports are generated on-demand and streamed directly to user
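
The frontend step (decode, wrap in a Blob, download) can be sketched in plain browser JavaScript. This is a minimal sketch, not the code in network-export.php: it assumes each returned file looks like { content, filename, mimeType }, which are hypothetical field names for the base64 content, filename, and MIME type the backend is described as returning.

```javascript
// Minimal sketch. Assumes each exported file arrives as
// { content: <base64 string>, filename: string, mimeType: string } (hypothetical field names).

// Decode base64 into raw bytes so binary formats are not mangled by text decoding.
function base64ToBytes(base64) {
  const binary = atob(base64); // one character per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Wrap the decoded bytes in a Blob tagged with the file's MIME type.
function fileToBlob(file) {
  return new Blob([base64ToBytes(file.content)], { type: file.mimeType });
}

// Browser-only: create a temporary object URL and click a hidden link to download it.
function triggerDownload(file) {
  const url = URL.createObjectURL(fileToBlob(file));
  const link = document.createElement('a');
  link.href = url;
  link.download = file.filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Revoke on the next tick so the browser has started the download first.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

Decoding through a Uint8Array rather than using the atob() string directly keeps byte values intact, which matters if an export format is binary.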

License

GPL-3.0-or-later


Clone and install locally:

```bash
git clone https://github.com/SteelWagstaff/pressbooks-thoth.git
cd pressbooks-thoth
npm install
```

Configuration

Create a .env file in the project root with your Thoth credentials:

# Copy from template
cp .env.example .env

# Edit .env and add your bearer token:
# THOTH_BEARER_TOKEN=your_actual_token_here

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| THOTH_BEARER_TOKEN | (none) | Bearer token for authentication (alternative to email/password) |
| THOTH_API_URL | https://api.test.thoth.pub/graphql | Thoth GraphQL endpoint |
| THOTH_EXPORT_API_URL | https://export.test.thoth.pub/ | Thoth Export API endpoint |
| THOTH_EXPORT_SPECIFICATION | (none) | Default export format (e.g., onix_3.0, csv, marc) |
| THOTH_EMAIL | (none) | Email for authentication (only needed if not using a bearer token) |
| THOTH_PASSWORD | (none) | Password for authentication (only needed if not using a bearer token) |
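
One plausible way to combine these variables with their defaults and with command-line flags is sketched below. It is an assumption, not documented behavior: resolveConfig is a hypothetical helper, and the precedence (flags over environment over defaults, bearer token over email/password) is inferred from the "alternative to" wording in the table.

```javascript
// Hypothetical resolver. Assumed precedence: CLI flags > environment variables > defaults.
const DEFAULTS = {
  apiUrl: 'https://api.test.thoth.pub/graphql',
  exportApiUrl: 'https://export.test.thoth.pub/',
};

function resolveConfig(env, flags = {}) {
  const token = flags.token ?? env.THOTH_BEARER_TOKEN;
  const email = flags.email ?? env.THOTH_EMAIL;
  const password = flags.password ?? env.THOTH_PASSWORD;

  // A bearer token wins; otherwise fall back to an email/password login.
  // null means no credentials (e.g. a discovery-only run).
  let auth = null;
  if (token) {
    auth = { type: 'bearer', token };
  } else if (email && password) {
    auth = { type: 'login', email, password };
  }

  return {
    auth,
    apiUrl: flags.apiUrl ?? env.THOTH_API_URL ?? DEFAULTS.apiUrl,
    exportApiUrl: flags.exportApiUrl ?? env.THOTH_EXPORT_API_URL ?? DEFAULTS.exportApiUrl,
    exportSpecification: flags.exportSpecification ?? env.THOTH_EXPORT_SPECIFICATION ?? null,
  };
}
```

In a CLI this would typically be called as resolveConfig(process.env, parsedFlags) after the .env file has been loaded.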

Usage

CLI: Create/Update a Work

node src/index.js \
  --url https://iastate.pressbooks.pub/teachingpronunciation \
  --imprint-id 00000000-0000-0000-0000-000000000000

Output:

Work created: 123e4567-e89b-12d3-a456-426614174000
Publication created: 456f7890-a1b2-3456-c7d8-901234567890
Language attached: eng
Contributor attached: Jane Doe (Author)

CLI: Create/Update + Export Metadata

node src/index.js \
  --url https://iastate.pressbooks.pub/teachingpronunciation \
  --imprint-id 00000000-0000-0000-0000-000000000000 \
  --export-specification onix_3.0

Output:

Work created: 123e4567-e89b-12d3-a456-426614174000
Language attached: eng
Contributor attached: Jane Doe (Author)

--- Exported Metadata ---

<?xml version="1.0" encoding="UTF-8"?>
<ONIXmessage ...>
  ...ONIX XML content...
</ONIXmessage>

CLI: Override API Endpoints

node src/index.js \
  --url https://example.pressbooks.pub/book \
  --imprint-id <uuid> \
  --api-url https://api.thoth.pub/graphql \
  --export-specification csv \
  --export-api-url https://export.thoth.pub/

CLI: Batch Process Multiple Books

node src/index.js \
  --batch-file urls.txt \
  --imprint-id <uuid> \
  --token <bearer_token>

urls.txt format (one URL per line, # for comments):

# Academic titles
https://iastate.pressbooks.pub/teachingpronunciation
https://iastate.pressbooks.pub/globalhealth

# Community titles
https://pressbooks.bccampus.ca/readersresponse

Output:

Processing 3 books...
✓ Book 1 (Jane Doe): Work created (5 contributors)
✓ Book 2 (John Smith): Work updated (3 contributors)
✗ Book 3: Failed after 3 retries - Connection timeout

Summary: 2 succeeded, 1 failed
Total time: 45.2s
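
The urls.txt rules (one URL per line, # for comments) take only a few lines to parse. A minimal sketch; parseBatchFile is a hypothetical helper, and ignoring blank lines and trimming whitespace are assumptions rather than documented behavior. Applied to the sample file above, it yields three URLs, matching "Processing 3 books...".

```javascript
// Hypothetical helper: turn batch-file text into a list of URLs.
// Blank lines and lines starting with "#" are skipped; whitespace is trimmed.
function parseBatchFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}
```

Usage: parseBatchFile(require('fs').readFileSync('urls.txt', 'utf8')).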

CLI: Discover and Sync a Pressbooks Network

node src/index.js \
  --network-url https://pressbooks.bccampus.ca \
  --imprint-id <uuid> \
  --token <bearer_token>

Output:

Discovering books on pressbooks.bccampus.ca...
Found 47 books
Processing...
✓ Completed: 45 books synced
✗ Failed: 2 books (network timeout)

CLI: Discover Network Without Syncing

node src/index.js \
  --network-url https://pressbooks.bccampus.ca \
  --network-discover-only

Output:

Discovering books on pressbooks.bccampus.ca...
Found 47 available books:
  1. Teaching Pronunciation - https://iastate.pressbooks.pub/teachingpronunciation
  2. Global Public Health - https://iastate.pressbooks.pub/globalhealth
  ...

Programmatic Usage

const { convertPressbooksToThoth } = require('pressbooks-thoth');

// CommonJS modules do not allow top-level await, so wrap the call in an async function.
async function main() {
  const result = await convertPressbooksToThoth({
    pressbooksUrl: 'https://iastate.pressbooks.pub/teachingpronunciation',
    imprintId: '00000000-0000-0000-0000-000000000000',
    token: process.env.THOTH_BEARER_TOKEN,
    exportSpecification: 'onix_3.0',
  });

  console.log(result.workId);           // Work UUID
  console.log(result.created);          // true/false
  console.log(result.exportedMetadata); // ONIX XML string (or null if no export)
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});

API Reference

CLI Flags Reference

| Flag | Type | Description |
| --- | --- | --- |
| --url | string | Single Pressbooks book URL |
| --batch-file | string | Path to text file with URLs (one per line, # for comments) |
| --network-url | string | Pressbooks network URL (e.g., https://pressbooks.bccampus.ca) |
| --network-discover-only | boolean | List books in network without syncing |
| --imprint-id | string | Thoth imprint UUID (required for sync operations) |
| --token | string | Bearer token (alternative to THOTH_BEARER_TOKEN env var) |
| --email | string | Thoth email (alternative to token) |
| --password | string | Thoth password (required with email) |
| --api-url | string | Override Thoth GraphQL endpoint |
| --export-api-url | string | Override Thoth Export API endpoint |
| --export-specification | string | Export format (e.g., onix_3.0, csv, marc) |
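
These flags map naturally onto Node's built-in util.parseArgs (Node 18.3+, backported to 16.17). The sketch below is an illustration only: the README does not say which parser src/index.js uses, parseCli is a hypothetical name, and the "exactly one input source per run" rule is an assumption.

```javascript
const { parseArgs } = require('util');

// Flag definitions mirroring the CLI flags table.
const FLAG_OPTIONS = {
  url: { type: 'string' },
  'batch-file': { type: 'string' },
  'network-url': { type: 'string' },
  'network-discover-only': { type: 'boolean' },
  'imprint-id': { type: 'string' },
  token: { type: 'string' },
  email: { type: 'string' },
  password: { type: 'string' },
  'api-url': { type: 'string' },
  'export-api-url': { type: 'string' },
  'export-specification': { type: 'string' },
};

// kebab-case flag name -> camelCase key (e.g. imprint-id -> imprintId).
const toCamel = (name) => name.replace(/-([a-z])/g, (_, c) => c.toUpperCase());

function parseCli(argv) {
  // Strict mode (the default) throws on unknown flags and stray positional arguments.
  const { values } = parseArgs({ args: argv, options: FLAG_OPTIONS });
  const options = {};
  for (const [name, value] of Object.entries(values)) {
    options[toCamel(name)] = value;
  }

  // Assumption: exactly one input source per run.
  const sources = [options.url, options.batchFile, options.networkUrl].filter(Boolean);
  if (sources.length !== 1) {
    throw new Error('Specify exactly one of --url, --batch-file, or --network-url');
  }
  if (!options.networkDiscoverOnly && !options.imprintId) {
    throw new Error('--imprint-id is required for sync operations');
  }
  return options;
}
```

Usage: parseCli(process.argv.slice(2)).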

convertPressbooksToThoth(options)

Main function that orchestrates the full workflow.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pressbooksUrl | string | Yes | Full URL to Pressbooks book (e.g., https://iastate.pressbooks.pub/teachingpronunciation) |
| imprintId | string | Yes | Thoth imprint UUID |
| token | string | No | Bearer token (alternative to email/password) |
| email | string | No | Thoth account email (alternative to token) |
| password | string | No | Thoth account password (required with email) |
| apiUrl | string | No | Override Thoth GraphQL endpoint |
| exportSpecification | string | No | Export format (e.g., onix_3.0, csv, marc) |
| exportApiUrl | string | No | Override Thoth Export API endpoint |
| fetch | Function | No | Custom fetch implementation (for testing) |

Returns:

{
  workId: '123e4567-e89b-12d3-a456-426614174000',     // Thoth work UUID
  created: true,                                      // true if new work, false if updated
  publicationId: '456f7890-a1b2-3456-c7d8-901234567890', // Publication UUID (if ISBN present)
  exportedMetadata: '<?xml version="1.0"?>...'       // Exported metadata (or null)
}
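
The parameter rules (pressbooksUrl and imprintId required; credentials from token or from email plus password) can be checked before any network call. A sketch with a hypothetical validateOptions helper. The UUID check follows the "Verify UUID format" advice under Common Issues. Requiring some credential and rejecting non-HTTP(S) URLs are added assumptions.

```javascript
// Hypothetical guard mirroring the parameter table; returns a list of problems.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateOptions(options) {
  const errors = [];

  // pressbooksUrl must be an absolute http(s) URL (assumption: other schemes are rejected).
  try {
    const { protocol } = new URL(options.pressbooksUrl);
    if (protocol !== 'http:' && protocol !== 'https:') {
      errors.push('pressbooksUrl must use http or https');
    }
  } catch {
    errors.push('pressbooksUrl must be a valid absolute URL');
  }

  // imprintId must look like a UUID.
  if (typeof options.imprintId !== 'string' || !UUID_RE.test(options.imprintId)) {
    errors.push('imprintId must be a UUID');
  }

  // Credentials: a bearer token, or an email/password pair.
  const hasToken = Boolean(options.token);
  const hasLogin = Boolean(options.email && options.password);
  if (!hasToken && !hasLogin) errors.push('provide token, or both email and password');
  if (options.email && !options.password) errors.push('password is required with email');

  return errors; // an empty array means the options look usable
}
```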

batchConvert(options)

Batch process multiple Pressbooks URLs with retry logic and progress tracking.

Parameters:

{
  urls: string[],           // Array of Pressbooks URLs
  imprintId: string,        // Thoth imprint UUID
  token: string,            // Bearer token
  maxRetries: number,       // Optional (default: 3)
  initialBackoffMs: number, // Optional (default: 1000)
  onProgress: function      // Optional callback: (progress) => {}
}

Returns:

{
  progress: ProgressTracker,        // State tracker with ETA
  items: Array,                     // Results for each URL:
                                    // [{ url, succeeded, result/error, retries }]
  summary: {                        // Summary stats
    total: number,
    succeeded: number,
    failed: number,
    totalTimeMs: number
  },
  formatted: string                 // Human-readable report
}
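
maxRetries and initialBackoffMs suggest a standard exponential-backoff loop. A sketch under that assumption, where the delay doubles after each failure (1000, 2000, 4000 ms with the defaults). withRetry is hypothetical. The test list mentions error classification, which this sketch only approximates with an optional isRetryable predicate. The result mirrors the { succeeded, result/error, retries } item shape above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical retry helper: runs fn, retrying up to maxRetries extra times.
// The delay before retry n (1-based) is initialBackoffMs * 2^(n - 1).
async function withRetry(fn, { maxRetries = 3, initialBackoffMs = 1000, isRetryable = () => true } = {}) {
  let retries = 0;
  for (;;) {
    try {
      const result = await fn();
      return { succeeded: true, result, retries };
    } catch (error) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (retries >= maxRetries || !isRetryable(error)) {
        return { succeeded: false, error, retries };
      }
      await sleep(initialBackoffMs * 2 ** retries);
      retries += 1;
    }
  }
}
```

With the defaults, a URL that keeps failing is attempted four times in total and reported with retries: 3, consistent with "Failed after 3 retries" in the batch output.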

NetworkClient

Discover and list books from Pressbooks networks.

Constructor:

const client = new NetworkClient('https://pressbooks.bccampus.ca');

Methods:

  • async discoverAll() — Auto-paginate and return all books
  • async listBooks(page, perPage) — Get paginated book list
  • async listBooksPaginated() — Paginated list with metadata (hasNext, totals)
  • async getBook(slug) — Get single book by slug
  • static extractUrls(books) — Extract URLs from book objects
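
discoverAll() auto-paginates. A sketch of such a loop, assuming a page-fetching function shaped like listBooks(page, perPage) and assuming the last page is the first one that returns fewer than perPage books (the real client may use the hasNext flag from listBooksPaginated() instead). The link field read by extractUrls is also an assumption about the book objects.

```javascript
// Sketch of auto-pagination. listBooks(page, perPage) is any async function that
// returns an array of book objects for that page (assumed to mirror NetworkClient#listBooks).
async function discoverAll(listBooks, perPage = 100, maxPages = 1000) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const books = await listBooks(page, perPage);
    all.push(...books);
    // A short or empty page means there is nothing left to fetch.
    if (books.length < perPage) break;
  }
  return all;
}

// Assumes each book object exposes its public URL as `link`.
function extractUrls(books) {
  return books.map((book) => book.link).filter((link) => typeof link === 'string');
}
```

The maxPages cap is a safety net against a server that never returns a short page.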

ThothClient (GraphQL)

Authenticated client for Thoth GraphQL API.

Methods:

  • login(email, password) — Authenticate and cache session token
  • upsertWork(workInput) — Create or update a work
  • findWorkByDoi(doi) — Look up work by DOI
  • findWorkByReference(reference) — Look up work by reference ID
  • createLanguage(workId, languageCode, languageRelation, mainLanguage) — Attach language to work
  • createContributor(params) — Create a contributor (author, editor, etc.)
  • createContribution(workId, contributorId, contribution) — Link contributor to work
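
Each of these methods ultimately sends a GraphQL POST request. A minimal sketch of that shared step, assuming a standard JSON body ({ query, variables }) and a Bearer Authorization header. graphqlRequest is hypothetical. Its error prefix mirrors the "Thoth GraphQL error:" message listed under Common Issues. The injectable fetchImpl plays the same role as the fetch option of convertPressbooksToThoth.

```javascript
// Hypothetical shared request step for a GraphQL client.
// fetchImpl defaults to the global fetch (Node 18+); pass a stub in tests.
async function graphqlRequest({ apiUrl, token, query, variables = {}, fetchImpl = fetch }) {
  const response = await fetchImpl(apiUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...(token ? { Authorization: `Bearer ${token}` } : {}),
    },
    body: JSON.stringify({ query, variables }),
  });

  if (!response.ok) {
    throw new Error(`Thoth HTTP error: ${response.status}`);
  }

  const payload = await response.json();
  // GraphQL reports failures in an "errors" array, often alongside HTTP 200.
  if (Array.isArray(payload.errors) && payload.errors.length > 0) {
    throw new Error(`Thoth GraphQL error: ${payload.errors.map((e) => e.message).join('; ')}`);
  }
  return payload.data;
}
```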

ExportClient (REST)

Client for Thoth Export API.

Methods:

  • listFormats() — List available export formats
  • getFormat(formatId) — Get details of a specific format
  • listSpecifications() — List export specifications
  • getSpecification(specificationId) — Get specification details
  • exportWork(specificationId, workId) — Export work metadata
  • exportPublisher(specificationId, publisherId) — Export publisher catalog
  • listPlatforms() — List distribution platforms
  • getPlatform(platformId) — Get platform details
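
These methods are thin wrappers over REST paths. A sketch of the URL construction, assuming routes of the form specifications/{specificationId}/work/{workId} and specifications/{specificationId}/publisher/{publisherId}, in line with the /publisher/{id} endpoint mentioned in the plugin's export workflow. The route shapes are inferred from the method names and should be checked against the live Export API. buildExportUrl is hypothetical.

```javascript
// Hypothetical URL builder for the Export API methods listed above.
// Route shapes are assumptions modelled on the method names.
function buildExportUrl(baseUrl, method, ...ids) {
  const routes = {
    listFormats: () => 'formats',
    getFormat: (formatId) => `formats/${formatId}`,
    listSpecifications: () => 'specifications',
    getSpecification: (specId) => `specifications/${specId}`,
    exportWork: (specId, workId) => `specifications/${specId}/work/${workId}`,
    exportPublisher: (specId, publisherId) => `specifications/${specId}/publisher/${publisherId}`,
    listPlatforms: () => 'platforms',
    getPlatform: (platformId) => `platforms/${platformId}`,
  };
  const route = routes[method];
  if (!route) throw new Error(`Unknown export method: ${method}`);

  // Percent-encode each id so unusual characters cannot break the path.
  const path = route(...ids.map((id) => encodeURIComponent(id)));
  // Resolve against the base URL, making sure it ends with exactly one slash.
  return new URL(path, baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`).toString();
}
```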

How It Works

1. Fetch Pressbooks Metadata
   └─> GET /wp-json/pressbooks/v2/metadata
       Returns: schema.org JSON-LD

2. Map to Thoth Structures
   └─> Convert Pressbooks fields to Thoth WorkInput
       Extract: titles, DOI, language, contributors, etc.

3. Authenticate with Thoth
   └─> Bearer token (from .env or --token flag)
       OR email/password login

4. Upsert Work to Thoth
   └─> Check if exists (by DOI or reference)
       Create or update accordingly

5. Attach Metadata
   └─> Add language (on new works)
       Add contributors/contributions (on new works)

6. Export (Optional)
   └─> If --export-specification provided
       Request work metadata in specified format
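
Step 4 is a find-then-create-or-update decision. A sketch under stated assumptions: findWorkByDoi and findWorkByReference come from the ThothClient method list. createWork and updateWork are hypothetical stand-ins for whatever upsertWork does internally. Trying the DOI before the reference is also an assumption. The returned created flag is what step 5 would use to decide whether to attach languages and contributors.

```javascript
// Sketch of step 4. `client` must provide findWorkByDoi, findWorkByReference,
// and the hypothetical createWork/updateWork methods.
async function upsertWorkSketch(client, workInput) {
  // Assumed order: DOI first, then the reference ID.
  let existing = null;
  if (workInput.doi) existing = await client.findWorkByDoi(workInput.doi);
  if (!existing && workInput.reference) {
    existing = await client.findWorkByReference(workInput.reference);
  }

  if (existing) {
    const updated = await client.updateWork({ ...workInput, workId: existing.workId });
    return { workId: updated.workId, created: false };
  }
  const created = await client.createWork(workInput);
  return { workId: created.workId, created: true };
}
```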

Metadata Mapping

See MAPPING.md for detailed field-by-field mapping between Pressbooks and Thoth.

Currently Mapped:

  • ✅ Title, subtitle, full title
  • ✅ DOI
  • ✅ Publication date
  • ✅ Copyright year
  • ✅ Language (ISO 639-3)
  • ✅ License
  • ✅ Cover image
  • ✅ Abstract (short & long)
  • ✅ Authors and editors
  • ✅ Translators, illustrators, reviewers
  • ✅ Copyright holder
  • ✅ Keywords / subjects (extracted, ready for future mapping)

Not Yet Mapped:

  • Subjects (keywords extracted, waiting for Thoth Subject entity support)
  • Page count, edition
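
A pared-down illustration of the mapping for a few of the fields above follows. It does not reproduce mapper.js or MAPPING.md. The schema.org property names read here (name, alternativeHeadline, datePublished, image, description, inLanguage) are assumptions about the Pressbooks metadata response. The output keys are illustrative names for the Thoth fields listed above. The small ISO 639-1 to ISO 639-3 table shows how a Pressbooks "en" could become the "eng" seen in the CLI output.

```javascript
// Illustrative subset of ISO 639-1 -> ISO 639-3 codes (assumption; language.js may differ).
const ISO_639_1_TO_3 = { en: 'eng', fr: 'fra', de: 'deu', es: 'spa', pt: 'por' };

function toIso6393(code) {
  if (!code) return null;
  const base = code.toLowerCase().split(/[-_]/)[0]; // "en-US" -> "en"
  if (base.length === 3) return base; // already a three-letter code
  return ISO_639_1_TO_3[base] ?? null;
}

// Sketch: map a few schema.org properties onto illustrative Thoth work fields.
// Missing values become null.
function mapMetadata(jsonLd) {
  const title = jsonLd.name ?? null;
  const subtitle = jsonLd.alternativeHeadline ?? null;
  return {
    title,
    subtitle,
    // Assumed "Title: Subtitle" convention for the full title.
    fullTitle: title && subtitle ? `${title}: ${subtitle}` : title,
    // Keep only the YYYY-MM-DD part of an ISO 8601 timestamp.
    publicationDate: jsonLd.datePublished ? jsonLd.datePublished.slice(0, 10) : null,
    coverUrl: typeof jsonLd.image === 'string' ? jsonLd.image : null,
    longAbstract: jsonLd.description ?? null,
    languageCode: toIso6393(jsonLd.inLanguage),
  };
}
```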

Testing

Run the full test suite:

npm test

Current test coverage:

  • 137 core tests covering mapper, language, license, HTML utilities, CLI
  • 13 export API tests covering format discovery and export methods
  • 28 retry logic tests covering exponential backoff and error classification
  • 27 progress tracking tests covering ETA calculation and reporting
  • 11 batch processing tests covering multi-book orchestration
  • 21 network discovery tests covering book listing and URL extraction
  • 5 integration tests covering end-to-end publication workflows
  • Total: 233 tests, all passing ✅

Development

Project Structure

src/
  ├── index.js              # CLI entry point
  ├── converter.js          # Main orchestration logic
  ├── mapper.js             # Pressbooks → Thoth mapping
  ├── thothClient.js        # GraphQL client (authenticated)
  ├── exportClient.js       # Export API client (REST)
  ├── batch.js              # Batch processing orchestrator
  ├── network.js            # Pressbooks network discovery
  ├── retry.js              # Retry logic with exponential backoff
  ├── progress.js           # Progress tracking with ETA
  ├── language.js           # Language code utilities
  ├── license.js            # License mapping
  └── htmlUtils.js          # HTML parsing utilities

tests/
  ├── cli.integration.test.js
  ├── converter.integration.test.js
  ├── mapper.test.js
  ├── thothClient.test.js
  ├── exportClient.test.js
  ├── batch.test.js
  ├── network.test.js
  ├── retry.test.js
  ├── progress.test.js
  ├── language.test.js
  ├── license.test.js
  └── htmlUtils.test.js

.env.example               # Environment variable template
MAPPING.md                 # Detailed field mapping reference

Adding a New Field

  1. Map in mapper.js: Extract from Pressbooks JSON-LD
  2. Add to WorkInput: Include in GraphQL mutation
  3. Test in mapper.test.js: Verify extraction and mapping
  4. Update MAPPING.md: Document the mapping
  5. Run tests: npm test

Debugging

Enable verbose output:

DEBUG=* node src/index.js --url ... --imprint-id ...

Check GraphQL requests/responses:

const client = new ThothClient({ apiUrl, token });
// Inspect client._token before/after login
// Check the GraphQL query structure in client._request()

Common Issues

Error: "Thoth GraphQL error: Invalid token"

  • Bearer token is invalid or expired
  • Check that THOTH_BEARER_TOKEN is set correctly in .env

Error: "Cannot find imprint"

  • Imprint UUID doesn't exist in your Thoth instance
  • Verify UUID format and that it's in the correct Thoth instance

Export returns empty

  • Work exists but hasn't been fully indexed for export yet
  • Wait a moment and retry
  • Check that the specification is supported (see exportClient.listSpecifications())

Roadmap

  • Subject mapping (waiting for Thoth Subject entity support)
  • Page count and edition mapping
  • Metadata validation and conflict resolution
  • Sync state persistence

License

ISC

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Run npm test to verify
  5. Submit a pull request