steelwagstaff / pressbooks-thoth
Pressbooks–Thoth Open Metadata integration
Package info
github.com/SteelWagstaff/pressbooks-thoth
Type: wordpress-plugin
pkg:composer/steelwagstaff/pressbooks-thoth
Requires
- php: ^8.3
- composer/installers: ^2.1
- thoth-pub/thoth-client-php: ^1.0
Requires (Dev)
- laravel/pint: ^1.24.0
- yoast/phpunit-polyfills: ^1.1.5
This package is auto-updated.
Last update: 2026-05-29 21:45:48 UTC
README
A WordPress multisite plugin that integrates Pressbooks with Thoth Open Metadata, enabling publishers to register books with Thoth and keep metadata in sync.
Features
- Automatic sync: Registers books with Thoth on Book Info save and keeps metadata up to date (one-way: Pressbooks → Thoth)
- Metadata mapping: Maps Pressbooks pb_* fields to Thoth Work, Contributor, Language, Subject, and Publication models
- Bulk export: Download metadata for all books under a publisher in ONIX 3.0/2.1, MARC21, BibTeX, KBART, or CSV format; files download directly to your browser
- Per-book export: Download a single work's metadata directly from the Book Info screen
- Network settings: Configure your Thoth personal access token and default imprint from the network admin
- Environment-aware: Configure custom GraphQL API endpoints for testing with dev/staging environments while using production export services
Requirements
- PHP 8.3+
- WordPress multisite 6.5+
- Pressbooks plugin (network-activated)
- A Thoth account with at least one publisher and imprint
Installation
- Clone or download this repository into your WordPress plugins directory.
- Run composer install to install PHP dependencies.
- Network-activate the plugin in WordPress.
- Go to Network Admin → Thoth → Settings, enter your personal access token, and click Refresh Publishers & Imprints.
- Select the default imprint and save.
Development
    # Install dependencies
    composer install

    # Set up the WordPress test library (first time)
    bin/install-wp-tests.sh wordpress_test root '' localhost latest

    # Run tests
    composer test

    # Fix code style (PSR-12 via pint)
    composer fix

    # Check code style without fixing
    composer standards
Architecture
src/
Bootstrap.php Entry point — wires WordPress hooks
Settings.php Network admin UI and option accessors
Mapper.php Pure data transformation: pb_* → Thoth models
SyncService.php Create/update lifecycle against Thoth GraphQL API
ExportService.php Bulk/single-work metadata downloads via Thoth REST API
helpers.php Namespaced constants (option keys, default URLs)
templates/
network-settings.php Network admin settings page
network-export.php Network admin bulk export page
book-metabox.php Per-book Thoth metabox on Book Info screen
tests/
MapperTest.php Unit tests for Mapper (34 assertions)
Export Workflow
When generating exports:
- Backend (PHP), in Bootstrap::ajaxGenerateExports():
  - Queries Thoth GraphQL API for all works under the selected publisher
  - Calls Thoth REST API /publisher/{id} endpoint for each format
  - API returns binary file content (XML for ONIX, JSON for others, etc.)
  - File content is base64-encoded and returned to frontend with metadata (format, filename, MIME type)
- Frontend (JavaScript), in network-export.php:
  - Receives base64-encoded file content
  - Decodes each file and creates a Blob with appropriate MIME type
  - Creates temporary download links and triggers browser downloads
  - Displays dismissible success notice with proper WordPress styling
- Thoth API:
  - Export generation always uses production endpoint (export.thoth.pub) regardless of GraphQL environment
  - This allows testing with dev/staging GraphQL data while using production export services
  - No local file storage needed; exports are generated on demand and streamed directly to the user
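The frontend steps above (decode, wrap in a Blob, trigger a download) can be sketched roughly as follows. This is an illustration only, not the plugin's actual code: the payload shape (`content`, `mimeType`, `filename`) and both function names are assumptions.

```javascript
// Decode a base64 string into raw bytes.
// atob is available in browsers and in Node 16+.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Browser-only: wrap the bytes in a Blob and click a temporary link.
// `file` is a hypothetical payload: { content, mimeType, filename }.
function triggerDownload(file) {
  const blob = new Blob([base64ToBytes(file.content)], { type: file.mimeType });
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = file.filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url); // release the temporary object URL
}
```

Decoding into a `Uint8Array` first, rather than passing the decoded string to `Blob`, avoids re-encoding the content as UTF-8 and corrupting non-text formats.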
License
GPL-3.0-or-later
Or clone and install locally:
```bash
git clone https://github.com/SteelWagstaff/pressbooks-thoth.git
cd pressbooks-thoth
npm install
```
Configuration
Create a .env file in the project root with your Thoth credentials:
    # Copy from template
    cp .env.example .env

    # Edit .env and add your bearer token:
    # THOTH_BEARER_TOKEN=your_actual_token_here
Environment Variables
| Variable | Default | Description |
|---|---|---|
| THOTH_BEARER_TOKEN | (none) | Bearer token for authentication (alternative to email/password) |
| THOTH_API_URL | https://api.test.thoth.pub/graphql | Thoth GraphQL endpoint |
| THOTH_EXPORT_API_URL | https://export.test.thoth.pub/ | Thoth Export API endpoint |
| THOTH_EXPORT_SPECIFICATION | (none) | Default export format (e.g., onix_3.0, csv, marc) |
| THOTH_EMAIL | (none) | Email for authentication (only needed if not using bearer token) |
| THOTH_PASSWORD | (none) | Password for authentication (only needed if not using bearer token) |
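As a rough sketch of how these variables and their defaults might be resolved, consider the function below. The two default URLs come from the table; the function name, the returned property names, and the error message are hypothetical and not taken from the project.

```javascript
// Defaults copied from the environment variable table.
const DEFAULTS = {
  THOTH_API_URL: 'https://api.test.thoth.pub/graphql',
  THOTH_EXPORT_API_URL: 'https://export.test.thoth.pub/',
};

// Hypothetical helper: turn an env-like object into a config object.
function resolveConfig(env) {
  const token = env.THOTH_BEARER_TOKEN || null;
  const email = env.THOTH_EMAIL || null;
  const password = env.THOTH_PASSWORD || null;

  // Either a bearer token or an email/password pair is needed.
  if (!token && !(email && password)) {
    throw new Error('Set THOTH_BEARER_TOKEN, or both THOTH_EMAIL and THOTH_PASSWORD');
  }

  return {
    apiUrl: env.THOTH_API_URL || DEFAULTS.THOTH_API_URL,
    exportApiUrl: env.THOTH_EXPORT_API_URL || DEFAULTS.THOTH_EXPORT_API_URL,
    exportSpecification: env.THOTH_EXPORT_SPECIFICATION || null,
    auth: token ? { token } : { email, password },
  };
}
```

In a Node script this would typically be called as `resolveConfig(process.env)` after the .env file has been loaded.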
Usage
CLI: Create/Update a Work
    node src/index.js \
      --url https://iastate.pressbooks.pub/teachingpronunciation \
      --imprint-id 00000000-0000-0000-0000-000000000000
Output:
Work created: 123e4567-e89b-12d3-a456-426614174000
Publication created: 456f7890-a1b2-3456-c7d8-901234567890
Language attached: eng
Contributor attached: Jane Doe (Author)
CLI: Create/Update + Export Metadata
    node src/index.js \
      --url https://iastate.pressbooks.pub/teachingpronunciation \
      --imprint-id 00000000-0000-0000-0000-000000000000 \
      --export-specification onix_3.0
Output:
Work created: 123e4567-e89b-12d3-a456-426614174000
Language attached: eng
Contributor attached: Jane Doe (Author)
--- Exported Metadata ---
<?xml version="1.0" encoding="UTF-8"?>
<ONIXmessage ...>
...ONIX XML content...
</ONIXmessage>
CLI: Override API Endpoints
    node src/index.js \
      --url https://example.pressbooks.pub/book \
      --imprint-id <uuid> \
      --api-url https://api.thoth.pub/graphql \
      --export-specification csv \
      --export-api-url https://export.thoth.pub/
CLI: Batch Process Multiple Books
    node src/index.js \
      --batch-file urls.txt \
      --imprint-id <uuid> \
      --token <bearer_token>
urls.txt format (one URL per line, # for comments):
# Academic titles
https://iastate.pressbooks.pub/teachingpronunciation
https://iastate.pressbooks.pub/globalhealth
# Community titles
https://pressbooks.bccampus.ca/readersresponse
Output:
Processing 3 books...
✓ Book 1 (Jane Doe): Work created (5 contributors)
✓ Book 2 (John Smith): Work updated (3 contributors)
✗ Book 3: Failed after 3 retries - Connection timeout
Summary: 2 succeeded, 1 failed
Total time: 45.2s
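The urls.txt format described above (one URL per line, # for comments) could be parsed along these lines. This is a sketch with a hypothetical function name. It assumes comments occupy whole lines, as in the sample, and does not strip trailing comments after a URL.

```javascript
// Hypothetical parser for a batch file: returns the URLs in file order,
// skipping blank lines and lines that start with '#'.
function parseUrlList(text) {
  return text
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}
```

Applied to the sample file shown earlier, this yields three URLs, which matches the "Processing 3 books..." line in the output.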
CLI: Discover and Sync a Pressbooks Network
    node src/index.js \
      --network-url https://pressbooks.bccampus.ca \
      --imprint-id <uuid> \
      --token <bearer_token>
Output:
Discovering books on pressbooks.bccampus.ca...
Found 47 books
Processing...
✓ Completed: 45 books synced
✗ Failed: 2 books (network timeout)
CLI: Discover Network Without Syncing
    node src/index.js \
      --network-url https://pressbooks.bccampus.ca \
      --network-discover-only
Output:
Discovering books on pressbooks.bccampus.ca...
Found 47 available books:
1. Teaching Pronunciation - https://iastate.pressbooks.pub/teachingpronunciation
2. Global Public Health - https://iastate.pressbooks.pub/globalhealth
...
Programmatic Usage
    const { convertPressbooksToThoth } = require('pressbooks-thoth');

    // Top-level await is not available in CommonJS, so wrap the call in an async function.
    async function main() {
      const result = await convertPressbooksToThoth({
        pressbooksUrl: 'https://iastate.pressbooks.pub/teachingpronunciation',
        imprintId: '00000000-0000-0000-0000-000000000000',
        token: process.env.THOTH_BEARER_TOKEN,
        exportSpecification: 'onix_3.0',
      });

      console.log(result.workId);           // Work UUID
      console.log(result.created);          // true/false
      console.log(result.exportedMetadata); // ONIX XML string (or null if no export)
    }

    main().catch(console.error);
API Reference
CLI Flags Reference
| Flag | Type | Description |
|---|---|---|
| --url | string | Single Pressbooks book URL |
| --batch-file | string | Path to text file with URLs (one per line, # for comments) |
| --network-url | string | Pressbooks network URL (e.g., https://pressbooks.bccampus.ca) |
| --network-discover-only | boolean | List books in network without syncing |
| --imprint-id | string | Thoth imprint UUID (required for sync operations) |
| --token | string | Bearer token (alternative to THOTH_BEARER_TOKEN env var) |
| --email | string | Thoth email (alternative to token) |
| --password | string | Thoth password (required with email) |
| --api-url | string | Override Thoth GraphQL endpoint |
| --export-api-url | string | Override Thoth Export API endpoint |
| --export-specification | string | Export format (e.g., onix_3.0, csv, marc) |
convertPressbooksToThoth(options)
Main function that orchestrates the full workflow.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| pressbooksUrl | string | ✓ | Full URL to Pressbooks book (e.g., https://iastate.pressbooks.pub/teachingpronunciation) |
| imprintId | string | ✓ | Thoth imprint UUID |
| token | string | - | Bearer token (alternative to email/password) |
| email | string | - | Thoth account email (alternative to token) |
| password | string | - | Thoth account password (required with email) |
| apiUrl | string | - | Override Thoth GraphQL endpoint |
| exportSpecification | string | - | Export format (e.g., onix_3.0, csv, marc) |
| exportApiUrl | string | - | Override Thoth Export API endpoint |
| fetch | Function | - | Custom fetch implementation (for testing) |
Returns:
    {
      workId: '123e4567-e89b-12d3-a456-426614174000',        // Thoth work UUID
      created: true,                                         // true if new work, false if updated
      publicationId: '456f7890-a1b2-3456-c7d8-901234567890', // Publication UUID (if ISBN present)
      exportedMetadata: '<?xml version="1.0"?>...'           // Exported metadata (or null)
    }
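The parameter rules above (two required fields, plus either a token or an email/password pair) could be checked before any network call with something like the sketch below. The function name and error messages are hypothetical, and the library may validate its options differently.

```javascript
// Hypothetical pre-flight check for convertPressbooksToThoth options.
// Returns a list of problems; an empty list means the options look usable.
function validateOptions(options) {
  const errors = [];
  if (!options.pressbooksUrl) errors.push('pressbooksUrl is required');
  if (!options.imprintId) errors.push('imprintId is required');

  const hasToken = Boolean(options.token);
  const hasLogin = Boolean(options.email && options.password);
  if (!hasToken && !hasLogin) {
    errors.push('provide token, or email and password');
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a CLI report every missing flag in one pass.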
batchConvert(options)
Batch process multiple Pressbooks URLs with retry logic and progress tracking.
Parameters:
    {
      urls: string[],            // Array of Pressbooks URLs
      imprintId: string,         // Thoth imprint UUID
      token: string,             // Bearer token
      maxRetries: number,        // Optional (default: 3)
      initialBackoffMs: number,  // Optional (default: 1000)
      onProgress: function       // Optional callback: (progress) => {}
    }
Returns:
    {
      progress: ProgressTracker,  // State tracker with ETA
      items: Array,               // Results for each URL:
                                  // [{ url, succeeded, result/error, retries }]
      summary: {                  // Summary stats
        total: number,
        succeeded: number,
        failed: number,
        totalTimeMs: number
      },
      formatted: string           // Human-readable report
    }
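To illustrate what maxRetries and initialBackoffMs control, here is a minimal retry loop with exponential backoff. It is a sketch under assumptions, not the code in retry.js: the doubling factor, the function names, and the injectable `sleep` are hypothetical, and it retries on every error rather than classifying errors first.

```javascript
// Delay before retry number `attempt` (1-based), assuming the delay doubles each time.
function backoffDelayMs(attempt, initialBackoffMs = 1000) {
  return initialBackoffMs * 2 ** (attempt - 1);
}

// Run `task` until it succeeds or `maxRetries` retries have been used.
// Resolves to an object loosely shaped like one entry of `items` above.
async function withRetry(task, { maxRetries = 3, initialBackoffMs = 1000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let retries = 0;
  for (;;) {
    try {
      const result = await task();
      return { succeeded: true, result, retries };
    } catch (error) {
      if (retries >= maxRetries) return { succeeded: false, error, retries };
      retries += 1;
      await wait(backoffDelayMs(retries, initialBackoffMs));
    }
  }
}
```

With the defaults, a task that always fails is attempted four times (one initial try plus three retries), with waits of 1000, 2000, and 4000 ms in between.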
NetworkClient
Discover and list books from Pressbooks networks.
Constructor:
const client = new NetworkClient('https://pressbooks.bccampus.ca');
Methods:
- async discoverAll(): Auto-paginate and return all books
- async listBooks(page, perPage): Get paginated book list
- async listBooksPaginated(): Paginated list with metadata (hasNext, totals)
- async getBook(slug): Get single book by slug
- static extractUrls(books): Extract URLs from book objects
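Auto-pagination of the kind discoverAll() performs can be sketched as a loop that requests pages until one comes back short. The standalone function below is hypothetical: it takes a `fetchPage(page, perPage)` callback that resolves to an array, and the stop condition (a page with fewer than `perPage` items) is an assumption rather than NetworkClient's documented behavior.

```javascript
// Hypothetical helper: collect every item by walking pages 1, 2, 3, ...
// `fetchPage(page, perPage)` must resolve to an array of books.
async function discoverAll(fetchPage, perPage = 10) {
  const books = [];
  for (let page = 1; ; page += 1) {
    const batch = await fetchPage(page, perPage);
    books.push(...batch);
    if (batch.length < perPage) break; // short or empty page: nothing left
  }
  return books;
}
```

If listBooks(page, perPage) resolves to a plain array, it could be plugged in as `discoverAll((page, perPage) => client.listBooks(page, perPage))`. When the total is an exact multiple of `perPage`, this approach makes one extra request that returns an empty page.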
ThothClient (GraphQL)
Authenticated client for Thoth GraphQL API.
Methods:
- login(email, password): Authenticate and cache session token
- upsertWork(workInput): Create or update a work
- findWorkByDoi(doi): Look up work by DOI
- findWorkByReference(reference): Look up work by reference ID
- createLanguage(workId, languageCode, languageRelation, mainLanguage): Attach language to work
- createContributor(params): Create a contributor (author, editor, etc.)
- createContribution(workId, contributorId, contribution): Link contributor to work
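An authenticated GraphQL call generally comes down to a JSON POST with a bearer token in the Authorization header. The sketch below shows that request shape only. The function name is hypothetical, and it says nothing about the actual queries or mutations ThothClient sends.

```javascript
// Hypothetical helper: build the `fetch` init object for a GraphQL request.
function buildGraphqlRequest({ token, query, variables = {} }) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ query, variables }),
  };
}

// Usage (not executed here): send a schema-independent probe query.
// const response = await fetch(apiUrl, buildGraphqlRequest({
//   token: process.env.THOTH_BEARER_TOKEN,
//   query: 'query { __typename }',
// }));
```

Keeping request construction separate from the network call makes it easy to unit-test, which fits the custom fetch option that convertPressbooksToThoth accepts for testing.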
ExportClient (REST)
Client for Thoth Export API.
Methods:
- listFormats(): List available export formats
- getFormat(formatId): Get details of a specific format
- listSpecifications(): List export specifications
- getSpecification(specificationId): Get specification details
- exportWork(specificationId, workId): Export work metadata
- exportPublisher(specificationId, publisherId): Export publisher catalog
- listPlatforms(): List distribution platforms
- getPlatform(platformId): Get platform details
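As an illustration of what exportWork and exportPublisher might request, the sketch below builds an export URL. The path layout (`/specifications/{id}/work/{id}` and `/specifications/{id}/publisher/{id}`) is an assumption about the Thoth Export API and should be verified against its documentation. The function name is hypothetical.

```javascript
// Hypothetical URL builder for export requests.
// `kind` is either 'work' or 'publisher'.
function exportUrl(baseUrl, specificationId, kind, id) {
  if (kind !== 'work' && kind !== 'publisher') {
    throw new Error(`Unsupported export kind: ${kind}`);
  }
  const base = baseUrl.replace(/\/+$/, ''); // avoid a double slash after the host
  const spec = encodeURIComponent(specificationId);
  return `${base}/specifications/${spec}/${kind}/${encodeURIComponent(id)}`;
}
```

Encoding the specification ID keeps the URL well-formed if an ID contains reserved characters such as colons.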
How It Works
1. Fetch Pressbooks Metadata
└─> GET /wp-json/pressbooks/v2/metadata
Returns: schema.org JSON-LD
2. Map to Thoth Structures
└─> Convert Pressbooks fields to Thoth WorkInput
Extract: titles, DOI, language, contributors, etc.
3. Authenticate with Thoth
└─> Bearer token (from .env or --token flag)
OR email/password login
4. Upsert Work to Thoth
└─> Check if exists (by DOI or reference)
Create or update accordingly
5. Attach Metadata
└─> Add language (on new works)
Add contributors/contributions (on new works)
6. Export (Optional)
└─> If --export-specification provided
Request work metadata in specified format
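Step 1 above starts from a book URL and requests /wp-json/pressbooks/v2/metadata. Deriving that endpoint can be sketched as follows. The function name is hypothetical, and the sketch assumes the endpoint path is appended to the book's own path.

```javascript
// Hypothetical helper: derive the metadata endpoint for a Pressbooks book URL.
function metadataEndpoint(bookUrl) {
  const url = new URL(bookUrl); // throws a TypeError on malformed input
  const basePath = url.pathname.replace(/\/+$/, ''); // drop trailing slashes
  return `${url.origin}${basePath}/wp-json/pressbooks/v2/metadata`;
}
```

For example, `metadataEndpoint('https://iastate.pressbooks.pub/teachingpronunciation/')` gives `https://iastate.pressbooks.pub/teachingpronunciation/wp-json/pressbooks/v2/metadata`.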
Metadata Mapping
See MAPPING.md for detailed field-by-field mapping between Pressbooks and Thoth.
Currently Mapped:
- ✅ Title, subtitle, full title
- ✅ DOI
- ✅ Publication date
- ✅ Copyright year
- ✅ Language (ISO 639-3)
- ✅ License
- ✅ Cover image
- ✅ Abstract (short & long)
- ✅ Authors and editors
- ✅ Translators, illustrators, reviewers
- ✅ Copyright holder
- ✅ Keywords / subjects (extracted only; see Not Yet Mapped)
Not Yet Mapped:
- Subjects (keywords extracted, waiting for Thoth Subject entity support)
- Page count, edition
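As one example of a mapping from the list above, language values need to end up as ISO 639-3 codes. The sketch below normalizes a language tag with a deliberately tiny lookup table. The function name, the table contents, and the handling of three-letter input are assumptions, and language.js may work differently.

```javascript
// Partial ISO 639-1 to ISO 639-3 table, for illustration only.
const ISO_639_1_TO_3 = new Map([
  ['en', 'eng'],
  ['fr', 'fra'],
  ['es', 'spa'],
  ['de', 'deu'],
]);

// Hypothetical helper: 'en', 'en-CA', or 'eng' -> 'eng'; unknown -> null.
function toIso6393(tag) {
  if (typeof tag !== 'string' || tag.trim() === '') return null;
  const primary = tag.trim().toLowerCase().split(/[-_]/)[0]; // drop region subtags
  if (primary.length === 3) return primary; // assume it is already a three-letter code
  return ISO_639_1_TO_3.get(primary) ?? null;
}
```

Returning null for unknown codes leaves it to the caller to decide whether to skip the language or fail the sync.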
Testing
Run the full test suite:
npm test
Current test coverage:
- 137 core tests covering mapper, language, license, HTML utilities, CLI
- 13 export API tests covering format discovery and export methods
- 28 retry logic tests covering exponential backoff and error classification
- 27 progress tracking tests covering ETA calculation and reporting
- 11 batch processing tests covering multi-book orchestration
- 21 network discovery tests covering book listing and URL extraction
- 5 integration tests covering end-to-end publication workflows
- Total: 233 tests, all passing ✅
Development
Project Structure
src/
├── index.js # CLI entry point
├── converter.js # Main orchestration logic
├── mapper.js # Pressbooks → Thoth mapping
├── thothClient.js # GraphQL client (authenticated)
├── exportClient.js # Export API client (REST)
├── batch.js # Batch processing orchestrator
├── network.js # Pressbooks network discovery
├── retry.js # Retry logic with exponential backoff
├── progress.js # Progress tracking with ETA
├── language.js # Language code utilities
├── license.js # License mapping
└── htmlUtils.js # HTML parsing utilities
tests/
├── cli.integration.test.js
├── converter.integration.test.js
├── mapper.test.js
├── thothClient.test.js
├── exportClient.test.js
├── batch.test.js
├── network.test.js
├── retry.test.js
├── progress.test.js
├── language.test.js
├── license.test.js
└── htmlUtils.test.js
.env.example # Environment variable template
MAPPING.md # Detailed field mapping reference
Adding a New Field
- Map in mapper.js: Extract from Pressbooks JSON-LD
- Add to WorkInput: Include in GraphQL mutation
- Test in mapper.test.js: Verify extraction and mapping
- Update MAPPING.md: Document the mapping
- Run tests: npm test
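To make the first two steps concrete, here is what mapping one new field might look like, using page count (listed as not yet mapped) as the example. Everything specific here is an assumption: that the Pressbooks JSON-LD exposes schema.org's `numberOfPages`, that the Thoth input field is called `pageCount`, and the function name itself.

```javascript
// Hypothetical mapper for a single field: JSON-LD numberOfPages -> pageCount.
// Returns an object fragment that can be spread into a larger work input.
function mapPageCount(jsonLd) {
  const raw = jsonLd ? jsonLd.numberOfPages : undefined;
  const pages = Number.parseInt(raw, 10);
  // Omit the field entirely when the value is missing or not a positive integer.
  return Number.isInteger(pages) && pages > 0 ? { pageCount: pages } : {};
}
```

The fragment could then be merged with the rest of the mapping, for example `{ ...otherFields, ...mapPageCount(jsonLd) }`, so a missing page count never sends an empty value.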
Debugging
Enable verbose output:
DEBUG=* node src/index.js --url ... --imprint-id ...
Check GraphQL requests/responses:
    const client = new ThothClient({ apiUrl, token });
    // Inspect this._token before/after login
    // Check GraphQL query structure in _request()
Common Issues
Error: "Thoth GraphQL error: Invalid token"
- Bearer token is invalid or expired
- Check that THOTH_BEARER_TOKEN is set correctly in .env
Error: "Cannot find imprint"
- Imprint UUID doesn't exist in your Thoth instance
- Verify UUID format and that it's in the correct Thoth instance
Export returns empty
- Work exists but hasn't been fully indexed for export yet
- Wait a moment and retry
- Check that the specification is supported (see exportClient.listSpecifications())
Roadmap
- Subject mapping (waiting for Thoth Subject entity support)
- Page count and edition mapping
- Metadata validation and conflict resolution
- Sync state persistence
License
ISC
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Run npm test to verify
- Submit a pull request