survos / folio-bundle
Portable SQLite folios for normalized museum dataset rows.
Requires
- php: ^8.4
- ext-pdo_sqlite: *
- ext-sqlite3: *
- doctrine/dbal: ^4.4
- doctrine/doctrine-bundle: ^2.14|^3.0
- doctrine/orm: ^3.4
- survos/data-contracts: ^2.5
- survos/dataset-bundle: ^2.5
- survos/field-bundle: ^2.5
- survos/jsonl-bundle: ^2.5
- symfony/ai-agent: ^0.9
- symfony/config: ^8.1
- symfony/console: ^8.1
- symfony/dependency-injection: ^8.1
- symfony/framework-bundle: ^8.1
- symfony/http-kernel: ^8.1
- symfony/routing: ^8.1
- voku/stop-words: ^2.0
Requires (Dev)
- api-platform/core: ^4.3
- phpstan/phpstan: ^2.1
- survos/iiif-bundle: ^2.5
- survos/imgproxy-bundle: ^2.5
- survos/tabler-bundle: ^2.5
- symfony/ux-chartjs: ^3.0
Suggests
- survos/import-bundle: Produces normalized JSONL files before ingesting them into folios.
README
Folio stores normalized/enriched dataset JSONL as portable SQLite archive files. It is the database, archive, and browsing layer for data that has already been normalized by dataset/import tooling.
harvest and md produce normalized JSONL. folio:ingest turns that JSONL into a standalone folio SQLite file. Consumers such as zm can use the Symfony/DataContracts stack when present, while Python/R/SQLite users can query the archive directly.
Required: survos/field-bundle, survos/data-contracts.
Suggested for ingest/write workflows: survos/jsonl-bundle, survos/import-bundle.
See docs/configuration.md for the required multi-connection Doctrine setup.
See docs/archive-metadata.md for the standalone archive metadata contract.
See docs/presentation-layer.md for the proposal to use folios as narrative institutional presentation packages.
Archive Contract
A folio file stores canonical rows in item and self-describing metadata alongside them:
- schema_table and schema_property describe observed DTO types and fields in this archive.
- schema_property.stats stores field profile output from survos/jsonl-bundle's profiler.
- docs stores generated JSON/Markdown documentation for humans, report writers, and AI agents.
- generated dto_* SQLite views project JSON fields into query-friendly columns.
- term_set and term store standalone controlled vocabularies and facets.
The metadata snapshot describes actual observed data, not the entire DTO contract universe. DTO classes from survos/data-contracts annotate observed fields with labels/descriptions when available, but consumers do not need PHP code to understand an archived folio.
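As an illustration of how a dto_* view can project JSON fields into columns, here is a minimal Python sketch. It is not the bundle's actual view generator: the item columns come from the SQL examples in this README, while the JSON keys (title, year), the dto_type value 'document', and the view body are assumptions made up for the example. It also assumes the SQLite build includes the JSON functions.

```python
import json
import sqlite3

# Stand-in for a folio file: only the item columns named in this README.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (local_id TEXT PRIMARY KEY, label TEXT, "
    "dto_type TEXT, dto_data TEXT, extras TEXT)"
)
rows = [
    # Hypothetical rows; the JSON keys are invented for this sketch.
    ("doc-1", "Letter", "document",
     json.dumps({"title": "Letter to the board", "year": 1921}), "{}"),
    ("obj-1", "Vase", "object", json.dumps({"title": "Blue vase"}), "{}"),
]
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", rows)

# Assumed view definition: one column per JSON field, filtered by DTO type.
# The views generated by folio:ingest may be shaped differently.
conn.execute(
    """
    CREATE VIEW dto_document AS
    SELECT local_id,
           label,
           json_extract(dto_data, '$.title') AS title,
           json_extract(dto_data, '$.year')  AS year
    FROM item
    WHERE dto_type = 'document'
    """
)

documents = conn.execute(
    "SELECT local_id, title, year FROM dto_document"
).fetchall()
print(documents)
```

Because the view is plain SQL over the stored JSON, a consumer without PHP can query it like any other table.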
Search and Publication Notes
- folio:ingest loads rows, snapshots observed schema/docs/views, and rebuilds the SQLite FTS5 table item_fts.
- Existing folios can rebuild search with bin/console folio:fts:rebuild <provider/dataset> --query="search terms".
- folio:archive refreshes archive metadata before packaging.
- FTS tables are derived data. Published archive files may drop item_fts, VACUUM, compress, ship, then rebuild FTS on the consuming side.
- SQLite views and docs are also derived from persisted metadata, but they are intentionally lightweight and useful for standalone consumers.
- Vector search is intentionally deferred. When added, start with a hybrid SQLite design: FTS5/BM25 for exact keyword strength, sqlite-vec for semantic retrieval, and Reciprocal Rank Fusion to merge ranks without normalizing incompatible score scales. Reference: https://ceaksan.com/en/hybrid-search-fts5-vector-rrf
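Reciprocal Rank Fusion, mentioned in the last note above, can be sketched in a few lines of Python. This is a generic illustration and not part of the bundle: the function name, the sample ids, and the constant k=60 (a commonly used default) are assumptions. Each input is a list of ids already ordered best-first by one retriever, so raw BM25 and vector scores never need to be compared.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first id lists into one fused ordering.

    Each id earns 1 / (k + rank) per list it appears in (rank starts at 1).
    Ties are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Ids that rank well in both lists rise to the top, while ids found by only one retriever still appear further down.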
Direct SQLite Examples
select * from schema_table where kind = 'dto';
select * from schema_property where table_id = ? order by position;
select local_id, label, dto_type, dto_data, extras from item limit 20;
select * from dto_document limit 20;
select id, type, audience, body from docs order by position;
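The first two queries can be combined from Python's standard sqlite3 module, as in the hedged sketch below. The helper name list_dto_fields is hypothetical, and the id and name columns on schema_table and schema_property are assumptions: only kind, table_id, and position appear in this README. A small in-memory stand-in replaces a real folio file so the example is self-contained.

```python
import sqlite3


def list_dto_fields(conn):
    """Return {schema_table id: [property rows as dicts, ordered by position]}."""
    conn.row_factory = sqlite3.Row
    tables = conn.execute(
        "SELECT * FROM schema_table WHERE kind = 'dto'"
    ).fetchall()
    result = {}
    for table in tables:
        props = conn.execute(
            "SELECT * FROM schema_property WHERE table_id = ? ORDER BY position",
            (table["id"],),  # assumes table_id refers to schema_table.id
        ).fetchall()
        result[table["id"]] = [dict(p) for p in props]
    return result


# In-memory stand-in with assumed columns; a real folio would be opened by path.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schema_table (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
    CREATE TABLE schema_property (
        id INTEGER PRIMARY KEY, table_id INTEGER, name TEXT, position INTEGER
    );
    INSERT INTO schema_table VALUES (1, 'document', 'dto'), (2, 'item', 'core');
    INSERT INTO schema_property VALUES (1, 1, 'year', 2), (2, 1, 'title', 1);
    """
)

fields = list_dto_fields(conn)
names = [prop["name"] for prop in fields[1]]
print(names)
```

Against a real archive, replace the in-memory connection with sqlite3.connect pointed at the folio file. Opening it through a file: URI with mode=ro and uri=True keeps the archive read-only.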
TODO
- Add fieldSet support to the api-grid spreadsheet view to avoid displaying every DTO field at once.
- Rebuild views/docs on restore, not only FTS, if the archive was packaged without them.