marko/docs-vec
Hybrid FTS5 + sqlite-vec semantic search driver for Marko documentation
Requires
- php: ^8.5
- ext-pdo_sqlite: *
- marko/cli: 0.8.0
- marko/core: 0.8.0
- marko/docs: 0.8.0
- marko/docs-markdown: 0.8.0
Requires (Dev)
- pestphp/pest: ^4.0
Suggests
- codewithkyrian/transformers-php: Required for query-time embeddings (^0.5)
Last update: 2026-06-03 14:28:38 UTC
README
Hybrid FTS5 + sqlite-vec semantic documentation search driver for Marko. It combines keyword (FTS5) and vector (sqlite-vec) search to improve result relevance.
Overview
marko/docs-vec implements DocsSearchInterface using both SQLite FTS5 (keyword search) and sqlite-vec (vector embeddings), with ONNX Runtime running inference locally via codewithkyrian/transformers-php. Results are ranked by a weighted combination of the BM25 keyword score and the cosine similarity between query and document embeddings, so relevant pages can be found even when the query's wording differs from the documentation's. If the model has not been downloaded, or the platform lacks ONNX support, the driver falls back to FTS5-only keyword search using its own built-in index. If you only want lightweight keyword search, use marko/docs-fts instead.
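The weighted ranking described above can be sketched in plain PHP. This is a minimal illustration, not VecSearch's implementation: the function names, the min-max normalization, and the default weight `alpha = 0.5` are assumptions; the package's real weighting is covered in its full documentation. The sketch relies on one documented SQLite behaviour: FTS5's `bm25()` assigns better matches numerically lower values, so keyword scores are inverted before being blended with cosine similarity.

```php
<?php

// Hypothetical sketch of hybrid ranking; not the actual VecSearch code.

/** Cosine similarity of two equal-length embedding vectors (0.0 if either is all zeros). */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Min-max normalize raw FTS5 bm25() values to [0, 1], where 1.0 is the best match.
 * bm25() gives better matches LOWER values, so the scale is inverted.
 */
function normalizeBm25(array $bm25): array
{
    if ($bm25 === []) {
        return [];
    }
    $best = min($bm25);
    $worst = max($bm25);
    $range = $worst - $best;
    $normalized = [];
    foreach ($bm25 as $id => $raw) {
        // With a single hit (or all ties) every hit counts as a full keyword match.
        $normalized[$id] = $range > 0.0 ? ($worst - $raw) / $range : 1.0;
    }
    return $normalized;
}

/**
 * Blend scores as alpha * keyword + (1 - alpha) * semantic, best first.
 * $alpha = 0.5 is an assumed default, not a documented setting.
 */
function hybridRank(array $bm25, array $docVectors, array $queryVector, float $alpha = 0.5): array
{
    $keyword = normalizeBm25($bm25);
    $ids = array_unique(array_merge(array_keys($bm25), array_keys($docVectors)));
    $scores = [];
    foreach ($ids as $id) {
        // A document missing from either result set contributes 0.0 for that signal.
        $semantic = isset($docVectors[$id]) ? cosineSimilarity($queryVector, $docVectors[$id]) : 0.0;
        $scores[$id] = $alpha * ($keyword[$id] ?? 0.0) + (1 - $alpha) * $semantic;
    }
    arsort($scores);
    return $scores;
}

// Usage: raw bm25() values for keyword hits, plus toy 2-D embeddings.
$ranked = hybridRank(
    ['install' => -4.2, 'routing' => -1.1],
    ['install' => [0.1, 0.9], 'routing' => [0.8, 0.2], 'events' => [0.9, 0.1]],
    [1.0, 0.0],
);
echo implode(', ', array_keys($ranked)), PHP_EOL; // install, events, routing
```

In the sample call, `events` matches no keywords but still outranks `routing`, a weak keyword match, because its embedding is closer to the query vector. That is the behaviour a keyword-plus-vector blend is meant to produce when query wording and documentation wording diverge.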
Installation
composer require marko/docs-vec
For query-time embeddings, also install the ONNX runtime:
composer require codewithkyrian/transformers-php
ONNX model
This package uses the bge-small-en-v1.5 model (about 130 MB in total across model.onnx, tokenizer.json, and config.json) for semantic embeddings. The model is not committed to the repository; it is downloaded on demand and verified by SHA-256.
Downloading the model
marko docs-vec:download-model
Files are written into the package at resources/models/bge-small-en-v1.5/ (gitignored). The download is pinned to a specific Hugging Face commit, and each file's SHA-256 is verified. Behind a firewall or using a mirror? Pass --base-url=<your-mirror>:
marko docs-vec:download-model --base-url=https://my-mirror.example.com/bge-small-en-v1.5
If the model is missing, marko docs-vec:build fails with an error that points to this command.
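The per-file SHA-256 check can be illustrated with PHP's built-in `hash_file()`. This is a hypothetical sketch, not the download command's code: the function name and the digest map are assumptions, and the package's real pinned digests are not reproduced here.

```php
<?php

/**
 * Hypothetical sketch: list model files that are missing or fail their SHA-256 check.
 *
 * @param string $dir directory holding the downloaded files
 * @param array<string, string> $expected file name => hex SHA-256 digest
 * @return list<string>
 */
function findInvalidModelFiles(string $dir, array $expected): array
{
    $invalid = [];
    foreach ($expected as $file => $digest) {
        $path = $dir . DIRECTORY_SEPARATOR . $file;
        // Missing files are reported; hash_file() also returns false on read failure.
        $actual = is_file($path) ? hash_file('sha256', $path) : false;
        if ($actual === false || !hash_equals(strtolower($digest), $actual)) {
            $invalid[] = $file;
        }
    }
    return $invalid;
}

// Usage with a throwaway directory and a digest computed on the spot.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('bge-demo-', true);
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'config.json', '{}');

$invalid = findInvalidModelFiles($dir, [
    'config.json' => hash('sha256', '{}'), // matches the file just written
    'model.onnx'  => str_repeat('0', 64),  // file absent, so it is reported
]);
echo implode(', ', $invalid), PHP_EOL; // model.onnx
```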
Why not bundled?
At about 130 MB, the model is too large to ship inside a Composer package. If you only need keyword search (no semantic or vector ranking), use the lighter marko/docs-fts driver instead; it needs no model.
Platform support
The ONNX runtime supports Linux (x64, ARM64), macOS (x64, ARM64), and Windows (x64). On unsupported platforms, or when the model has not been downloaded, docs-vec falls back to FTS5-only keyword search (no semantic ranking) using its own built-in index; it does not depend on the marko/docs-fts package.
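The fallback rule can be expressed as a small decision function. This is a hypothetical sketch, not the package's detection code: the function names, the `'hybrid'`/`'fts5'` labels, and the mapping from `php_uname('m')` values (such as `x86_64`, `aarch64`, `arm64`, `AMD64`) to the architectures listed above are all assumptions. The model file names and directory come from the ONNX model section; the directory is shown as a relative path for illustration.

```php
<?php

/** Map common php_uname('m') spellings to 'x64' / 'arm64' (assumed aliases). */
function normalizeArch(string $machine): ?string
{
    return match (strtolower($machine)) {
        'x86_64', 'amd64', 'x64' => 'x64',
        'aarch64', 'arm64' => 'arm64',
        default => null,
    };
}

/** Linux and macOS on x64/ARM64, Windows on x64 only, per the platform list. */
function isOnnxPlatformSupported(string $osFamily, string $machine): bool
{
    $arch = normalizeArch($machine);
    return match ($osFamily) {
        'Linux', 'Darwin' => $arch !== null,
        'Windows' => $arch === 'x64',
        default => false,
    };
}

/** Pick 'hybrid' only when the platform is supported and every model file exists. */
function chooseSearchMode(string $modelDir, string $osFamily = PHP_OS_FAMILY, ?string $machine = null): string
{
    $machine ??= php_uname('m');
    foreach (['model.onnx', 'tokenizer.json', 'config.json'] as $file) {
        if (!is_file($modelDir . DIRECTORY_SEPARATOR . $file)) {
            return 'fts5'; // model not downloaded: keyword-only search
        }
    }
    return isOnnxPlatformSupported($osFamily, $machine) ? 'hybrid' : 'fts5';
}

// Result depends on the host and on whether the model has been downloaded.
echo chooseSearchMode('resources/models/bge-small-en-v1.5'), PHP_EOL;
```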
Usage
After you install the package and download the model, its module.php automatically binds DocsSearchInterface to VecSearch. Build the hybrid index before searching:
marko docs-vec:build
Documentation
Full configuration, ranking details, and the docs-driver comparison: marko/docs-vec