displace / ext-infer
PHP 8.3+ native, in-process LLM inference and embeddings via llama.cpp.
Package info
github.com/DisplaceTech/ext-infer
Language: Rust
Type: php-ext
Ext name: ext-infer
pkg:composer/displace/ext-infer
Requires
- php: ^8.3
This package is auto-updated.
Last update: 2026-06-07 02:12:50 UTC
README
Local LLM inference for PHP, in-process.
Chat, embeddings, and reasoning models — no Python sidecar, no remote API.
What is ext-infer?
ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs
inference in the PHP process via llama.cpp.
PHP-native semantic search, RAG pipelines, and CLI/worker inference work
without shelling out to Python or hitting a remote API.
It is written in Rust on top of ext-php-rs
and the llama-cpp-2 bindings. The
public PHP surface is fluent and role-aware: building a chat prompt looks
like Prompt::system(...)->withUser(...), not a string of <|im_start|>
tokens.
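To make the contrast concrete, here is a minimal plain-PHP sketch of an immutable, role-aware builder. The class name SketchPrompt and its toChatMl() method are hypothetical and exist only for this illustration; this is not ext-infer's Prompt class. The sketch also hardcodes the ChatML layout, whereas the extension renders through whatever template the model embeds.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration only; NOT ext-infer's Prompt class.
final class SketchPrompt
{
    /** @param list<array{role: string, content: string}> $messages */
    private function __construct(private readonly array $messages)
    {
    }

    public static function system(string $content): self
    {
        return new self([['role' => 'system', 'content' => $content]]);
    }

    public function withUser(string $content): self
    {
        // Return a new instance; the receiver is never modified.
        return new self([...$this->messages, ['role' => 'user', 'content' => $content]]);
    }

    /** Render the messages in ChatML, the <|im_start|> layout (assumed here, not read from a model). */
    public function toChatMl(): string
    {
        $out = '';
        foreach ($this->messages as $m) {
            $out .= "<|im_start|>{$m['role']}\n{$m['content']}<|im_end|>\n";
        }

        // Open the assistant turn so a model would continue with its reply.
        return $out . "<|im_start|>assistant\n";
    }
}

$base  = SketchPrompt::system('You are a helpful assistant.');
$asked = $base->withUser('What is 2+2?');

echo $asked->toChatMl();
```

Because every with-style call returns a fresh object, $base can be reused as a shared starting point for many conversations without one caller's messages leaking into another's.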
- 💬 Chat completions via an immutable Prompt builder that renders through the model's embedded template, with no manual <|im_start|> plumbing.
- 🧠 Reasoning-model aware: Response::answer() and Response::reasoning() split Qwen3 / R1-style <think>…</think> output automatically.
- 📊 Embeddings: Model::embed() returns an Embedding with dimensions(), normalize(), and cosineSimilarity() built in.
- ⚡ In-process: no subprocess fork, no IPC, no daemon. Latency is whatever the model takes to decode.
- 🛠️ Apple Metal acceleration is opt-in (make release FEATURES=metal); CPU is the portable default.
- 🧵 Thread-safe: LlamaBackend is a Sync-guarded singleton and each call builds its own context, so ZTS PHP + parallel works by design.
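For readers new to embeddings, the sketch below shows the standard math behind L2 normalization and cosine similarity on plain PHP arrays. It assumes the extension's normalize() and cosineSimilarity() follow these textbook definitions; the function names l2_normalize and cosine_similarity are hypothetical and not part of ext-infer.

```php
<?php
declare(strict_types=1);

/**
 * Scale a vector to unit length (L2 norm of 1).
 *
 * @param  list<float> $v
 * @return list<float>
 */
function l2_normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
    if ($norm === 0.0) {
        return $v; // A zero vector has no direction; return it unchanged.
    }

    return array_map(fn (float $x): float => $x / $norm, $v);
}

/**
 * Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosine_similarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA === 0.0 || $normB === 0.0) {
        return 0.0; // Treat similarity with a zero vector as 0 in this sketch.
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

$unit = l2_normalize([3.0, 4.0]);
printf("%.3f\n", cosine_similarity([1.0, 2.0], [2.0, 4.0]));
```

Once both vectors are normalized, cosine similarity reduces to a plain dot product, which is why normalizing once at index time is a common choice for semantic search.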
Quick start
mkdir -p models
curl -L -o models/Qwen3-0.6B-Q8_0.gguf \
https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf
<?php

use Displace\Infer\Model;
use Displace\Infer\Prompt;

$model = Model::load('models/Qwen3-0.6B-Q8_0.gguf');

$response = $model->chat(
    Prompt::system('You are a helpful assistant.')
        ->withUser('What is 2+2?'),
    maxTokens: 256,
    temperature: 0.0,
);

echo $response->answer(), PHP_EOL;   // "2 + 2 equals 4."
echo $response->reasoning() ?? '';   // captured <think>…</think>, if any

$model->close();
make build   # produces target/debug/libinfer.{so,dylib}
php -d extension=$PWD/target/debug/libinfer.dylib hello.php   # use libinfer.so on Linux
A full walkthrough, including the interactive Symfony Console
chat and the pairwise-similarity embedding
example, is under examples/.
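The answer()/reasoning() split used in the quick start can be pictured with a small plain-PHP sketch. This is an assumption about the general technique, not the extension's actual implementation; the helper name split_reasoning is hypothetical, and treating an empty think block as "no reasoning" is a choice made for this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Split raw model output into the first <think>…</think> block and the rest.
 *
 * @return array{reasoning: ?string, answer: string}
 */
function split_reasoning(string $raw): array
{
    // The "s" flag lets "." match newlines, since reasoning usually spans lines.
    if (preg_match('~^(.*?)<think>(.*?)</think>(.*)$~s', $raw, $m) !== 1) {
        return ['reasoning' => null, 'answer' => trim($raw)];
    }

    $reasoning = trim($m[2]);

    return [
        // An empty think block is reported as null in this sketch.
        'reasoning' => $reasoning === '' ? null : $reasoning,
        // Everything outside the block is the user-facing answer.
        'answer'    => trim($m[1] . $m[3]),
    ];
}

$parts = split_reasoning("<think>2 plus 2 is 4.</think>\n\n2 + 2 equals 4.");
echo $parts['answer'], PHP_EOL;
```

Returning null rather than an empty string for missing reasoning matches how the quick start guards the call with the ?? operator.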
Documentation
infer.displace.tech hosts the full guide:
- Getting started — install via PIE or from source, verify, troubleshoot.
- Guide — prompts, chat, raw, embeddings, choosing a model.
- Recipes — multi-turn chat, semantic search, RAG over markdown, worker pools.
- Reference — full API surface, exceptions, environment variables, compatibility matrix.
- Advanced — threading, Metal, performance tuning.
The site is built from docs/ with mdbook
and deploys automatically on every push to main.
Compatibility
| PHP version | macOS arm64 | Linux x86_64 | Linux arm64 | Windows |
|---|---|---|---|---|
| PHP 8.3 | ✅ | ✅ | ✅ | — |
| PHP 8.4 | ✅ | ✅ | ✅ | — |
| PHP 8.5 | ✅ | ✅ | ✅ | — |
ZTS is supported by design (the code is thread-safe), enabled in
composer.json, and not yet exercised in CI. Windows is intentionally
out of scope for v0.1.
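A quick way to see where a given machine falls in this matrix is to ask PHP itself. The sketch below uses only built-in constants and functions; the helper name infer_environment_report is hypothetical, and the loaded extension name 'infer' is an assumption based on the libinfer library name, so adjust it if php -m reports something different.

```php
<?php
declare(strict_types=1);

/**
 * Collect the facts the compatibility matrix depends on.
 *
 * @return array<string, bool|string>
 */
function infer_environment_report(string $extensionName = 'infer'): array
{
    return [
        'php_version'      => PHP_VERSION,
        'php_supported'    => PHP_VERSION_ID >= 80300,  // the package requires php ^8.3
        'zts'              => (bool) PHP_ZTS,           // true on thread-safe builds
        'os_family'        => PHP_OS_FAMILY,            // e.g. "Darwin", "Linux", "Windows"
        'os_supported'     => in_array(PHP_OS_FAMILY, ['Darwin', 'Linux'], true),
        'arch'             => php_uname('m'),           // e.g. "arm64", "x86_64", "aarch64"
        'extension_loaded' => extension_loaded($extensionName), // name is an assumption
    ];
}

foreach (infer_environment_report() as $key => $value) {
    echo str_pad($key, 18), var_export($value, true), PHP_EOL;
}
```

Running the script both with and without the -d extension=... flag is a simple way to confirm the flag is actually being picked up.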
Roadmap
Shipped: chat completions · raw completions · embeddings · reasoning split · typed exceptions · PHPT suite · CI matrix · PIE-compatible composer.json · tag-triggered binary release workflow.
Next: first v0.1.0 release · streaming completions · KV-cache reuse via reusable Session objects · stop-string support · tool calling · continuous batching · Apple Metal default on macos-arm64.
See PLAN.md for the current planning doc and RELEASE.md
for the cut-a-release flow.
License
MIT © 2026 Eric Mann / Displace Technologies