displace/ext-infer

PHP 8.3+ native, in-process LLM inference and embeddings via llama.cpp.

Package info

github.com/DisplaceTech/ext-infer

Language: Rust

Type: php-ext

Ext name: ext-infer

pkg:composer/displace/ext-infer

v0.1.1 2026-06-07 01:58 UTC

This package is auto-updated.

Last update: 2026-06-07 02:12:50 UTC


README

Local LLM inference for PHP, in-process.
Chat, embeddings, and reasoning models — no Python sidecar, no remote API.

What is ext-infer?

ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs inference in the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI/worker inference work without shelling out to Python or hitting a remote API.

ext-infer is written in Rust on top of ext-php-rs and the llama-cpp-2 bindings. The public PHP surface is fluent and role-aware: building a chat prompt looks like Prompt::system(...)->withUser(...), not a string of <|im_start|> tokens.

  • 💬 Chat completions via an immutable Prompt builder that renders through the model's embedded template — no manual <|im_start|> plumbing.
  • 🧠 Reasoning-model aware: Response::answer() and Response::reasoning() split Qwen3 / R1-style <think>…</think> output automatically.
  • 📊 Embeddings: Model::embed() returns an Embedding with dimensions(), normalize(), and cosineSimilarity() built in.
  • In-process — no subprocess fork, no IPC, no daemon. Latency is whatever the model takes to decode.
  • 🛠️ Apple Metal acceleration is opt-in (make release FEATURES=metal); CPU is the portable default.
  • 🧵 Thread-safe: LlamaBackend is a Sync-guarded singleton and each call builds its own context, so ZTS PHP + parallel works by design.
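
To make the reasoning split concrete, here is a minimal plain-PHP sketch of the idea: separate a <think>…</think> block from the rest of the output. The function splitReasoning() is hypothetical and is not part of ext-infer; the extension does this internally, and its actual rules (for example, how it treats unclosed tags) may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT the ext-infer implementation: split raw model
 * output into the visible answer and the text of the first <think> block.
 *
 * @return array{answer: string, reasoning: ?string}
 */
function splitReasoning(string $raw): array
{
    // The /s flag lets "." match newlines; ".*?" stops at the first </think>.
    $pattern = '/<think>(.*?)<\/think>/s';

    if (preg_match($pattern, $raw, $m) !== 1) {
        // No reasoning block: the whole output is the answer.
        return ['answer' => trim($raw), 'reasoning' => null];
    }

    $reasoning = trim($m[1]);
    // Drop only the first matched block; everything else is the answer.
    $answer = preg_replace($pattern, '', $raw, 1) ?? $raw;

    return [
        'answer'    => trim($answer),
        'reasoning' => $reasoning === '' ? null : $reasoning,
    ];
}

$parts = splitReasoning("<think>\nSimple arithmetic.\n</think>\n\n2 + 2 equals 4.");
echo $parts['answer'], PHP_EOL;
echo $parts['reasoning'] ?? '(no reasoning)', PHP_EOL;
```

With the extension loaded, the same two pieces are available directly from Response::answer() and Response::reasoning(), so application code should not need a helper like this.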

Quick start

Download a small GGUF model:

mkdir -p models
curl -L -o models/Qwen3-0.6B-Q8_0.gguf \
    https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf

Save the following as hello.php:

<?php
use Displace\Infer\Model;
use Displace\Infer\Prompt;

// Load the GGUF model into the current PHP process.
$model    = Model::load('models/Qwen3-0.6B-Q8_0.gguf');
$response = $model->chat(
    Prompt::system('You are a helpful assistant.')
        ->withUser('What is 2+2?'),
    maxTokens: 256,
    temperature: 0.0,
);

echo $response->answer(), PHP_EOL;   // e.g. "2 + 2 equals 4."
echo $response->reasoning() ?? '';    // captured <think>…</think>, if any

// Release the model when done.
$model->close();

Build the extension and run the script:

make build       # produces target/debug/libinfer.{so,dylib}
php -d extension=$PWD/target/debug/libinfer.dylib hello.php   # use libinfer.so on Linux

Full walkthroughs, including the interactive Symfony Console chat and a pairwise-similarity embedding example, are under examples/.
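
For readers unfamiliar with pairwise similarity, the following plain-PHP sketch shows the arithmetic on toy vectors. It is not the code in examples/ and does not call the extension: l2Normalize() and cosineSimilarity() here are hypothetical stand-alone functions, and the three-dimensional vectors are made-up stand-ins for what Model::embed() would return.

```php
<?php
declare(strict_types=1);

/** Scale a vector to unit length (L2 norm of 1); zero vectors are returned as-is. */
function l2Normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
    if ($norm == 0.0) {
        return $v;
    }
    return array_map(fn (float $x): float => $x / $norm, $v);
}

/** Cosine of the angle between two equal-length vectors, in [-1, 1]. */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimensions.');
    }
    $dot = $na = $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $na  += $x * $x;
        $nb  += $b[$i] * $b[$i];
    }
    if ($na == 0.0 || $nb == 0.0) {
        return 0.0; // similarity is undefined for a zero vector; treat as 0
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

// Toy vectors standing in for real embeddings (assumption, for illustration only).
$vectors = [
    'cat'     => [0.9, 0.1, 0.0],
    'kitten'  => [0.8, 0.2, 0.1],
    'invoice' => [0.0, 0.1, 0.9],
];

// Print the similarity of every unordered pair.
$labels = array_keys($vectors);
foreach ($labels as $i => $a) {
    foreach (array_slice($labels, $i + 1) as $b) {
        printf("%-8s vs %-8s %.3f\n", $a, $b, cosineSimilarity($vectors[$a], $vectors[$b]));
    }
}
```

When vectors are normalized first, cosine similarity reduces to a plain dot product, which is one reason an Embedding exposing both normalize() and cosineSimilarity() is convenient for semantic search.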

Documentation

infer.displace.tech hosts the full guide:

  • Getting started — install via PIE or from source, verify, troubleshoot.
  • Guide — prompts, chat, raw, embeddings, choosing a model.
  • Recipes — multi-turn chat, semantic search, RAG over markdown, worker pools.
  • Reference — full API surface, exceptions, environment variables, compatibility matrix.
  • Advanced — threading, Metal, performance tuning.
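
As a rough idea of what a "verify" step can look like, the sketch below checks the PHP version and whether the Displace\Infer\Model class from the quick start is available. The function names are hypothetical, and checking for the class rather than an extension name is an assumption made to avoid guessing how the extension registers itself; the Getting started guide remains the authoritative procedure.

```php
<?php
declare(strict_types=1);

/** True when the class used in the quick start is provided by a loaded extension. */
function inferAvailable(): bool
{
    // Second argument false: do not trigger autoloaders, only check loaded classes.
    return class_exists('Displace\\Infer\\Model', false);
}

/** Collect human-readable problems; an empty list means the environment looks fine. */
function checkEnvironment(): array
{
    $problems = [];
    if (PHP_VERSION_ID < 80300) {
        $problems[] = 'ext-infer requires PHP 8.3+, found ' . PHP_VERSION;
    }
    if (!inferAvailable()) {
        $problems[] = 'Displace\\Infer\\Model not found; load the extension with '
            . '-d extension=/path/to/libinfer.so (or .dylib)';
    }
    return $problems;
}

$problems = checkEnvironment();
if ($problems === []) {
    echo 'ext-infer looks ready', PHP_EOL;
} else {
    foreach ($problems as $problem) {
        echo 'Problem: ', $problem, PHP_EOL;
    }
}
```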

The site is built from docs/ with mdbook and deploys automatically on every push to main.

Compatibility

macOS arm64 Linux x86_64 Linux arm64 Windows
PHP 8.3
PHP 8.4
PHP 8.5

ZTS is supported by design (the code is thread-safe) and enabled in composer.json, but it is not yet exercised in CI. Windows is intentionally out of scope for v0.1.
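
Because ZTS support is not yet covered by CI, it can help to confirm what the local PHP build provides before relying on threads. This small sketch uses the core PHP_ZTS constant and extension_loaded(); it assumes that "parallel" refers to the PECL parallel extension, and it does not exercise ext-infer itself.

```php
<?php
declare(strict_types=1);

// PHP_ZTS is a core constant: 1 on thread-safe (ZTS) builds, 0 on NTS builds.
$zts = (bool) PHP_ZTS;

// Assumption: the threading layer is the PECL "parallel" extension.
$parallel = extension_loaded('parallel');

printf("ZTS build:       %s\n", $zts ? 'yes' : 'no');
printf("parallel loaded: %s\n", $parallel ? 'yes' : 'no');

if (!$zts) {
    echo 'This is an NTS build; run one model per process instead of per thread.', PHP_EOL;
}
```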

Roadmap

Shipped: chat completions · raw completions · embeddings · reasoning split · typed exceptions · PHPT suite · CI matrix · PIE-compatible composer.json · tag-triggered binary release workflow.

Next: first v0.1.0 release · streaming completions · KV-cache reuse via reusable Session objects · stop-string support · tool calling · continuous batching · Apple Metal default on macos-arm64.

See PLAN.md for the current planning doc and RELEASE.md for the cut-a-release flow.

License

MIT © 2026 Eric Mann / Displace Technologies