
An embedding-similarity cache for LLM responses. When a new prompt is semantically close to one you've answered before, serve the cached answer instead of paying for — and waiting on — another model call.

Why this exists (the 2026 gap)

A normal cache keys on an exact string. LLM prompts are almost never byte-identical: "How do I reset my password?" and "I forgot my password, what now?" are the same question to a user but two different cache keys. So an exact-match cache gets a near-zero hit rate on real LLM traffic, and teams pay full price, thousands of times over, for what is effectively the same answer.

Python has GPTCache for exactly this. PHP, as of 2026, has nothing native — despite the PHP AI ecosystem (Prism, Neuron, Laravel AI SDK) now being mature enough that cost is the live problem. Ykachala Semantic Cache fills that hole.

How it works

1. Embed the incoming prompt.
2. Search the vector store for the nearest previously cached prompt.
3. If the cosine similarity is ≥ your threshold (e.g. 0.95), return the stored response; no LLM call is made.
4. Otherwise, call your model, then store (embedding, prompt, response) for next time.

A cheap exact-match tier runs first (hash lookup) so identical prompts never even pay for an embedding.
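The three tiers can be sketched in plain PHP. This is a minimal illustration, not the package's implementation: `toyEmbed()` is a hypothetical stand-in for a real embedding model (a hashed bag-of-words vector), `SketchCache` is a hypothetical class, and the "vector store" is an in-memory array searched by brute force.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an embedding model: a hashed bag-of-words vector.
// It only matches prompts that share words; a real model also matches paraphrases.
function toyEmbed(string $text): array
{
    $vec = array_fill(0, 64, 0.0);
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    foreach ($m[0] as $word) {
        $vec[crc32($word) & 63] += 1.0;
    }
    return $vec;
}

// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
function cosine(array $a, array $b): float
{
    $dot = $na = $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $na += $x * $x;
        $nb += $b[$i] * $b[$i];
    }
    return ($na > 0.0 && $nb > 0.0) ? $dot / (sqrt($na) * sqrt($nb)) : 0.0;
}

final class SketchCache
{
    private array $exact = [];   // sha256(prompt) => response
    private array $entries = []; // list of ['vec' => ..., 'prompt' => ..., 'response' => ...]
    public int $embedCalls = 0;  // how often tier 2 had to embed

    public function __construct(private float $threshold = 0.95) {}

    public function remember(string $prompt, callable $callLlm): string
    {
        // Tier 1: exact match on a hash of the prompt; no embedding needed.
        $key = hash('sha256', $prompt);
        if (isset($this->exact[$key])) {
            return $this->exact[$key];
        }

        // Tier 2: embed, then brute-force search for the most similar cached prompt.
        $this->embedCalls++;
        $vec = toyEmbed($prompt);
        $best = null;
        $bestSim = -1.0;
        foreach ($this->entries as $entry) {
            $sim = cosine($vec, $entry['vec']);
            if ($sim > $bestSim) {
                $bestSim = $sim;
                $best = $entry['response'];
            }
        }
        if ($best !== null && $bestSim >= $this->threshold) {
            return $best;
        }

        // Tier 3: miss. Call the model, then store the result for next time.
        $response = $callLlm($prompt);
        $this->exact[$key] = $response;
        $this->entries[] = ['vec' => $vec, 'prompt' => $prompt, 'response' => $response];
        return $response;
    }
}
```

With this toy embedder, "how do i reset my password" is a semantic hit for "How do I reset my password?" (same words, different bytes), but "I forgot my password, what now?" is not; matching that kind of paraphrase is exactly what a real embedding model adds.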

Install

composer require ykachala/semantic-cache

Quick start

use Ykachala\SemanticCache\SemanticCache;
use Ykachala\SemanticCache\Store\PgVectorStore;

$cache = new SemanticCache(
    embedder:  $yourEmbedder,           // any PHP closure/object that returns a vector
    store:     new PgVectorStore($pdo),
    threshold: 0.95,                    // tune for your risk tolerance
    ttl:       3600,
);

$answer = $cache->remember($prompt, function () use ($prompt, $llm) {
    // Only runs on a miss — this is the call you're trying to avoid
    return $llm->chat($prompt);
});

Inspecting hits

$result = $cache->lookup($prompt);

if ($result->hit) {
    logger()->info('semantic cache hit', [
        'similarity' => $result->similarity,   // 0.0 – 1.0
        'matched'    => $result->matchedPrompt,
        'saved'      => $result->estimatedSaving?->format(),
    ]);
}
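Logged similarities are what make threshold tuning possible. A minimal offline sketch, assuming you record the nearest-match similarity for every lookup (misses included, not only hits) and have labeled a sample of them as correct or incorrect reuse; `evaluateThreshold()` and the sample shape are hypothetical, not part of the package. For a candidate threshold it reports the fraction of lookups that would be served from cache and how many of those served answers would have been wrong.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical offline tuning helper.
 *
 * @param list<array{similarity: float, correct: bool}> $samples labeled lookups
 * @return array{hitRate: float, wrongServed: int}
 */
function evaluateThreshold(array $samples, float $threshold): array
{
    $hits = 0;
    $wrong = 0;
    foreach ($samples as $s) {
        // A lookup counts as a hit when its nearest match clears the threshold.
        if ($s['similarity'] >= $threshold) {
            $hits++;
            if (!$s['correct']) {
                $wrong++; // a cached answer that should not have been reused
            }
        }
    }
    return [
        'hitRate'     => count($samples) > 0 ? (float) $hits / count($samples) : 0.0,
        'wrongServed' => $wrong,
    ];
}
```

Sweeping a few candidate thresholds over the same labeled sample shows the trade-off directly: raising the threshold lowers both the hit rate and the number of wrong answers served.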

Tiers & safety

Tier	Cost	When
Exact	hash lookup, ~0	byte-identical prompt
Semantic	1 embedding + 1 vector search	similar prompt at or above threshold
Miss	1 embedding + 1 vector search + full LLM call	nothing close enough

Namespaces isolate caches per user/tenant/prompt-template so you never serve one user's answer to another.
Threshold tuning trades hit-rate for correctness — 0.97+ for factual lookups, lower for chit-chat. Ship with metrics so you can tune from real traffic.
Stampede protection — concurrent misses for the same prompt collapse to one call.
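A minimal sketch of the namespace and stampede-protection ideas, assuming a single host where all PHP workers share a local filesystem: the namespace is folded into the cache key, and an exclusive `flock()` per key makes concurrent misses for the same prompt wait for one worker's LLM call and then re-read the cache. `rememberWithLock()` and its `$get`/`$set` callables are hypothetical, not the package's API; a multi-host deployment would need a distributed lock instead (for example, one held in Redis).

```php
<?php
declare(strict_types=1);

function rememberWithLock(
    string $namespace,
    string $prompt,
    callable $get,      // hypothetical: fn(string $key): ?string
    callable $set,      // hypothetical: fn(string $key, string $value): void
    callable $compute,  // the expensive LLM call: fn(): string
    ?string $lockDir = null,
): string {
    // The namespace is part of the key, so tenants never share entries.
    $key = hash('sha256', $namespace . "\0" . $prompt);

    if (($hit = $get($key)) !== null) {
        return $hit;
    }

    $lockPath = ($lockDir ?? sys_get_temp_dir()) . "/semcache-$key.lock";
    $fh = fopen($lockPath, 'c'); // create if missing, never truncate
    if ($fh === false) {
        throw new RuntimeException("cannot open lock file $lockPath");
    }
    try {
        flock($fh, LOCK_EX); // blocks while another worker computes this key
        // Re-check: the worker that held the lock may have filled the cache already.
        if (($hit = $get($key)) !== null) {
            return $hit;
        }
        $value = $compute();
        $set($key, $value);
        return $value;
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

The re-check after acquiring the lock is what collapses the stampede: every waiting worker finds the value the first one stored instead of calling the model again.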

Pluggable stores

InMemoryStore   # tests / single process
Psr16Store      # brute-force over any PSR-16 cache, good for small sets
RedisStore      # Redis 8 vector sets
PgVectorStore   # Postgres + pgvector, production default
QdrantStore     # external vector DB at scale
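All drivers presumably share one store contract. A minimal sketch of what such a contract and a brute-force in-memory driver could look like, with per-namespace isolation and TTL expiry; `VectorStoreSketch`, `InMemoryStoreSketch`, and their method signatures are hypothetical, not the package's actual `VectorStore` interface. Vectors are normalized to unit length on write, so nearest-neighbour search reduces to a dot product, which equals cosine similarity for unit vectors.

```php
<?php
declare(strict_types=1);

// Hypothetical store contract; the package's real VectorStore interface may differ.
interface VectorStoreSketch
{
    public function put(string $namespace, array $vector, string $prompt, string $response, int $ttl): void;

    /** Nearest live entry in $namespace as [similarity, prompt, response], or null. */
    public function nearest(string $namespace, array $vector): ?array;
}

final class InMemoryStoreSketch implements VectorStoreSketch
{
    private array $rows = []; // namespace => list of rows

    // Injectable clock so TTL expiry is testable; defaults to time().
    public function __construct(private ?Closure $clock = null) {}

    public function put(string $namespace, array $vector, string $prompt, string $response, int $ttl): void
    {
        $this->rows[$namespace][] = [
            'vector'   => self::normalize($vector), // unit length, so dot product == cosine
            'prompt'   => $prompt,
            'response' => $response,
            'expires'  => $this->now() + $ttl,
        ];
    }

    public function nearest(string $namespace, array $vector): ?array
    {
        $query = self::normalize($vector);
        $best = null;
        foreach ($this->rows[$namespace] ?? [] as $row) {
            if ($row['expires'] <= $this->now()) {
                continue; // expired by TTL
            }
            $sim = 0.0;
            foreach ($query as $i => $x) {
                $sim += $x * $row['vector'][$i];
            }
            if ($best === null || $sim > $best['similarity']) {
                $best = ['similarity' => $sim, 'prompt' => $row['prompt'], 'response' => $row['response']];
            }
        }
        return $best;
    }

    private function now(): int
    {
        return $this->clock !== null ? ($this->clock)() : time();
    }

    private static function normalize(array $v): array
    {
        $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
        return $norm > 0.0 ? array_map(fn (float $x): float => $x / $norm, $v) : $v;
    }
}
```

Brute force is O(n) per lookup, which is why the list reserves it for tests and small sets; the pgvector, Redis, and Qdrant drivers exist to replace this loop with an indexed search.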

Architecture

src/
├── SemanticCache.php     # remember() / lookup() / put()
├── Lookup.php            # result: hit, similarity, matchedPrompt, saving
├── Embedder/             # EmbedderInterface + adapters
├── Store/                # VectorStore interface + drivers
└── Similarity.php        # cosine / dot-product helpers
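The quick start accepts "any PHP closure/object that returns a vector", while `Embedder/` holds an interface plus adapters. One way to reconcile the two is a small adapter that wraps any callable behind a single interface. A minimal sketch; `EmbedderSketch` and `CallableEmbedder` are hypothetical names, and the package's actual `EmbedderInterface` may use a different method name or signature. It requires PHP 8.1+ for `array_is_list()`.

```php
<?php
declare(strict_types=1);

// Hypothetical contract; the package's real EmbedderInterface may differ.
interface EmbedderSketch
{
    /** @return list<float> */
    public function embed(string $text): array;
}

// Wraps any callable (e.g. a function around your provider's embeddings client)
// so the cache depends on one interface only.
final class CallableEmbedder implements EmbedderSketch
{
    private Closure $fn;

    public function __construct(callable $fn)
    {
        $this->fn = Closure::fromCallable($fn);
    }

    public function embed(string $text): array
    {
        $vector = ($this->fn)($text);
        // Reject empty or keyed arrays early instead of failing inside the store.
        if ($vector === [] || !array_is_list($vector)) {
            throw new UnexpectedValueException('Embedder must return a non-empty list of numbers');
        }
        return array_map('floatval', $vector); // normalize ints to floats
    }
}
```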

Roadmap

Core SemanticCache (remember/lookup/put) + Lookup result
Cosine similarity + exact-match tier
EmbedderInterface + adapters
In-memory + PSR-16 stores (brute force)
pgvector + Redis + Qdrant stores
Namespaces, TTL, stampede protection, hit-rate metrics

See CLAUDE.md for the full phase plan and conventions.

License

MIT
