ykachala/semantic-cache

Embedding-similarity response cache for LLM calls. Serve a cached answer when a new prompt is semantically close to a previous one — cutting cost and latency.

Package info

github.com/ykachala/semantic-cache

pkg:composer/ykachala/semantic-cache

dev-main 2026-06-01 23:48 UTC

This package is auto-updated.

Last update: 2026-06-02 09:07:06 UTC


README

An embedding-similarity cache for LLM responses. When a new prompt is semantically close to one you've answered before, serve the cached answer instead of paying for — and waiting on — another model call.

Why this exists (the 2026 gap)

A normal cache keys on an exact string. LLM prompts are almost never byte-identical — "How do I reset my password?" and "I forgot my password, what now?" are the same question to a user but two different cache keys. So traditional caching gives near-zero hit rate on real LLM traffic, and teams pay full price for what is effectively the same answer thousands of times.

Python has GPTCache for exactly this. PHP, as of 2026, has nothing native — despite the PHP AI ecosystem (Prism, Neuron, Laravel AI SDK) now being mature enough that cost is the live problem. Ykachala Semantic Cache fills that hole.

How it works

  1. Embed the incoming prompt.
  2. Search the vector store for the nearest previously-cached prompt.
  3. If cosine similarity ≥ your threshold (e.g. 0.95), return the stored response — no LLM call.
  4. Otherwise call your model, then store (embedding, prompt, response) for next time.

A cheap exact-match tier runs first (hash lookup) so identical prompts never even pay for an embedding.
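
The steps above can be sketched in plain PHP. This is a minimal illustration, not the package's implementation: `sketchCosine()`, `sketchLookup()`, and the array shape of `$entries` are hypothetical names chosen for the example, and the "store" is just an in-memory array scanned by brute force.

```php
<?php
declare(strict_types=1);

/** Cosine similarity of two equal-length, non-zero vectors. */
function sketchCosine(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Hypothetical two-tier lookup over a plain array "store".
 * Each entry: ['hash' => string, 'embedding' => float[], 'response' => string].
 * Returns the cached response, or null on a miss.
 */
function sketchLookup(string $prompt, array $entries, callable $embed, float $threshold): ?string
{
    // Tier 1: exact match by hash, so no embedding is computed.
    $hash = hash('sha256', $prompt);
    foreach ($entries as $entry) {
        if ($entry['hash'] === $hash) {
            return $entry['response'];
        }
    }

    // Tier 2: embed once, then find the nearest stored prompt.
    $vector = $embed($prompt);
    $best = null;
    $bestScore = -1.0;
    foreach ($entries as $entry) {
        $score = sketchCosine($vector, $entry['embedding']);
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $entry;
        }
    }

    // Serve the cached response only if the nearest neighbour clears the threshold.
    return ($best !== null && $bestScore >= $threshold) ? $best['response'] : null;
}
```

A `null` return corresponds to step 4: the caller would invoke the model and append a new entry to the store.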

Install

composer require ykachala/semantic-cache

Quick start

use Ykachala\SemanticCache\SemanticCache;
use Ykachala\SemanticCache\Store\PgVectorStore;

$cache = new SemanticCache(
    embedder:  $yourEmbedder,           // any PHP closure/object that returns a vector
    store:     new PgVectorStore($pdo),
    threshold: 0.95,                    // tune for your risk tolerance
    ttl:       3600,
);

$answer = $cache->remember($prompt, function () use ($prompt, $llm) {
    // Only runs on a miss — this is the call you're trying to avoid
    return $llm->chat($prompt);
});
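
For local experiments, `$yourEmbedder` could be a closure like the one below. This is a sketch under assumptions: `toyEmbedder()` is a hypothetical helper, not part of the package, and it assumes the constructor accepts a closure taking a string and returning an array of floats, as the comment in the Quick start suggests. It hashes words into buckets, so it only captures word overlap, not meaning; a real setup would call an embedding model.

```php
<?php
declare(strict_types=1);

/**
 * Toy embedder for tests: hashes each lowercased word into one of $dims
 * buckets and L2-normalises the counts. Not semantically meaningful.
 */
function toyEmbedder(int $dims = 64): Closure
{
    return function (string $text) use ($dims): array {
        $vector = array_fill(0, $dims, 0.0);
        $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Mask to a non-negative int so the bucket index is valid on 32-bit builds too.
            $bucket = (crc32($word) & 0x7fffffff) % $dims;
            $vector[$bucket] += 1.0;
        }

        $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $vector)));
        if ($norm === 0.0) {
            return $vector; // no words: return the all-zero vector unchanged
        }
        return array_map(fn (float $x): float => $x / $norm, $vector);
    };
}
```

Because the output is deterministic, a closure like this keeps tests of cache behaviour independent of any embedding API.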

Inspecting hits

$result = $cache->lookup($prompt);

if ($result->hit) {
    logger()->info('semantic cache hit', [
        'similarity' => $result->similarity,   // 0.0 – 1.0
        'matched'    => $result->matchedPrompt,
        'saved'      => $result->estimatedSaving?->format(),
    ]);
}

Tiers & safety

| Tier     | Cost                           | When                           |
|----------|--------------------------------|--------------------------------|
| Exact    | hash lookup, ~0                | byte-identical prompt          |
| Semantic | 1 embedding + 1 vector search  | similar prompt above threshold |
| Miss     | full LLM call                  | nothing close enough           |

  • Namespaces isolate caches per user/tenant/prompt-template so you never serve one user's answer to another.
  • Threshold tuning trades hit-rate for correctness — 0.97+ for factual lookups, lower for chit-chat. Ship with metrics so you can tune from real traffic.
  • Stampede protection — concurrent misses for the same prompt collapse to one call.
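
One way namespace isolation can work is sketched below. Both functions are hypothetical and only illustrate the idea: the exact-tier key mixes the namespace into the hash, and the semantic tier searches only entries from the same namespace. How the package actually scopes entries is not shown in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical exact-tier key: the namespace is hashed together with the
 * prompt, so identical prompts in different namespaces never share an entry.
 */
function namespacedKey(string $namespace, string $prompt): string
{
    // Length-prefix the namespace so ("ab", "c") and ("a", "bc") cannot collide.
    return hash('sha256', strlen($namespace) . ':' . $namespace . $prompt);
}

/**
 * Hypothetical semantic-tier filter: restrict candidates to one namespace
 * before any similarity search. Each entry is assumed to carry a 'namespace' key.
 */
function entriesInNamespace(array $entries, string $namespace): array
{
    return array_values(array_filter(
        $entries,
        fn (array $entry): bool => $entry['namespace'] === $namespace
    ));
}
```

Filtering before the similarity search, rather than after it, means a close match in another tenant's data can never become the nearest neighbour.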

Pluggable stores

InMemoryStore   # tests / single process
Psr16Store      # brute-force over any PSR-16 cache, good for small sets
RedisStore      # Redis 8 vector sets
PgVectorStore   # Postgres + pgvector, production default
QdrantStore     # external vector DB at scale
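
To show what a store driver has to provide, here is a minimal sketch of a brute-force in-memory driver. The names `SketchVectorStore` and `SketchInMemoryStore` and the methods `add()` and `nearest()` are assumptions made for this example; the package's real `VectorStore` interface may look different. Vectors are unit-normalised on write, so the cosine similarity at query time reduces to a dot product.

```php
<?php
declare(strict_types=1);

/** Hypothetical store contract; the package's real interface may differ. */
interface SketchVectorStore
{
    /** @param float[] $embedding */
    public function add(array $embedding, string $response, int $ttlSeconds): void;

    /**
     * @param float[] $embedding
     * @return array{similarity: float, response: string}|null nearest live entry, or null if none
     */
    public function nearest(array $embedding): ?array;
}

/** Brute-force driver: scans every row, which is fine for tests and small sets. */
final class SketchInMemoryStore implements SketchVectorStore
{
    /** @var list<array{vector: float[], response: string, expiresAt: int}> */
    private array $rows = [];

    public function add(array $embedding, string $response, int $ttlSeconds): void
    {
        $this->rows[] = [
            'vector'    => self::normalise($embedding),
            'response'  => $response,
            'expiresAt' => time() + $ttlSeconds,
        ];
    }

    public function nearest(array $embedding): ?array
    {
        $query = self::normalise($embedding);
        $now = time();
        $best = null;
        foreach ($this->rows as $row) {
            if ($row['expiresAt'] <= $now) {
                continue; // expired entries are skipped, never served
            }
            // Dot product of two unit vectors equals their cosine similarity.
            $score = 0.0;
            foreach ($query as $i => $x) {
                $score += $x * $row['vector'][$i];
            }
            if ($best === null || $score > $best['similarity']) {
                $best = ['similarity' => $score, 'response' => $row['response']];
            }
        }
        return $best;
    }

    /** Scales a non-zero vector to unit length. */
    private static function normalise(array $v): array
    {
        $norm = sqrt(array_sum(array_map(fn ($x) => $x * $x, $v)));
        return array_map(fn ($x) => $x / $norm, $v);
    }
}
```

The caller still applies the threshold to the returned `similarity`. A database-backed driver would replace the loop with an indexed nearest-neighbour query.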

Architecture

src/
├── SemanticCache.php     # remember() / lookup() / put()
├── Lookup.php            # result: hit, similarity, matchedPrompt, saving
├── Embedder/             # EmbedderInterface + adapters
├── Store/                # VectorStore interface + drivers
└── Similarity.php        # cosine / dot-product helpers

Roadmap

  • Core SemanticCache (remember/lookup/put) + Lookup result
  • Cosine similarity + exact-match tier
  • EmbedderInterface + adapters
  • In-memory + PSR-16 stores (brute force)
  • pgvector + Redis + Qdrant stores
  • Namespaces, TTL, stampede protection, hit-rate metrics

See CLAUDE.md for the full phase plan and conventions.

License

MIT