ykachala/guardrails

Input/output safety layer for PHP LLM apps: PII redaction, prompt-injection detection, secret scanning, and content moderation as a composable pipeline.

Package info

github.com/ykachala/guardrails

pkg:composer/ykachala/guardrails

dev-main 2026-06-01 23:22 UTC

Last update: 2026-06-02 09:06:51 UTC


README

A composable input/output safety layer for PHP applications that call LLMs. Redact PII before it leaves your server, catch prompt-injection and jailbreak attempts, scan for leaked secrets, and moderate model output — all in one pipeline.

Why this exists (the 2026 gap)

The PHP AI ecosystem grew up fast in early 2026 — Prism PHP, Neuron AI, and the official Laravel AI SDK now cover the happy path: making calls, RAG, agents, and structured output. What none of them own is the safety boundary.

In the Python world this is a solved, well-funded problem: LiteLLM guardrails and pydantic-ai-shields ship PII filtering, prompt-injection defense, and secret redaction out of the box. PHP teams shipping AI to production in 2026 have no native equivalent — they either roll fragile regexes by hand or proxy everything through a Python sidecar.

Ykachala Guardrails closes that gap with a framework-agnostic, provider-agnostic safety pipeline that sits between your app and any LLM client.

What it does

| Guard | Stage | What it catches |
|---|---|---|
| PiiGuard | input + output | Emails, phones, SSNs, credit cards, IBANs, IPs; reversibly tokenized so you can restore them after the round-trip |
| InjectionGuard | input | "Ignore previous instructions", role-override, delimiter-escape, and encoded jailbreak patterns (heuristic + optional LLM classifier) |
| SecretGuard | input + output | API keys, bearer tokens, private keys, .env-shaped values |
| ModerationGuard | output | Toxicity, banned topics, configurable category thresholds |
| SchemaGuard | output | Rejects responses that break an expected shape before they reach users |

Each guard returns a verdict (allow / redact / block) with a reason, so you decide the policy: fail closed, fail open, or sanitize-and-continue.
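
To make the verdict-and-policy idea concrete, here is a minimal sketch of how an application might map a verdict onto one of the three policies. It assumes PHP 8.1+ and uses hypothetical names (`Decision`, `SketchVerdict`, `applyPolicy`, and the policy strings); none of them come from the package, whose real `Verdict` class may look different. The meaning given to each policy is one possible reading, not the library's definition.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the package's verdict model (assumption).
enum Decision: string
{
    case Allow = 'allow';
    case Redact = 'redact';
    case Block = 'block';
}

final class SketchVerdict
{
    public function __construct(
        public readonly Decision $decision,
        public readonly string $text,        // text after the guard ran (possibly redacted)
        public readonly string $reason = '',
    ) {}
}

/**
 * Apply a caller-chosen policy to a verdict. Returns the text to forward,
 * or null when the request should be stopped.
 *  - 'fail_closed': anything other than "allow" stops the request
 *  - 'fail_open':   the original text always passes through unchanged
 *  - 'sanitize':    redacted text passes through; only "block" stops the request
 */
function applyPolicy(SketchVerdict $v, string $original, string $policy): ?string
{
    return match ($policy) {
        'fail_closed' => $v->decision === Decision::Allow ? $v->text : null,
        'fail_open'   => $original,
        'sanitize'    => $v->decision === Decision::Block ? null : $v->text,
        default       => throw new InvalidArgumentException("Unknown policy: $policy"),
    };
}
```

Keeping the policy outside the guard means the same guard can be strict on one route and lenient on another without changing detection logic.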

Install

composer require ykachala/guardrails

Quick start

use Ykachala\Guardrails\Pipeline;
use Ykachala\Guardrails\Guard\PiiGuard;
use Ykachala\Guardrails\Guard\InjectionGuard;
use Ykachala\Guardrails\Guard\SecretGuard;

$pipeline = Pipeline::make()
    ->input(new InjectionGuard(action: 'block'))
    ->input(new SecretGuard(action: 'block'))
    ->input(new PiiGuard(action: 'redact'))   // reversible
    ->output(new PiiGuard(action: 'restore')); // put real values back

// 1. Sanitize what you send to the model
$safe = $pipeline->inbound($userMessage);
if ($safe->blocked()) {
    // UnsafeInputException is not imported above; define or import your own exception class.
    throw new UnsafeInputException($safe->reason());
}

$response = $yourLlmClient->chat($safe->text());

// 2. Sanitize what you show the user
$clean = $pipeline->outbound($response, context: $safe);
echo $clean->text();

Reversible PII tokenization

// "Email john@acme.com about order 4111-1111-1111-1111"
$safe = $pipeline->inbound($text);
// -> "Email <PII_EMAIL_1> about order <PII_CC_1>"   (sent to the model)

$restored = $pipeline->outbound($modelReply, context: $safe);
// model's "<PII_EMAIL_1>" placeholders are swapped back to john@acme.com
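
The round-trip above can be approximated in a few lines of plain PHP. The sketch below is not the package's implementation: `tokenizePii` and `restorePii` are hypothetical helpers, and the two regexes are deliberately simplistic (no Luhn check, no locale rules). It only illustrates the technique of swapping matches for numbered placeholders while keeping a map to reverse the substitution.

```php
<?php
declare(strict_types=1);

/**
 * Replace emails and card-like numbers with numbered placeholders.
 * Returns [redactedText, map of placeholder => original value].
 * Patterns are illustrative assumptions, not the package's detector rules.
 */
function tokenizePii(string $text): array
{
    $patterns = [
        'EMAIL' => '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/',
        'CC'    => '/\b(?:\d{4}[- ]?){3}\d{4}\b/',
    ];
    $map = [];
    foreach ($patterns as $label => $regex) {
        $seen = []; // original value => placeholder, so repeated values share one token
        $text = preg_replace_callback(
            $regex,
            function (array $m) use ($label, &$map, &$seen): string {
                $value = $m[0];
                if (!isset($seen[$value])) {
                    $token = sprintf('<PII_%s_%d>', $label, count($seen) + 1);
                    $seen[$value] = $token;
                    $map[$token] = $value;
                }
                return $seen[$value];
            },
            $text
        );
    }
    return [$text, $map];
}

/** Swap placeholders in the model's reply back to the original values. */
function restorePii(string $text, array $map): string
{
    return strtr($text, $map);
}

[$redacted, $map] = tokenizePii('Email john@acme.com about order 4111-1111-1111-1111');
echo $redacted, PHP_EOL;
// prints: Email <PII_EMAIL_1> about order <PII_CC_1>
echo restorePii('I will email <PII_EMAIL_1> shortly.', $map), PHP_EOL;
// prints: I will email john@acme.com shortly.
```

The map never leaves your server, so the model only ever sees placeholders. In the pipeline, the `context: $safe` argument presumably plays the role of `$map` here.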

Framework bridges

  • Laravel: LaravelGuardrailsServiceProvider, config publish, and a guarded() macro on the Prism/Laravel-AI pending request.
  • Symfony: a GuardrailsBundle with a tagged-service guard registry and middleware.
  • Vanilla: the Pipeline is pure PHP with zero framework dependencies.

Architecture

src/
├── Pipeline.php            # orchestrates inbound/outbound passes
├── Verdict.php             # allow | redact | block + reason + redaction map
├── Guard/
│   ├── GuardInterface.php
│   ├── PiiGuard.php
│   ├── InjectionGuard.php
│   ├── SecretGuard.php
│   └── ModerationGuard.php
├── Detector/               # low-level detectors (regex + pluggable ML)
└── Bridge/                 # Laravel + Symfony integration
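
As a rough illustration of how the layout above could fit together, the sketch below shows a guard contract, one heuristic guard, and a loop that stops at the first block. Everything in it is hypothetical: `SketchGuard`, `HeuristicInjectionGuard`, `runGuards`, and the array-shaped result are stand-ins, and the real `GuardInterface` and `Pipeline` may differ. The two patterns are examples only and would miss most real injection attempts.

```php
<?php
declare(strict_types=1);

// Hypothetical contract; the package's GuardInterface may look different.
interface SketchGuard
{
    /** @return array{verdict: string, reason: string} */
    public function inspect(string $text): array;
}

final class HeuristicInjectionGuard implements SketchGuard
{
    /** Illustrative phrase patterns, far from exhaustive (assumption). */
    private const PATTERNS = [
        'instruction override' => '/\bignore\s+(?:all\s+)?(?:previous|prior|above)\s+instructions\b/i',
        'role override'        => '/\byou\s+are\s+now\s+(?:a|an|the)\b/i',
    ];

    public function inspect(string $text): array
    {
        foreach (self::PATTERNS as $name => $regex) {
            if (preg_match($regex, $text) === 1) {
                return ['verdict' => 'block', 'reason' => $name];
            }
        }
        return ['verdict' => 'allow', 'reason' => ''];
    }
}

/**
 * Run guards in order and stop at the first "block" verdict.
 *
 * @param SketchGuard[] $guards
 * @return array{verdict: string, reason: string}
 */
function runGuards(array $guards, string $text): array
{
    foreach ($guards as $guard) {
        $result = $guard->inspect($text);
        if ($result['verdict'] === 'block') {
            return $result;
        }
    }
    return ['verdict' => 'allow', 'reason' => ''];
}
```

A small interface like this is what lets regex detectors and model-based classifiers sit behind the same pipeline: each only has to return a verdict and a reason.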

Roadmap

  • Core pipeline + verdict model
  • Regex detector pack (PII, secrets) with locale-aware rules
  • Heuristic injection detector + optional LLM classifier adapter
  • Reversible tokenization store
  • Laravel & Symfony bridges
  • Pluggable model-based detectors (toxicity, semantic PII)

See CLAUDE.md for the full phase plan and conventions.

License

MIT