flarex/flareshield

Laravel-native AI security framework — protects chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG injection and unsafe AI output.

Package info

github.com/flarexsolutions/flareshield

pkg:composer/flarex/flareshield

v1.0.0 2026-05-07 08:45 UTC


README

Laravel-native AI security framework for the LLM era. Defend chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG poisoning, malicious tool usage and unsafe AI output — with a single Composer install.

Why FlareShield

LLM-powered features ship in days, but the threat surface of an AI system is fundamentally different from that of a traditional web app. FlareShield gives Laravel developers a defense-in-depth toolkit designed specifically for that gap:

  • Layered detection — heuristic, encoded-payload, HTML/Markdown, hidden-instruction, multilingual and indirect-injection scanners.
  • Risk-scored verdicts — every prompt receives a normalized 0–100 score with three verdicts: pass, flag, block.
  • Laravel-native — Service Provider, Facade, middleware aliases, config publishing, events, container-driven extensibility.
  • Production-ready — strict types, immutable value objects, PSR-3 logging, no third-party AI dependencies, fully testable.
  • Pluggable — every scanner, validator, risk engine and tool policy is bound through the container and trivially overridable.

Threat Model

FlareShield is designed to mitigate the OWASP LLM Top 10 categories most relevant to application-layer code:

Threat                                      Mitigation layer
LLM01 — Prompt Injection (direct)           scanPrompt, role-override + heuristic scanners
LLM01 — Prompt Injection (indirect)         sanitizeDocument, indirect-injection scanner, RAG fence
LLM02 — Insecure Output Handling            scanOutput, output validator (HTML / script / secret leak)
LLM06 — Sensitive Information Disclosure    system-prompt-leak scanner + secret detection in output
LLM07 — Insecure Plugin / Tool Design       authorizeTool, ToolPermissionPolicy
LLM08 — Excessive Agency                    per-agent config + confirmation flag for high-impact tools
LLM09 — Overreliance                        structured ScanResult you can act on

FlareShield does not ship any AI model itself. It is a deterministic, auditable, framework-side guardrail — pair it with provider-side moderation (OpenAI Moderation, Llama Guard, AWS Bedrock Guardrails, etc.) for the strongest posture.

Installation

composer require flarex/flareshield

Publish the config (optional but recommended):

php artisan vendor:publish --tag=flareshield-config

The package auto-registers via Laravel's package discovery (FlareShieldServiceProvider) and exposes the FlareShield facade.

Requirements: PHP 8.2+, Laravel 11 / 12 / 13.

Quick Start

use FlareX\FlareShield\Facades\FlareShield;
use FlareX\FlareShield\Exceptions\PromptInjectionException;

try {
    $safePrompt = FlareShield::guardPrompt($request->input('message'));
    $reply      = $myAiClient->chat($safePrompt);
    $safeReply  = FlareShield::guardOutput($reply);

    return response()->json(['reply' => $safeReply]);
} catch (PromptInjectionException $e) {
    return response()->json([
        'error'  => 'blocked',
        'reason' => $e->result()->toArray(),
    ], 422);
}

Need the structured result instead of an exception? Use scanPrompt / scanOutput:

$result = FlareShield::scanPrompt($input);

$result->passed();   // bool
$result->flagged();  // bool — suspicious but not blocked
$result->blocked();  // bool
$result->score;      // 0..100
$result->threats;    // Threat[] — type, severity, scanner, matches
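Acting on the three verdicts could look like this (a sketch built only from the methods shown above; the abort message and log fields are illustrative, not prescribed by the package):

```php
use FlareX\FlareShield\Facades\FlareShield;
use Illuminate\Support\Facades\Log;

$result = FlareShield::scanPrompt($input);

if ($result->blocked()) {
    // Hard stop: do not forward the prompt to the model.
    abort(422, 'Prompt rejected.');
}

if ($result->flagged()) {
    // Suspicious but below the block threshold: proceed with an audit trail.
    Log::warning('Suspicious prompt allowed through.', [
        'score'   => $result->score,
        'threats' => $result->threats,
    ]);
}

// $result->passed() — safe to forward as-is.
```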

Middleware

Two middleware aliases are registered:

Alias               Class            Purpose
flareshield.prompt  ProtectAiPrompt  Validates the inbound user prompt
flareshield.output  ProtectAiOutput  Validates the outbound JSON reply

Route::post('/chat', [ChatController::class, 'send'])
     ->middleware([
         'flareshield.prompt:message,support-bot',
         'flareshield.output:reply,support-bot',
     ]);

Parameters: {field}, {agent?}. Blocked prompts return 422 with a structured JSON body. The full ScanResult is also stashed on the request under flareshield.prompt_result so your controller can inspect it.
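Reading the stashed result back in the controller might look like this (a sketch; it assumes the result lives in the request attribute bag, one common place Laravel middleware stashes per-request data):

```php
use Illuminate\Http\Request;

public function send(Request $request)
{
    /** @var \FlareX\FlareShield\Support\ScanResult|null $scan */
    $scan = $request->attributes->get('flareshield.prompt_result');

    if ($scan?->flagged()) {
        // e.g. route flagged-but-allowed prompts to a more conservative model
    }

    // ... normal chat handling
}
```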

Guarding RAG Documents

Indirect prompt injection is the #1 RAG threat. Sanitize every retrieved chunk before injecting it into the model context:

$cleanDoc = FlareShield::guardDocument($retrievedChunk);

$messages[] = ['role' => 'user', 'content' => "Context:\n" . $cleanDoc];

guardDocument() will:

  1. Run all configured scanners against the chunk.
  2. Strip HTML comments, zero-width / control characters and tag-style hidden text.
  3. Quote suspicious imperative phrases so the model treats them as data.
  4. Truncate to a configured maximum length.
  5. Wrap the result in clearly labeled <<<UNTRUSTED_DOCUMENT>>> fences.
  6. Throw RagInjectionException when the chunk crosses the block threshold.
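In a retrieval loop, that sanitization step sits right before context assembly. A sketch, assuming a hypothetical `$retriever` and chunk shape (only `guardDocument()` and `RagInjectionException` come from the package):

```php
use FlareX\FlareShield\Exceptions\RagInjectionException;
use FlareX\FlareShield\Facades\FlareShield;

$messages = [['role' => 'system', 'content' => $systemPrompt]];

foreach ($retriever->search($query, limit: 4) as $chunk) {
    try {
        $messages[] = [
            'role'    => 'user',
            'content' => "Context:\n" . FlareShield::guardDocument($chunk->text),
        ];
    } catch (RagInjectionException $e) {
        // Drop the poisoned chunk rather than aborting the whole request.
        continue;
    }
}
```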

Guarding Tool Calls

use FlareX\FlareShield\Exceptions\ToolPermissionException;

try {
    FlareShield::authorizeTool('database.read', ['table' => 'orders']);
    $result = $tools->call('database.read', ...);
} catch (ToolPermissionException $e) {
    Log::warning('AI tried to call a forbidden tool.', ['ex' => $e->getMessage()]);
}

if (FlareShield::toolRequiresConfirmation('email.send')) {
    // present a confirmation step to the user
}

Configure in config/flareshield.php under the tools key.
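One possible shape for that section (illustrative only — the key names under `tools` are assumptions, not documented package options; check the published config for the real schema):

```php
// config/flareshield.php — hypothetical layout
'tools' => [
    'allow'                => ['database.read', 'email.send'],
    'deny'                 => ['database.write', 'shell.exec'],
    'require_confirmation' => ['email.send'],
],
```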

Output Validation

scanOutput() runs the configured output_validators. The default OutputValidator flags:

  • system-prompt echoes ("system prompt:", "initial instructions:")
  • API keys / tokens (AWS, GitHub, OpenAI, JWTs, PEM private keys)
  • Markdown image links that look like exfiltration beacons
  • raw <script> / on*= HTML

Per-Agent Configuration

Every config key can be overridden per agent:

// config/flareshield.php
'agents' => [
    'support-bot' => [
        'level'    => 'strict',
        'denylist' => ['/refund all customers/i'],
    ],
    'docs-rag' => [
        'level' => 'enterprise',
    ],
],
FlareShield::for('support-bot')->scanPrompt($input);

Security Levels

Level       Flag ≥   Block ≥   Use case
lenient     60       90        Local dev, demos
balanced    40       70        Production default
strict      25       50        Finance, health, internal admin bots
enterprise  20       45        Strict + verbose telemetry

Tune precisely in config('flareshield.thresholds').
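For example, to tighten the flag/block cut-offs beyond a named level (a sketch; the exact key names under `thresholds` are assumed, so verify them against the published config):

```php
// config/flareshield.php — hypothetical layout
'level' => 'balanced',

'thresholds' => [
    'flag'  => 35,  // flag at score >= 35 instead of 40
    'block' => 65,  // block at score >= 65 instead of 70
],
```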

Events

use FlareX\FlareShield\Events\ThreatDetected;

Event::listen(ThreatDetected::class, function (ThreatDetected $e) {
    // forward to SIEM, increment Pulse counter, alert on Slack, etc.
});

Available events: PromptScanned, OutputScanned, ThreatDetected, ToolCallBlocked.

Extending FlareShield

Write a custom scanner:

use FlareX\FlareShield\Contracts\Scanner;
use FlareX\FlareShield\Support\{ScanContext, Severity, Threat};

class CompanySecretScanner implements Scanner
{
    public function name(): string { return 'company_secret'; }

    public function scan(string $input, ScanContext $ctx): array
    {
        if (! preg_match('/PROJECT-NEPTUNE/', $input)) return [];

        return [new Threat(
            'internal_codename',
            'Internal codename leaked.',
            Severity::Critical,
            $this->name(),
        )];
    }
}

Then register it in config/flareshield.php:

'scanners' => [
    \FlareX\FlareShield\Scanners\HeuristicScanner::class,
    \App\Security\CompanySecretScanner::class,
    // ...
],

Need a different scoring strategy? Bind your own RiskEngine:

$this->app->bind(\FlareX\FlareShield\Contracts\RiskEngine::class, MyEngine::class);

Testing

composer install
vendor/bin/phpunit

The suite uses Orchestra Testbench and exercises scanners, the risk engine, the manager and the HTTP middleware end-to-end with realistic attack payloads.

Architecture

src/
├── FlareShield.php                 # Central manager (per-agent scoping + dispatch)
├── FlareShieldServiceProvider.php
├── Facades/FlareShield.php
├── Contracts/                      # Scanner, Guard, RiskEngine, ToolPolicy, ...
├── Support/                        # ScanResult, ScanContext, Threat, Severity
├── Scanners/                       # 8 detection strategies
├── Risk/DefaultRiskEngine.php      # Noisy-OR scoring + level thresholds
├── Validators/                     # OutputValidator, RagSanitizer
├── Policies/ToolPermissionPolicy.php
├── Middleware/                     # ProtectAiPrompt, ProtectAiOutput
├── Logging/DefaultAttackLogger.php
├── Events/                         # PromptScanned, OutputScanned, ...
└── Exceptions/                     # PromptInjection, Jailbreak, ToolPermission, ...

See docs/architecture.md and docs/threat-model.md for deeper dives.

License

MIT © FlareX. See LICENSE.