flarex / flareshield
Laravel-native AI security framework — protects chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG injection and unsafe AI output.
Requires
- php: ^8.2
- illuminate/contracts: ^11.0|^12.0|^13.0
- illuminate/events: ^11.0|^12.0|^13.0
- illuminate/http: ^11.0|^12.0|^13.0
- illuminate/pipeline: ^11.0|^12.0|^13.0
- illuminate/support: ^11.0|^12.0|^13.0
Requires (Dev)
- mockery/mockery: ^1.6
- orchestra/testbench: ^9.0|^10.0
- phpunit/phpunit: ^10.5|^11.0
README
Laravel-native AI security framework for the LLM era. Defend chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG poisoning, malicious tool usage and unsafe AI output — with a single Composer install.
Table of Contents
- Why FlareShield
- Threat Model
- Installation
- Quick Start
- Middleware
- Guarding RAG Documents
- Guarding Tool Calls
- Output Validation
- Per-Agent Configuration
- Security Levels
- Events
- Extending FlareShield
- Testing
- Architecture
- License
Why FlareShield
LLM-powered features ship in days, but the threat surface of an AI system is fundamentally different from a traditional web app. FlareShield gives Laravel developers a defense-in-depth toolkit designed specifically for that gap:
- Layered detection — heuristic, encoded-payload, HTML/Markdown, hidden-instruction, multilingual and indirect-injection scanners.
- Risk-scored verdicts — every prompt receives a normalized 0–100 score with three verdicts:
pass,flag,block. - Laravel-native — Service Provider, Facade, middleware aliases, config publishing, events, container-driven extensibility.
- Production-ready — strict types, immutable value objects, PSR-3 logging, no third-party AI dependencies, fully testable.
- Pluggable — every scanner, validator, risk engine and tool policy is bound through the container and trivially overridable.
Threat Model
FlareShield is designed to mitigate the OWASP LLM Top 10 categories most relevant to application-layer code:
| Threat | Layer |
|---|---|
| LLM01 — Prompt Injection (direct) | scanPrompt, role-override + heuristic scanners |
| LLM01 — Prompt Injection (indirect) | sanitizeDocument, indirect-injection scanner, RAG fence |
| LLM02 — Insecure Output Handling | scanOutput, output validator (HTML/script/secret leak) |
| LLM06 — Sensitive Information Disclosure | system-prompt-leak scanner + secret detection in output |
| LLM07 — Insecure Plugin / Tool Design | authorizeTool, ToolPermissionPolicy |
| LLM08 — Excessive Agency | per-agent config + confirmation flag for high-impact tools |
| LLM09 — Overreliance | structured ScanResult you can act on |
FlareShield does not ship any AI model itself. It is a deterministic, auditable, framework-side guardrail — pair it with provider-side moderation (OpenAI Moderation, Llama Guard, AWS Bedrock Guardrails, etc.) for the strongest posture.
Installation
composer require flarex/flareshield
Publish the config (optional but recommended):
php artisan vendor:publish --tag=flareshield-config
The package auto-registers via Laravel's package discovery
(FlareShieldServiceProvider) and exposes the FlareShield facade.
Requirements: PHP 8.2+, Laravel 11 / 12 / 13.
Quick Start
use FlareX\FlareShield\Facades\FlareShield; use FlareX\FlareShield\Exceptions\PromptInjectionException; try { $safePrompt = FlareShield::guardPrompt($request->input('message')); $reply = $myAiClient->chat($safePrompt); $safeReply = FlareShield::guardOutput($reply); return response()->json(['reply' => $safeReply]); } catch (PromptInjectionException $e) { return response()->json([ 'error' => 'blocked', 'reason' => $e->result()->toArray(), ], 422); }
Need the structured result instead of an exception? Use scanPrompt /
scanOutput:
$result = FlareShield::scanPrompt($input); $result->passed(); // bool $result->flagged(); // bool — suspicious but not blocked $result->blocked(); // bool $result->score; // 0..100 $result->threats; // Threat[] — type, severity, scanner, matches
Middleware
Two middleware aliases are registered:
| Alias | Class | Purpose |
|---|---|---|
flareshield.prompt |
ProtectAiPrompt |
Validates inbound user prompt |
flareshield.output |
ProtectAiOutput |
Validates outbound JSON reply |
Route::post('/chat', [ChatController::class, 'send']) ->middleware([ 'flareshield.prompt:message,support-bot', 'flareshield.output:reply,support-bot', ]);
Parameters: {field}, {agent?}. Blocked prompts return 422 with a
structured JSON body. The full ScanResult is also stashed on the request
under flareshield.prompt_result so your controller can inspect it.
Guarding RAG Documents
Indirect prompt injection is the #1 RAG threat. Sanitize every retrieved chunk before injecting it into the model context:
$cleanDoc = FlareShield::guardDocument($retrievedChunk); $messages[] = ['role' => 'user', 'content' => "Context:\n" . $cleanDoc];
guardDocument() will:
- Run all configured scanners against the chunk.
- Strip HTML comments, zero-width / control characters and tag-style hidden text.
- Quote suspicious imperative phrases so the model treats them as data.
- Truncate to a configured maximum length.
- Wrap the result in clearly labeled
<<<UNTRUSTED_DOCUMENT>>>fences. - Throw
RagInjectionExceptionwhen the chunk crosses the block threshold.
Guarding Tool Calls
use FlareX\FlareShield\Exceptions\ToolPermissionException; try { FlareShield::authorizeTool('database.read', ['table' => 'orders']); $result = $tools->call('database.read', ...); } catch (ToolPermissionException $e) { Log::warning('AI tried to call a forbidden tool.', ['ex' => $e->getMessage()]); } if (FlareShield::toolRequiresConfirmation('email.send')) { // present a confirmation step to the user }
Configure in config/flareshield.php under the tools key.
Output Validation
scanOutput() runs the configured output_validators. The default
OutputValidator flags:
- system-prompt echoes (
"system prompt:","initial instructions:") - API keys / tokens (AWS, GitHub, OpenAI, JWTs, PEM private keys)
- Markdown image links that look like exfiltration beacons
- raw
<script>/on*=HTML
Per-Agent Configuration
Every config key can be overridden per agent:
// config/flareshield.php 'agents' => [ 'support-bot' => [ 'level' => 'strict', 'denylist' => ['/refund all customers/i'], ], 'docs-rag' => [ 'level' => 'enterprise', ], ],
FlareShield::for('support-bot')->scanPrompt($input);
Security Levels
| Level | Flag ≥ | Block ≥ | Use case |
|---|---|---|---|
lenient |
60 | 90 | Local dev, demos |
balanced |
40 | 70 | Production default |
strict |
25 | 50 | Finance, health, internal admin bots |
enterprise |
20 | 45 | Strict + verbose telemetry |
Tune precisely in config('flareshield.thresholds').
Events
use FlareX\FlareShield\Events\ThreatDetected; Event::listen(ThreatDetected::class, function (ThreatDetected $e) { // forward to SIEM, increment Pulse counter, alert on Slack, etc. });
Available events: PromptScanned, OutputScanned, ThreatDetected, ToolCallBlocked.
Extending FlareShield
Write a custom scanner:
use FlareX\FlareShield\Contracts\Scanner; use FlareX\FlareShield\Support\{ScanContext, Severity, Threat}; class CompanySecretScanner implements Scanner { public function name(): string { return 'company_secret'; } public function scan(string $input, ScanContext $ctx): array { if (! preg_match('/PROJECT-NEPTUNE/', $input)) return []; return [new Threat( 'internal_codename', 'Internal codename leaked.', Severity::Critical, $this->name(), )]; } }
Then register it in config/flareshield.php:
'scanners' => [ \FlareX\FlareShield\Scanners\HeuristicScanner::class, \App\Security\CompanySecretScanner::class, // ... ],
Need a different scoring strategy? Bind your own RiskEngine:
$this->app->bind(\FlareX\FlareShield\Contracts\RiskEngine::class, MyEngine::class);
Testing
composer install vendor/bin/phpunit
The suite uses Orchestra Testbench and exercises scanners, the risk engine, the manager and the HTTP middleware end-to-end with realistic attack payloads.
Architecture
src/
├── FlareShield.php # Central manager (per-agent scoping + dispatch)
├── FlareShieldServiceProvider.php
├── Facades/FlareShield.php
├── Contracts/ # Scanner, Guard, RiskEngine, ToolPolicy, ...
├── Support/ # ScanResult, ScanContext, Threat, Severity
├── Scanners/ # 8 detection strategies
├── Risk/DefaultRiskEngine.php # Noisy-OR scoring + level thresholds
├── Validators/ # OutputValidator, RagSanitizer
├── Policies/ToolPermissionPolicy.php
├── Middleware/ # ProtectAiPrompt, ProtectAiOutput
├── Logging/DefaultAttackLogger.php
├── Events/ # PromptScanned, OutputScanned, ...
└── Exceptions/ # PromptInjection, Jailbreak, ToolPermission, ...
See docs/architecture.md and docs/threat-model.md for deeper dives.
License
MIT © FlareX. See LICENSE.