flarex/flareshield

Laravel-native AI security framework — protects chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG injection and unsafe AI output.

Package info

github.com/flarexsolutions/flareshield

pkg:composer/flarex/flareshield

v1.0.0 2026-05-07 08:45 UTC


README

Laravel-native AI security framework for the LLM era. Defend chatbots, agents, RAG pipelines and tool-calling workflows against prompt injection, jailbreaks, system prompt leakage, RAG poisoning, malicious tool usage and unsafe AI output — with a single Composer install.

Why FlareShield

LLM-powered features ship in days, but the threat surface of an AI system is fundamentally different from that of a traditional web app. FlareShield gives Laravel developers a defense-in-depth toolkit designed specifically for that gap:

  • Layered detection — heuristic, encoded-payload, HTML/Markdown, hidden-instruction, multilingual and indirect-injection scanners.
  • Risk-scored verdicts — every prompt receives a normalized 0–100 score with three verdicts: pass, flag, block.
  • Laravel-native — Service Provider, Facade, middleware aliases, config publishing, events, container-driven extensibility.
  • Production-ready — strict types, immutable value objects, PSR-3 logging, no third-party AI dependencies, fully testable.
  • Pluggable — every scanner, validator, risk engine and tool policy is bound through the container and trivially overridable.

Threat Model

FlareShield is designed to mitigate the OWASP LLM Top 10 categories most relevant to application-layer code:

Threat                                      Mitigation layer
LLM01 — Prompt Injection (direct)           scanPrompt, role-override + heuristic scanners
LLM01 — Prompt Injection (indirect)         sanitizeDocument, indirect-injection scanner, RAG fence
LLM02 — Insecure Output Handling            scanOutput, output validator (HTML / script / secret leak)
LLM06 — Sensitive Information Disclosure    system-prompt-leak scanner + secret detection in output
LLM07 — Insecure Plugin / Tool Design       authorizeTool, ToolPermissionPolicy
LLM08 — Excessive Agency                    per-agent config + confirmation flag for high-impact tools
LLM09 — Overreliance                        structured ScanResult you can act on

FlareShield does not ship any AI model itself. It is a deterministic, auditable, framework-side guardrail — pair it with provider-side moderation (OpenAI Moderation, Llama Guard, AWS Bedrock Guardrails, etc.) for the strongest posture.

Installation

composer require flarex/flareshield

Publish the config (optional but recommended):

php artisan vendor:publish --tag=flareshield-config

The package auto-registers via Laravel's package discovery (FlareShieldServiceProvider) and exposes the FlareShield facade.

Requirements: PHP 8.2+, Laravel 11 / 12 / 13.

Quick Start

use FlareX\FlareShield\Facades\FlareShield;
use FlareX\FlareShield\Exceptions\PromptInjectionException;

try {
    $safePrompt = FlareShield::guardPrompt($request->input('message'));
    $reply      = $myAiClient->chat($safePrompt);
    $safeReply  = FlareShield::guardOutput($reply);

    return response()->json(['reply' => $safeReply]);
} catch (PromptInjectionException $e) {
    return response()->json([
        'error'  => 'blocked',
        'reason' => $e->result()->toArray(),
    ], 422);
}

Need the structured result instead of an exception? Use scanPrompt / scanOutput:

$result = FlareShield::scanPrompt($input);

$result->passed();   // bool
$result->flagged();  // bool — suspicious but not blocked
$result->blocked();  // bool
$result->score;      // 0..100
$result->threats;    // Threat[] — type, severity, scanner, matches
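Acting on the three verdicts could look like this (a sketch built only from the methods shown above; the abort message and log fields are illustrative, not prescribed by the package):

```php
use FlareX\FlareShield\Facades\FlareShield;
use Illuminate\Support\Facades\Log;

$result = FlareShield::scanPrompt($input);

if ($result->blocked()) {
    // Hard stop: do not forward the prompt to the model.
    abort(422, 'Prompt rejected.');
}

if ($result->flagged()) {
    // Suspicious but below the block threshold: proceed with an audit trail.
    Log::warning('Suspicious prompt allowed through.', [
        'score'   => $result->score,
        'threats' => $result->threats,
    ]);
}

// $result->passed() — safe to forward as-is.
```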

Middleware

Two middleware aliases are registered:

Alias               Class            Purpose
flareshield.prompt  ProtectAiPrompt  Validates the inbound user prompt
flareshield.output  ProtectAiOutput  Validates the outbound JSON reply

Route::post('/chat', [ChatController::class, 'send'])
     ->middleware([
         'flareshield.prompt:message,support-bot',
         'flareshield.output:reply,support-bot',
     ]);

Parameters: {field}, {agent?}. Blocked prompts return 422 with a structured JSON body. The full ScanResult is also stashed on the request under flareshield.prompt_result so your controller can inspect it.
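Reading the stashed result back in the controller might look like this (a sketch; it assumes the result lives in the request attribute bag, one common place Laravel middleware stashes per-request data):

```php
use Illuminate\Http\Request;

public function send(Request $request)
{
    /** @var \FlareX\FlareShield\Support\ScanResult|null $scan */
    $scan = $request->attributes->get('flareshield.prompt_result');

    if ($scan?->flagged()) {
        // e.g. route flagged-but-allowed prompts to a more conservative model
    }

    // ... normal chat handling
}
```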

Guarding RAG Documents

Indirect prompt injection is the #1 RAG threat. Sanitize every retrieved chunk before injecting it into the model context:

$cleanDoc = FlareShield::guardDocument($retrievedChunk);

$messages[] = ['role' => 'user', 'content' => "Context:\n" . $cleanDoc];

guardDocument() will:

  1. Run all configured scanners against the chunk.
  2. Strip HTML comments, zero-width / control characters and tag-style hidden text.
  3. Quote suspicious imperative phrases so the model treats them as data.
  4. Truncate to a configured maximum length.
  5. Wrap the result in clearly labeled <<<UNTRUSTED_DOCUMENT>>> fences.
  6. Throw RagInjectionException when the chunk crosses the block threshold.
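In a retrieval loop, that sanitization step sits right before context assembly. A sketch, assuming a hypothetical `$retriever` and chunk shape (only `guardDocument()` and `RagInjectionException` come from the package):

```php
use FlareX\FlareShield\Exceptions\RagInjectionException;
use FlareX\FlareShield\Facades\FlareShield;

$messages = [['role' => 'system', 'content' => $systemPrompt]];

foreach ($retriever->search($query, limit: 4) as $chunk) {
    try {
        $messages[] = [
            'role'    => 'user',
            'content' => "Context:\n" . FlareShield::guardDocument($chunk->text),
        ];
    } catch (RagInjectionException $e) {
        // Drop the poisoned chunk rather than aborting the whole request.
        continue;
    }
}
```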

Guarding Tool Calls

use FlareX\FlareShield\Exceptions\ToolPermissionException;

try {
    FlareShield::authorizeTool('database.read', ['table' => 'orders']);
    $result = $tools->call('database.read', ...);
} catch (ToolPermissionException $e) {
    Log::warning('AI tried to call a forbidden tool.', ['ex' => $e->getMessage()]);
}

if (FlareShield::toolRequiresConfirmation('email.send')) {
    // present a confirmation step to the user
}

Configure in config/flareshield.php under the tools key.
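One possible shape for that section (illustrative only — the key names under `tools` are assumptions, not documented package options; check the published config for the real schema):

```php
// config/flareshield.php — hypothetical layout
'tools' => [
    'allow'                => ['database.read', 'email.send'],
    'deny'                 => ['database.write', 'shell.exec'],
    'require_confirmation' => ['email.send'],
],
```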

Output Validation

scanOutput() runs the configured output_validators. The default OutputValidator flags:

  • system-prompt echoes ("system prompt:", "initial instructions:")
  • API keys / tokens (AWS, GitHub, OpenAI, JWTs, PEM private keys)
  • Markdown image links that look like exfiltration beacons
  • raw <script> / on*= HTML

Per-Agent Configuration

Every config key can be overridden per agent:

// config/flareshield.php
'agents' => [
    'support-bot' => [
        'level'    => 'strict',
        'denylist' => ['/refund all customers/i'],
    ],
    'docs-rag' => [
        'level' => 'enterprise',
    ],
],
FlareShield::for('support-bot')->scanPrompt($input);

Security Levels

Level       Flag ≥   Block ≥   Use case
lenient     60       90        Local dev, demos
balanced    40       70        Production default
strict      25       50        Finance, health, internal admin bots
enterprise  20       45        Strict + verbose telemetry

Tune precisely in config('flareshield.thresholds').
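For example, to tighten the flag/block cut-offs beyond a named level (a sketch; the exact key names under `thresholds` are assumed, so verify them against the published config):

```php
// config/flareshield.php — hypothetical layout
'level' => 'balanced',

'thresholds' => [
    'flag'  => 35,  // flag at score >= 35 instead of 40
    'block' => 65,  // block at score >= 65 instead of 70
],
```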

Events

use FlareX\FlareShield\Events\ThreatDetected;

Event::listen(ThreatDetected::class, function (ThreatDetected $e) {
    // forward to SIEM, increment Pulse counter, alert on Slack, etc.
});

Available events: PromptScanned, OutputScanned, ThreatDetected, ToolCallBlocked.

Extending FlareShield

Write a custom scanner:

use FlareX\FlareShield\Contracts\Scanner;
use FlareX\FlareShield\Support\{ScanContext, Severity, Threat};

class CompanySecretScanner implements Scanner
{
    public function name(): string { return 'company_secret'; }

    public function scan(string $input, ScanContext $ctx): array
    {
        if (! preg_match('/PROJECT-NEPTUNE/', $input)) return [];

        return [new Threat(
            'internal_codename',
            'Internal codename leaked.',
            Severity::Critical,
            $this->name(),
        )];
    }
}

Then register it in config/flareshield.php:

'scanners' => [
    \FlareX\FlareShield\Scanners\HeuristicScanner::class,
    \App\Security\CompanySecretScanner::class,
    // ...
],

Need a different scoring strategy? Bind your own RiskEngine:

$this->app->bind(\FlareX\FlareShield\Contracts\RiskEngine::class, MyEngine::class);

Testing

composer install
vendor/bin/phpunit

The suite uses Orchestra Testbench and exercises scanners, the risk engine, the manager and the HTTP middleware end-to-end with realistic attack payloads.

Architecture

src/
├── FlareShield.php                 # Central manager (per-agent scoping + dispatch)
├── FlareShieldServiceProvider.php
├── Facades/FlareShield.php
├── Contracts/                      # Scanner, Guard, RiskEngine, ToolPolicy, ...
├── Support/                        # ScanResult, ScanContext, Threat, Severity
├── Scanners/                       # 8 detection strategies
├── Risk/DefaultRiskEngine.php      # Noisy-OR scoring + level thresholds
├── Validators/                     # OutputValidator, RagSanitizer
├── Policies/ToolPermissionPolicy.php
├── Middleware/                     # ProtectAiPrompt, ProtectAiOutput
├── Logging/DefaultAttackLogger.php
├── Events/                         # PromptScanned, OutputScanned, ...
└── Exceptions/                     # PromptInjection, Jailbreak, ToolPermission, ...

See docs/architecture.md and docs/threat-model.md for deeper dives.

License

MIT © FlareX. See LICENSE.