padosoft/laravel-pii-redactor

Italian-first PII redaction pipeline for Laravel: deterministic regex + checksum-validated detectors (codice fiscale, P.IVA, IBAN), pluggable strategies (mask / hash / tokenise / drop), GDPR + EU AI Act ready.

Maintainers

Package info

github.com/padosoft/laravel-pii-redactor

pkg:composer/padosoft/laravel-pii-redactor

Statistics

Installs: 0

Dependents: 1

Suggesters: 0

Stars: 0

Open Issues: 1

v1.2.0 2026-05-05 22:54 UTC

This package is auto-updated.

Last update: 2026-05-06 21:36:41 UTC


README

Tests Latest Version PHP Version Laravel Version License Total Downloads

Laravel PII Redactor banner

EU-first PII redaction for Laravel โ€” deterministic regex + checksum-validated detectors organised into opt-in country packs (Italy, Germany, Spain ship built-in; France / Netherlands / Portugal land in v1.2+), plus always-on multi-country detectors (email, IBAN mod-97 for every ISO 13616 country, credit card with Luhn) and a pluggable strategy layer (mask / hash / tokenise / drop) with persistent reverse-map storage (memory / database / cache), opt-in HuggingFace + spaCy NER drivers, and YAML custom-rule packs for tenant-specific identifiers. Zero external services in the default path, zero mandatory LLM cost, GDPR + EU AI Act ready.

laravel-pii-redactor is the seventh deliverable of the Padosoft v4.0 cycle (W7). It is a community Apache-2.0 package, standalone-agnostic (zero references to AskMyDocs / sister packages), and ships with the Padosoft AI vibe-coding pack so you can extend it with Claude Code or GitHub Copilot in minutes โ€” not days.

use Padosoft\PiiRedactor\Facades\Pii;

$clean = Pii::redact('Codice fiscale RSSMRA85T10A562S, IBAN IT60X0542811101000000123456, mail: mario@example.com.');
// "Codice fiscale [REDACTED], IBAN [REDACTED], mail: [REDACTED]."

$report = Pii::scan('Telefono +39 333 1234567 e P.IVA 12345678903.');
// $report->countsByDetector() === ['phone_it' => 1, 'p_iva' => 1]

Table of contents

Why this package

PII redaction is one of those domains where the existing options force a bad trade-off:

  • Build it yourself with a few hand-crafted regexes โ€” fast to write, but the moment a real Italian fiscal code shows up (16 alphanumeric characters with a checksum derived from a Decreto Ministeriale lookup table) your "good enough" regex starts emitting false positives that break audits.
  • Reach for Microsoft Presidio / AWS Comprehend / Google DLP โ€” robust, but they assume a US-centric set of identifiers. None of them validate the Italian codice fiscale checksum out of the box, and routing every chat-log line through a hosted PII service is operationally expensive and a GDPR amplifier.
  • Bolt an LLM-based redactor onto the pipeline โ€” works, but pays per-token to do something that is, fundamentally, a regular language problem.

laravel-pii-redactor covers the deterministic layer. v1.0 ships:

  • 3 always-on multi-country detectors โ€” email (RFC-5321 shape), iban (ISO 13616 country-length table + mod-97 for every registered country, ~75), credit_card (Luhn).
  • 3 shipped country packs (v1.1):
    • ItalyPack โ€” codice_fiscale (CIN checksum), partita_iva (Luhn-IT), phone_it, address_it.
    • GermanyPack (v1.1) โ€” steuer_id (mod-11 ISO 7064 per ยง139b AO), ust_idnr (BMF Method 30 per ยง27a UStG), phone_de, address_de.
    • SpainPack (v1.1) โ€” dni (23-letter checksum table per RD 1553/2005), nie (prefix-substituted DNI), cif (corporate ID), phone_es, address_es.
  • PackContract interface + DetectorPackRegistry โ€” opt-in jurisdiction bundles. Operate in Italy only? Keep ItalyPack. Operate across the EU? Add GermanyPack / SpainPack (shipped v1.1) or FrancePack (v1.2+ candidate). Operate outside Italy? Drop ItalyPack from the config.
  • 4 pluggable replacement strategies โ€” mask, hash, tokenise, drop.
  • 3 token-store drivers โ€” memory (default), database (Eloquent + shipped migration), cache (Redis / Memcached / DynamoDB / array).
  • 2 production NER drivers (opt-in) โ€” HuggingFaceNerDriver, SpaCyNerDriver. Network calls fail open.
  • YAML custom-rule packs โ€” register tenant-specific detectors from *.yaml files; SP auto-registers when pii-redactor.custom_rules.auto_register = true.
  • Typed DetectionReport โ€” audit every redaction without re-running the engine.
  • Admin-ready headless APIs โ€” safe status snapshots, strategy factory, masked report formatter, token resolution, and custom-rule diagnostics for a separate Laravel 13 React/Tailwind admin package. See Admin panel readiness.

It is deliberately small and deliberately offline by default. You can extend it with custom detectors via Pii::extend() or your own country pack. The deterministic engine fits in ~200 lines of PHP, the v1.0 surface is locked under semver, and 300+ unit tests + a robustness suite describe every transition.

Design rationale

Five non-negotiable choices that drove the API:

1. EU-first via opt-in country packs. World-second.

Every PII pipeline I have seen for Laravel either ignores European fiscal data or matches it with a bare regex that returns false positives on every retry CI run. National identifiers need real code: the Italian codice fiscale requires the official odd/even checksum table from the 1976 Decreto Ministeriale; the German Steuer-ID needs mod-11; the Spanish DNI needs a letter-checksum lookup; the French NIR needs mod-97. A regex alone won't do.

Hence country packs. v1.0 shipped ItalyPack as the reference implementation (4 Italian detectors with the full CIN checksum + Luhn-IT). v1.1 makes good on the promise with two more concrete bundles โ€” GermanyPack (Steuer-ID mod-11 ISO 7064 per ยง139b AO + USt-IdNr BMF Method 30 per ยง27a UStG + German phone/address) and SpainPack (DNI 23-letter checksum table per RD 1553/2005 + NIE + CIF + Spanish phone/address). Both opt-in via a single FQCN in config('pii-redactor.packs'). The PackContract interface + DetectorPackRegistry make it equally trivial for the community to contribute FrancePack, NetherlandsPack, PortugalPack next โ€” each as a self-contained bundle of detectors with checksum-source citations and 10/5 valid/invalid fixtures.

Multi-country detectors (email, iban with mod-97 for every ISO 13616 country, credit_card with Luhn) stay always-on regardless of which packs you load โ€” they have no jurisdictional flavour.

2. Deterministic regex + checksum, no LLM in the hot path

Every first-party detector is a pure function of its input. No external HTTP call, no per-token cost, no rate limit. A 1 MB chat log redacts in ~280 ms and the output is identical on every machine. The optional NER layer (v0.3+) ships behind a config switch; the default path never touches a network.

3. Strategy is a runtime decision, not a compile-time one

The same detected match can be masked ([REDACTED] for human-facing logs), hashed ([hash:abc123ef01234567] for cross-record joins on pseudonymous data), tokenised ([tok:email:abc123ef01234567] with a reversible salt-derived map for forensic recovery), or dropped (empty string for forwarding to lossy systems). Switching strategy is a one-line override on Pii::redact($text, new HashStrategy(...)) โ€” no detector code changes.

4. Detector overlap is resolved deterministically

When two detectors emit overlapping byte ranges (e.g. an email-shaped string that also matches a phone heuristic), the engine keeps the earlier match (lower offset) and drops the latecomer. The behaviour is documented, tested, and predictable โ€” callers can audit it via Pii::scan().

5. Standalone-agnostic โ€” zero AskMyDocs symbols

laravel-pii-redactor is a community package. It is not coupled to AskMyDocs, the sister patent-box tracker, the eval-harness, the Regolo driver, or any other Padosoft project. An architecture test (tests/Architecture/StandaloneAgnosticTest.php) walks src/ with RecursiveDirectoryIterator on every CI run and asserts the forbidden-substring list (KnowledgeDocument, KbSearchService, AskMyDocs, PatentBoxTracker, LaravelFlow, EvalHarness, Regolo, ...) never appears.

Features at a glance

  • ๐Ÿ‡ช๐Ÿ‡บ EU country pack architecture โ€” PackContract interface + DetectorPackRegistry boots country packs from config('pii-redactor.packs'). Three packs ship in v1.1: ItalyPack (default), GermanyPack (opt-in), SpainPack (opt-in). FrancePack, NetherlandsPack, PortugalPack are community PRs welcome (see CONTRIBUTING-PACKS.md).
  • 3 always-on multi-country detectors (no pack required):
    • email โ€” pragmatic RFC-5321 shape match.
    • iban โ€” ISO 13616 IBAN for every registered country (~75) + mod-97 verification.
    • credit_card โ€” 13โ€“19 digit PAN with Luhn validation.
  • ItalyPack (default โ€” 4 detectors):
    • codice_fiscale โ€” 16-char Italian fiscal code with full CIN checksum (Decreto Ministeriale 23/12/1976).
    • p_iva โ€” 11-digit Italian VAT with Luhn-style checksum + zero-payload sentinel rejection.
    • phone_it โ€” Italian mobile + landline (with optional +39 / 0039 prefix).
    • address_it โ€” Italian street address heuristic (Via / Viale / Piazza / Corso / Largo / Strada / Vicolo / Lungomare + compound forms Via dei, Via della, Via d'โ€ฆ); civic number + 5-digit CAP + city optional.
  • 4 pluggable redaction strategies: MaskStrategy, HashStrategy (deterministic, salt-derived, namespaced per detector), TokeniseStrategy (reversible pseudonymisation with detokenise() + dumpMap() / loadMap() for cross-process recovery), DropStrategy.
  • Persistent reverse-map storage (v0.2) โ€” TokenStore interface + InMemoryTokenStore (default, process-local) + DatabaseTokenStore (Eloquent-backed, shipped migration pii_token_maps). The same [tok:...] token detokenises across deploys / queue workers when the database driver is wired. Switch via PII_REDACTOR_TOKEN_STORE=database and run php artisan vendor:publish --tag=pii-redactor-migrations && php artisan migrate.
  • Audit-trail event (v0.2) โ€” opt-in PiiRedactionPerformed Laravel event fired after a redact() call that produced at least one detection, when PII_REDACTOR_AUDIT_TRAIL=true (or the structured audit_trail.enabled key is set). No-op redactions (engine disabled, empty input, zero detections) skip the dispatch โ€” the event signals "redaction occurred", not "request processed". Event carries counts only (detector โ†’ match count, total, strategy name) โ€” NEVER raw PII or redacted output. GDPR-friendly by construction.
  • NER drivers (v0.2 scaffold + v0.3 production) โ€” NerDriver interface + StubNerDriver (no-op default), HuggingFaceNerDriver (HuggingFace Inference API via Http::, opt-in via PII_REDACTOR_HUGGINGFACE_API_KEY), SpaCyNerDriver (generic spaCy HTTP server protocol returning Doc.to_json() shape, opt-in via PII_REDACTOR_SPACY_SERVER_URL). Both real drivers fail open on HTTP errors so a NER outage cannot block deterministic redaction. Driver detections merge into the same overlap-resolution pipeline as first-party detectors.
  • Cache-backed TokenStore (v0.3) โ€” third driver alongside InMemoryTokenStore and DatabaseTokenStore. Uses Laravel's Illuminate\Contracts\Cache\Repository so deployments swap between Redis / Memcached / DynamoDB / array (test) without touching package code. Maintains an explicit index entry so dump() / clear() work without scanning the backend keyspace. Optional TTL via PII_REDACTOR_TOKEN_STORE_CACHE_TTL. Switch with PII_REDACTOR_TOKEN_STORE=cache.
  • Custom-rule YAML packs (v0.3 + v1.0 auto-register) โ€” register tenant-specific detectors from *.yaml files. v1.0 adds an SP-level auto-register loop driven by config('pii-redactor.custom_rules.packs') so you can drop YAML packs into a config array and the SP wires them at boot. The host-controlled API still works for tenant-specific bootstrap logic:
    $set = (new YamlCustomRuleLoader())->load(storage_path('app/pii-rules/it-albo.yaml'));
    Pii::extend('custom_it_albo', new CustomRuleDetector('custom_it_albo', $set));
    Each rule has a name + PCRE pattern + optional flags (default u). Invalid PCRE is rejected at first-match time with a clear CustomRuleException. Useful for Italian professional registry IDs (ISCR-..., Tess-XX-...), tenant-specific account codes, project tracker identifiers, etc.
  • Live test suite (v0.3) โ€” tests/Live/ houses opt-in tests against real APIs (HuggingFace, spaCy server). Each test self-skips unless PII_REDACTOR_LIVE=1 AND its driver-specific credentials are set. CI runs Unit + Architecture only โ€” Live tests are operator-driven. See tests/Live/README.md for the convention.
  • Typed DetectionReport โ€” total(), countsByDetector(), samplesByDetector(cap), toArray(). Stable JSON shape for downstream auditors.
  • Admin-ready headless APIs โ€” RedactorAdminInspector exposes a secret-free runtime snapshot; RedactionStrategyFactory builds mask / hash / tokenise / drop strategies for admin previews; DetectionReportFormatter masks samples by default; TokenResolutionService detokenises through the configured TokenStore even when the current strategy is not tokenise; CustomRulePackInspector reports YAML pack health without registering detectors. Full implementation plan for the separate Laravel 13 + Vite + React + Tailwind UI package lives in docs/admin-panel-architecture-plan.md.
  • Pii::extend() registry for custom detectors (custom_codice_iscrizione_albo, project-specific account ids, etc.).
  • Artisan command โ€” php artisan pii:scan path/to/file.txt --pretty or cat data | php artisan pii:scan --from=stdin (samples masked by default; pass --show-samples for raw values during interactive forensics).
  • Standalone-agnostic โ€” zero coupling to AskMyDocs / sister packages, enforced by an architecture test.
  • PHP 8.3 / 8.4 / 8.5 ร— Laravel 12 / 13 matrix. Pint + PHPStan level 6 + 400+ PHPUnit tests on every push.
  • Padosoft AI vibe-coding pack (.claude/) โ€” Claude Code skills (R36 review loop, R10โ€“R37 rules) + agents (review pre-push) + commands (/create-job, /domain-scaffold).

๐Ÿ‡ช๐Ÿ‡บ EU country pack architecture

Why country packs exist. Italian fiscal codes need PHP code with checksum logic. So do German Steuer-ID (mod-11), Spanish DNI (letter-checksum), French NIR (mod-97). Pure regex isn't enough. Each country needs its own bundle of detectors โ€” but the package shouldn't ship all of EU's IDs by default if you only operate in Italy. Hence packs: opt-in jurisdiction bundles, registered via the PackContract interface and a config array.

Padosoft\PiiRedactor\
โ”œโ”€โ”€ Detectors\                         (multi-country, always-on)
โ”‚   โ”œโ”€โ”€ EmailDetector (RFC-5321 shape)
โ”‚   โ”œโ”€โ”€ IbanDetector (ISO 13616 mod-97 โ€” every EU country)
โ”‚   โ””โ”€โ”€ CreditCardDetector (Luhn)
โ””โ”€โ”€ Packs\
    โ”œโ”€โ”€ PackContract                   (interface)
    โ””โ”€โ”€ Italy\
        โ”œโ”€โ”€ ItalyPack                  (default โ€” config('pii-redactor.packs'))
        โ”‚   โ””โ”€โ”€ detectors() returns:
        โ”‚       โ”œโ”€โ”€ CodiceFiscaleDetector (CIN checksum)
        โ”‚       โ”œโ”€โ”€ PartitaIvaDetector (Luhn-IT)
        โ”‚       โ”œโ”€โ”€ PhoneItalianDetector
        โ”‚       โ””โ”€โ”€ AddressItalianDetector

Enable / disable example:

// config/pii-redactor.php
'packs' => [
    \Padosoft\PiiRedactor\Packs\Italy\ItalyPack::class,
    // \Padosoft\PiiRedactor\Packs\Germany\GermanyPack::class, // shipped v1.1 โ€” opt-in
    // \Padosoft\PiiRedactor\Packs\Spain\SpainPack::class,     // shipped v1.1 โ€” opt-in
],

To disable Italy on an English-only deployment:

'packs' => [
    // ItalyPack removed โ€” codice fiscale / P.IVA / Italian phone / Italian address detectors NOT registered
],

The multi-country detectors (Email, IBAN, CreditCard) keep working regardless โ€” they are never part of a country pack because they have no jurisdictional flavour.

Build your own country pack โ€” 3-step recipe

The recipe below uses Iceland (small, real European country, no community pack ships yet) as a "blank slate" example. The real kennitala checksum is mod-11 over the first 9 digits.

Step 1 โ€” Create the detector(s)

// src/Packs/Iceland/Detectors/KennitalaDetector.php
namespace Padosoft\PiiRedactor\Packs\Iceland\Detectors;

use Padosoft\PiiRedactor\Detectors\Detection;
use Padosoft\PiiRedactor\Detectors\Detector;

final class KennitalaDetector implements Detector
{
    public function name(): string
    {
        return 'kennitala';
    }

    public function detect(string $text): array
    {
        // 10 digits with mod-11 checksum on the first 9.
        if (preg_match_all('/\b(\d{6}-?\d{4})\b/u', $text, $matches, PREG_OFFSET_CAPTURE) === false) {
            return [];
        }
        $hits = [];
        foreach ($matches[1] as $m) {
            $value = preg_replace('/-/', '', (string) $m[0]);
            if (! $this->validChecksum($value)) {
                continue;
            }
            $hits[] = new Detection('kennitala', (string) $m[0], (int) $m[1], strlen((string) $m[0]));
        }
        return $hits;
    }

    private function validChecksum(string $kt): bool
    {
        // Weights: 3, 2, 7, 6, 5, 4, 3, 2 over the first 8 digits;
        // ninth digit is the check digit; mod-11 with 11 - r mapping.
        // ... real implementation here ...
        return true;
    }
}

Step 2 โ€” Wrap them in a pack

// src/Packs/Iceland/IcelandPack.php
namespace Padosoft\PiiRedactor\Packs\Iceland;

use Padosoft\PiiRedactor\Packs\PackContract;
use Padosoft\PiiRedactor\Packs\Iceland\Detectors\KennitalaDetector;

final class IcelandPack implements PackContract
{
    public function name(): string        { return 'iceland'; }
    public function countryCode(): string { return 'IS'; }
    public function description(): string { return 'Icelandic kennitala (mod-11) + (future) phone / address detectors.'; }

    public function detectors(): array
    {
        return [
            new KennitalaDetector(),
        ];
    }
}

Step 3 โ€” Register it

// config/pii-redactor.php
'packs' => [
    \Padosoft\PiiRedactor\Packs\Italy\ItalyPack::class,
    \Padosoft\PiiRedactor\Packs\Iceland\IcelandPack::class,  // your new pack
],

That's it. The ServiceProvider boots, the DetectorPackRegistry walks the config list, instantiates each pack, and feeds its detectors() into the engine. Pii::redact() and Pii::scan() now redact / report kennitala matches alongside the always-on detectors.

๐Ÿš€ Contribute your country pack

Built a GermanyPack / SpainPack / FrancePack / etc. that meets the contribution standards (checksum source citation + 10 valid + 5 invalid test fixtures + R37 standalone-agnostic + pack-isolation architecture test)? Open a PR โ€” see CONTRIBUTING-PACKS.md for the workflow. Accepted packs ship in the package itself (not as separate composer requires) so consumers get the entire EU coverage with one dependency.

Comparison vs alternatives

โœ… = supported out of the box ยท ๐ŸŸก = partial / requires custom code or paid tier ยท โŒ = not supported

Platform & deployment

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
Native Laravel facade + ServiceProvider โœ… YES โŒ NO (Python) โœ… YES (different scope) โŒ NO (AWS SDK) โŒ NO (GCP SDK)
composer require install โœ… YES โŒ NO โœ… YES (different scope) โŒ NO โŒ NO
Admin web UI / dashboard โœ… YES (companion package) ๐ŸŸก Presidio Analyzer UI only โŒ NO โŒ Console only โŒ Console only
Operates entirely offline (default path) โœ… YES โœ… YES (self-hosted) โœ… YES โŒ NO (AWS API) โŒ NO (GCP API)
GDPR data-minimisation friendly โœ… YES (no transit) โœ… YES โœ… YES โŒ NO (US transit) โŒ NO (US transit)
Cost per 1M characters โœ… EUR 0 ๐ŸŸก self-hosted compute โœ… EUR 0 โŒ ~ EUR 1 โŒ ~ EUR 1.50

EU country detector coverage (deterministic, checksum-validated)

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
๐Ÿ‡ฎ๐Ÿ‡น Codice fiscale (CIN checksum) โœ… YES (ItalyPack) ๐ŸŸก regex shape only โŒ NO โŒ NO ๐ŸŸก regex shape only
๐Ÿ‡ฎ๐Ÿ‡น Partita IVA (Luhn-IT) โœ… YES (ItalyPack) โŒ NO โŒ NO โŒ NO โŒ NO
๐Ÿ‡ฉ๐Ÿ‡ช Steuer-ID (mod-11 ISO 7064 + ยง139b AO) โœ… YES (GermanyPack v1.1) โŒ NO โŒ NO โŒ NO ๐ŸŸก regex only (no checksum)
๐Ÿ‡ฉ๐Ÿ‡ช USt-IdNr (BMF Method 30 mod-11) โœ… YES (GermanyPack v1.1) โŒ NO โŒ NO โŒ NO โŒ NO
๐Ÿ‡ช๐Ÿ‡ธ DNI / NIE (23-letter checksum) โœ… YES (SpainPack v1.1) ๐ŸŸก regex shape only โŒ NO โŒ NO ๐ŸŸก regex shape only
๐Ÿ‡ช๐Ÿ‡ธ CIF (AEAT dual digit/letter control) โœ… YES (SpainPack v1.1) โŒ NO โŒ NO โŒ NO โŒ NO
๐Ÿ‡ซ๐Ÿ‡ท NIR / SSN (mod-97) ๐ŸŸก v1.2+ candidate (community PR) ๐ŸŸก regex shape only โŒ NO โŒ NO ๐ŸŸก regex shape only
๐Ÿ‡ณ๐Ÿ‡ฑ BSN (eleven-test mod-11) ๐ŸŸก v1.2+ candidate (community PR) โŒ NO โŒ NO โŒ NO โŒ NO
๐Ÿ‡ต๐Ÿ‡น NIF (mod-11) ๐ŸŸก v1.2+ candidate (community PR) โŒ NO โŒ NO โŒ NO โŒ NO
ISO 13616 IBAN mod-97 (every country) โœ… YES ๐ŸŸก structural only โŒ NO ๐ŸŸก partial (US-leaning) ๐ŸŸก partial (US-leaning)
Per-country phone number heuristics โœ… YES (IT/DE/ES + community packs) ๐ŸŸก limited โŒ NO ๐ŸŸก limited ๐ŸŸก limited
Per-country street-address heuristics โœ… YES (IT/DE/ES) โŒ NO โŒ NO โŒ NO โŒ NO

Replacement strategies (mask / hash / tokenise / drop)

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
Mask strategy ([REDACTED]) โœ… YES โœ… YES โœ… YES โœ… YES โœ… YES
Deterministic salted hash strategy โœ… YES ๐ŸŸก custom anonymizer ๐ŸŸก custom โŒ NO ๐ŸŸก cryptoHashConfig
Per-detector hash namespacing โœ… YES โŒ NO โŒ NO โŒ NO ๐ŸŸก partial
Reversible pseudonymisation (detokenise) โœ… YES (TokeniseStrategy) โŒ NO ๐ŸŸก custom โŒ NO ๐ŸŸก DLP de-identify
Drop strategy (empty replacement) โœ… YES โœ… YES โœ… YES โŒ NO โœ… YES
Strategy override per-call (Pii::redact($t, new HashStrategy(...))) โœ… YES ๐ŸŸก anonymizer chains ๐ŸŸก manual โŒ NO ๐ŸŸก deidentifyTemplate

Persistence & infrastructure

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
In-memory token store (process-local) โœ… YES (InMemoryTokenStore) โŒ NO (stateless) โŒ NO โŒ NO โŒ NO
Database token store (Eloquent + migration) โœ… YES (DatabaseTokenStore v0.2) โŒ NO โŒ NO โŒ NO โŒ NO
Cache token store (Redis / Memcached / array) โœ… YES (CacheTokenStore v0.3) โŒ NO โŒ NO โŒ NO โŒ NO
Cross-process / cross-deploy detokenisation โœ… YES (database / cache drivers) โŒ NO โŒ NO โŒ NO โŒ NO
Audit-trail event (counts only, GDPR-safe) โœ… YES (PiiRedactionPerformed v0.2) โŒ NO โŒ NO ๐ŸŸก CloudWatch (paid) ๐ŸŸก audit logs (paid)

Extensibility & community

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
Per-tenant custom detectors โœ… Pii::extend() (one-liner) ๐ŸŸก yaml + Python class ๐ŸŸก manual ๐ŸŸก custom entities ๐ŸŸก custom infoTypes
YAML-loaded custom rule packs โœ… YES (YamlCustomRuleLoader v0.3) ๐ŸŸก yaml + Python config โŒ NO โŒ NO โŒ NO
Pluggable country pack architecture โœ… YES (PackContract v1.0) โŒ NO โŒ NO โŒ NO โŒ NO
Community-contributed country packs โœ… YES (DE + ES shipped v1.1; FR/NL/PT welcome) โŒ NO โŒ NO โŒ NO โŒ NO
HuggingFace NER driver (opt-in, fail-open) โœ… YES (HuggingFaceNerDriver v0.3) โœ… YES (HF integration) โŒ NO ๐ŸŸก separate service โŒ NO
spaCy NER driver (opt-in, generic HTTP) โœ… YES (SpaCyNerDriver v0.3) โœ… YES (built-in) โŒ NO โŒ NO โŒ NO
AI vibe-coding pack for contributors โœ… YES (.claude/ skills + agents) โŒ NO โŒ NO โŒ NO โŒ NO
Apache-2.0 license โœ… YES โœ… YES (MIT) โœ… YES (MIT) ๐ŸŸก proprietary ๐ŸŸก proprietary

Quality gates & guarantees

laravel-pii-redactor Microsoft Presidio Spatie data-redaction AWS Comprehend PII Google Cloud DLP
Stable surface lock (semver v1.x) โœ… YES (v1.0+) ๐ŸŸก 0.x line โœ… YES ๐ŸŸก service versioning ๐ŸŸก service versioning
PHP 8.3 / 8.4 / 8.5 ร— Laravel 12 / 13 matrix CI โœ… YES โŒ N/A โœ… YES โŒ N/A โŒ N/A
600+ unit tests + robustness suite โœ… YES โœ… YES ๐ŸŸก smaller surface โŒ N/A (managed service) โŒ N/A (managed service)
Cross-pack architecture isolation enforced โœ… YES (per-pack architecture test) โŒ NO โŒ NO โŒ NO โŒ NO
Performance benchmarks (1MB doc < 2s) โœ… YES (PerfBenchTest) ๐ŸŸก unpublished ๐ŸŸก unpublished ๐ŸŸก SLA only ๐ŸŸก SLA only
Standalone-agnostic invariant (no host coupling) โœ… YES (R37 architecture test) โœ… YES โœ… YES โŒ N/A โŒ N/A

laravel-pii-redactor is not a Presidio replacement for fuzzy named-entity recognition โ€” Presidio's transformer-backed NER layer (PERSON, ORG, LOC) is genuinely more capable as a free-form classifier, and you can plug it (or any HuggingFace / spaCy model) into this package via the NerDriver interface (v0.3+). The deterministic regex + checksum + per-country pack core stays the strongest layer where the existing EU-aware options are weakest, and the persistent reverse-map storage + community-contributable pack architecture are unique to this package across the comparison set.

Installation

composer require padosoft/laravel-pii-redactor

Laravel auto-discovery wires the PiiRedactorServiceProvider and the Pii facade alias. Publish the config to override defaults:

php artisan vendor:publish --tag=pii-redactor-config

Set the salt for the hash / tokenise strategies in your .env:

PII_REDACTOR_STRATEGY=mask
PII_REDACTOR_SALT=<32+ random characters; treat like APP_KEY>

Quick start

use Padosoft\PiiRedactor\Facades\Pii;

// Default mask strategy.
$clean = Pii::redact('Codice fiscale RSSMRA85T10A562S e P.IVA 12345678903.');
// "Codice fiscale [REDACTED] e P.IVA [REDACTED]."

// Audit a payload before redacting.
$report = Pii::scan('Email mario@example.com IBAN IT60X0542811101000000123456.');
$report->countsByDetector(); // ['email' => 1, 'iban' => 1]

// One-off strategy override (without changing config).
use Padosoft\PiiRedactor\Strategies\HashStrategy;
$hashed = Pii::redact('mario@example.com', new HashStrategy(salt: env('PII_REDACTOR_SALT')));
// "[hash:f72a1b09abc12345]"  (16 hex chars โ€” 64-bit namespace)

Usage examples

Reversible pseudonymisation for forensic exports

use Padosoft\PiiRedactor\Strategies\TokeniseStrategy;

$strategy = new TokeniseStrategy(salt: env('PII_REDACTOR_SALT'));

// Tokenise โ€” same input always produces the same token under a fixed salt.
$redacted = Pii::redact($chatLog, $strategy);

// ... ship $redacted to a downstream system that does NOT need the originals ...

// Later, on the secure side, rehydrate when an auditor requests it.
$auditPayload = $strategy->detokeniseString($redacted);

Custom detector via Pii::extend()

use Padosoft\PiiRedactor\Detectors\Detection;
use Padosoft\PiiRedactor\Detectors\Detector;
use Padosoft\PiiRedactor\Facades\Pii;

class CodiceIscrizioneAlboDetector implements Detector
{
    public function name(): string { return 'custom_albo'; }

    public function detect(string $text): array
    {
        if (preg_match_all('/ISCR-\d{6,}/', $text, $matches, PREG_OFFSET_CAPTURE) === false) {
            return [];
        }
        $hits = [];
        foreach ($matches[0] as $m) {
            $hits[] = new Detection('custom_albo', (string) $m[0], (int) $m[1], strlen((string) $m[0]));
        }
        return $hits;
    }
}

Pii::extend('custom_albo', new CodiceIscrizioneAlboDetector);

CLI โ€” scan a file in CI

# Samples are masked by default to keep raw PII out of CI logs.
php artisan pii:scan storage/exports/chat-log.txt --pretty

# Pass --show-samples for interactive forensics on a trusted terminal.
php artisan pii:scan storage/exports/chat-log.txt --pretty --show-samples

Default (masked-samples) output:

{
    "total": 4,
    "counts": { "email": 2, "iban": 1, "p_iva": 1 },
    "samples": {
        "email": ["[email]", "[email]"],
        "iban": ["[iban]"],
        "p_iva": ["[p_iva]"]
    }
}

With --show-samples (raw values restored):

{
    "total": 4,
    "counts": { "email": 2, "iban": 1, "p_iva": 1 },
    "samples": {
        "email": ["mario@example.com", "anna@example.com"],
        "iban": ["IT60X0542811101000000123456"],
        "p_iva": ["12345678903"]
    }
}

Laravel integration recipes

The package is transport-agnostic โ€” Pii::redact() and RedactorEngine::redact() accept a string and return a redacted string, so they slot into HTTP, queue, CLI, and event paths identically.

Side-effects to expect. redact() is not strictly pure: when the active strategy is tokenise it persists (token, original) rows to the configured TokenStore (see pii-redactor.token_store), and when audit-trail is enabled it dispatches a PiiRedactionPerformed event after every call. Both behaviours are documented in the RedactorEngine source. Keep this in mind if you wrap the call in a transaction or invoke it from a hot loop.

This section documents the two production-tested integration shapes plus the strategy decision tree.

Real-world reference: AskMyDocs (the v4.1+ enterprise RAG / chat platform) wires this package at four observable touch-points using the patterns below. Source available under app/Http/Middleware/RedactChatPii.php, app/Services/Kb/EmbeddingCacheService.php, app/Services/Admin/AiInsightsService.php, and app/Http/Controllers/Api/Admin/LogViewerController.php of lopadova/AskMyDocs โ€” feel free to copy.

A note on config namespaces โ€” package vs host

This package's own runtime knobs (master switch, default strategy, salt, mask token, NER driver, โ€ฆ) live under pii-redactor.* (file: config/pii-redactor.php) and are driven by the documented PII_REDACTOR_* env vars. Do not invent a parallel app.pii_redactor tree โ€” turning the package on via PII_REDACTOR_ENABLED=true will not flip a guard that reads config('app.pii_redactor.enabled').

What you DO need is your own per-touch-point integration knobs (e.g. "the chat middleware is active", "the embedding pre-redact is active"). Pick a host-app config namespace and document the env-var names alongside the package's own. The recipes below use a placeholder namespace myapp.pii.* โ€” substitute your project's real config key (AskMyDocs uses kb.pii_redactor.*, for example).

Integration shape A โ€” HTTP middleware (best practice for chat / API write paths)

This is the recommended pattern when redaction must happen before the controller persists the request to a database, dispatches a queue job, or calls an external LLM. The middleware mutates a specific request field (typically content / message / body) so every downstream consumer (controller, model creating event, queue job, log driver) sees the redacted form automatically.

1. Create the middleware:

<?php

declare(strict_types=1);

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Padosoft\PiiRedactor\RedactorEngine;
use Symfony\Component\HttpFoundation\Response;

final class RedactChatPii
{
    public function __construct(
        private readonly RedactorEngine $engine,
    ) {}

    public function handle(Request $request, Closure $next): Response
    {
        // Gate 1 โ€” the package's OWN master switch
        // (`PII_REDACTOR_ENABLED` env / `pii-redactor.enabled` config).
        // When the package is disabled at the env level, every call
        // path skips redaction.
        if (! (bool) config('pii-redactor.enabled', false)) {
            return $next($request);
        }

        // Gate 2 โ€” your host-app integration knob.
        // Substitute `myapp.pii.middleware_active` with the config key
        // your project actually uses.
        if (! (bool) config('myapp.pii.middleware_active', false)) {
            return $next($request);
        }

        $content = $request->input('content');
        if (! is_string($content) || $content === '') {
            return $next($request);
        }

        $request->merge([
            'content' => $this->engine->redact($content),
        ]);

        return $next($request);
    }
}

2. Register the alias in bootstrap/app.php (Laravel 11+):

->withMiddleware(function (Middleware $middleware) {
    $middleware->alias([
        'redact-chat-pii' => \App\Http\Middleware\RedactChatPii::class,
    ]);
})

3. Bind it ONLY to the routes that handle user-supplied free-form content. Do NOT slap it on a global middleware group โ€” that would also redact admin forms, configuration values, and curator-supplied content like markdown ingest payloads. The whole point is narrow scope:

// routes/web.php or routes/api.php
Route::post('/chat/messages', [ChatController::class, 'store'])
    ->middleware('redact-chat-pii');

Route::post('/chat/messages/stream', [ChatStreamController::class, 'store'])
    ->middleware(['auth.sse', 'redact-chat-pii']);

4. Pin the binding scope with an architecture test so a future refactor cannot accidentally extend the binding to admin / curator / ingest routes. Use a substring match (not a prefix match) so bare URIs like admin and api/admin are caught alongside admin/... and api/admin/...:

// tests/Architecture/PiiMiddlewareScopeTest.php
final class PiiMiddlewareScopeTest extends TestCase
{
    /**
     * Substrings โ€” NOT prefixes. A bare `admin` URI does not start
     * with `admin/`, so a prefix match would let the binding reach
     * the root admin endpoints unnoticed.
     */
    private const FORBIDDEN_SUBSTRINGS = ['admin', 'ingest'];

    public function test_redact_chat_pii_is_not_bound_to_admin_or_ingest_routes(): void
    {
        $router = $this->app->make(\Illuminate\Routing\Router::class);
        foreach ($router->getRoutes() as $route) {
            $bag = array_merge((array) $route->middleware(), (array) $route->gatherMiddleware());
            if (! in_array('redact-chat-pii', $bag, true) && ! in_array(\App\Http\Middleware\RedactChatPii::class, $bag, true)) {
                continue;
            }
            foreach (self::FORBIDDEN_SUBSTRINGS as $forbidden) {
                $this->assertStringNotContainsString($forbidden, $route->uri());
            }
        }
    }
}

Integration shape B โ€” service-layer call (for non-HTTP write paths)

Use this when the data enters your system from somewhere other than an HTTP request โ€” queue jobs, scheduled imports, CLI commands, or service-to-service callers. Inject RedactorEngine directly and call redact() at the boundary where the untrusted text first lands in your domain.

Forcing a strategy override. When you want a specific strategy for a service path (regardless of the package's default), pass an override to redact() rather than autowiring a fresh strategy instance โ€” that way you respect the host's configured mask_token, salt, hex length, etc. The cleanest way is to construct the strategy explicitly from the package's config so the host's overrides flow through:

use Padosoft\PiiRedactor\RedactorEngine;
use Padosoft\PiiRedactor\Strategies\MaskStrategy;

final class EmbeddingCacheService
{
    public function __construct(
        private readonly EmbeddingProvider $provider,
        private readonly RedactorEngine $engine,
    ) {}

    /** @param  list<string>  $texts */
    public function generate(array $texts): EmbeddingsResponse
    {
        if (config('pii-redactor.enabled') && config('myapp.pii.redact_before_embeddings')) {
            // Construct mask explicitly from the package's mask_token
            // so the host's `PII_REDACTOR_MASK_TOKEN` override is honoured.
            // Autowiring `app(MaskStrategy::class)` would create a fresh
            // instance with the hard-coded `[REDACTED]` default and skip
            // the configured token entirely.
            $mask = new MaskStrategy(
                (string) config('pii-redactor.mask_token', '[REDACTED]'),
            );

            $texts = array_map(
                fn (string $t): string => $this->engine->redact($t, $mask),
                $texts,
            );
        }

        // Hash for cache key + send to provider โ€” both now see the masked text.
        // ...
    }
}

Example โ€” queue job that consumes a webhook payload before persisting. Note the explicit string guard โ€” webhook payloads can arrive as arrays / objects / nulls, and redact() is typed string-in / string-out:

final class IngestExternalChatJob implements ShouldQueue
{
    public function handle(RedactorEngine $engine): void
    {
        $body = $this->payload['message'] ?? null;
        if (! is_string($body) || $body === '') {
            ChatLog::create(['body' => $body, /* ... */]);
            return;
        }

        if (config('pii-redactor.enabled') && config('myapp.pii.redact_jobs')) {
            $body = $engine->redact($body);
        }
        ChatLog::create(['body' => $body, /* ... */]);
    }
}

Strategy decision tree โ€” which one for which surface

The four ship-with-the-box strategies (MaskStrategy, HashStrategy, TokeniseStrategy, DropStrategy) are NOT interchangeable. Pick the one whose properties match the surface you're protecting.

Surface Recommended strategy Why
Embedding cache key + provider call MaskStrategy Embeddings are one-way; no detokenise round-trip needed. Mask is stable (same input โ†’ same masked output) so cache hit-rate is preserved across re-ingestion of the same document. Mask carries no per-tenant secret, so multi-tenant cache reuse stays intact.
Chat persistence (when an operator may need to recover originals later for audit / GDPR data subject request) TokeniseStrategy The host can call TokeniseStrategy::detokeniseString() to round-trip a redacted record back to plaintext. Pair with the database token store (set PII_REDACTOR_TOKEN_STORE=database) so the reverse map survives deploys + queue worker restarts + horizontal scale-out.
Chat persistence (when originals must be cryptographically forgotten) MaskStrategy or DropStrategy One-way. MaskStrategy replaces every detection with the configured mask_token (default [REDACTED], single fixed string regardless of detector โ€” see pii-redactor.mask_token to override); DropStrategy removes the matched span entirely.
Cross-system identifier matching (you want to know that two systems mention the same PII without revealing what it is) HashStrategy Deterministic SHA-256 namespaced per-detector. Same PII produces the same hash across systems sharing the salt. Secret = the salt.
Insights / analytics snapshots (read-only dashboards built from chat samples) MaskStrategy No round-trip needed; mask short-circuits leakage to both the LLM call and the snapshot persisted into your dashboard table.
Operator-driven detokenise endpoint (gated by a Spatie permission, audited per call) TokeniseStrategy::detokeniseString() against the row's text. The detokenise call resolves [tok:detector:hex] literals via direct token-store lookup, so historical tokenised rows stay recoverable even after the app's current default strategy changes โ€” what matters is whether the row's text actually contains tokenise literals. Surface a 422 only when there are no [tok: markers in the row to detokenise (or when the configured token store is unavailable).

Best-practice checklist for a production deploy

  • Default-off integration knobs: every host integration knob (your myapp.pii.middleware_active, myapp.pii.redact_before_embeddings, etc.) defaults false. Hosts opt in by flipping an env var. The package's own pii-redactor.enabled defaults true (the package is harmless when no integration calls it).
  • Narrow scope: middleware is bound only to routes that handle user-supplied free-form content. Curator / admin / configuration routes are NEVER bound (would silently corrupt KB pipelines, mangle role names, etc.).
  • Architecture test pin with a substring match (not a prefix match) so bare admin / api/admin URIs are caught.
  • Tenant-scoped reads: when reading from tables that store redacted records (e.g. chat_logs), scope the query to the active tenant if your app is multi-tenant. The package itself is tenant-agnostic; your reads must NOT be.
  • Detokenise gate: if you expose an operator-driven detokenise endpoint, gate it with a dedicated permission AND audit every call (200 + 403). Single-use confirm tokens with lockForUpdate() held inside the same transaction as the update('used_at') write are the canonical anti-replay shape.
  • Strategy preflight: when the row's text contains no [tok: literals, surface a 422 (not 200 with empty body) โ€” that's the clean signal that there's nothing to detokenise. Pretending success on a one-way deploy is a worse UX than the explicit "this row has no tokenised content" message.
  • Audit-trail visibility: every detokenise / unmask call writes a row to your audit table tagged with the actor, the target row id, the timestamp, the IP, and the user-agent. The host is responsible for choosing where (e.g. admin_command_audit table in AskMyDocs).
  • Salt is APP_KEY-class: rotating PII_REDACTOR_SALT after the fact is rare. Existing tokenise rows in pii_token_maps are detokenised by direct token-literal lookup, so a salt change does NOT invalidate stored mappings โ€” only NEW tokens emit hex digests derived from the new salt. For HashStrategy, by contrast, salt rotation does break cross-system joins (because every old hash becomes unrecoverable). Plan accordingly.
  • NER off in the hot path (default): the regex / checksum detectors are deterministic and microsecond-fast. Turn the optional NER drivers ON only on offline / batch surfaces or with an explicit per-request opt-in.

Web Panel UI

This package now has a companion web panel with a polished Laravel admin dashboard: padosoft/laravel-pii-redactor-admin.

The UI gives operators a safe overview of the redaction engine, detector hits, token-map activity, audit events, custom-rule health, and strategy configuration. It is built on top of the secret-free inspector APIs exposed by this package, so the panel can surface runtime state without returning salts, API keys, raw PII, or token originals.

Laravel PII Redactor Admin web panel

Admin panel readiness

The core package is intentionally headless: it does not ship controllers, routes, React components, or admin UI assets. The admin dashboard lives in the separate padosoft/laravel-pii-redactor-admin package, while this package exposes the safe backend primitives needed by that UI.

  • RedactorAdminInspector::snapshot() returns a secret-free runtime snapshot: enabled state, default strategy, audit flag, token-store driver/class, NER status, detectors, packs, and custom-rule count. It does not expose salts, API keys, raw PII, token originals, or redacted output.
  • RedactionStrategyFactory::names() and make() provide the public strategy construction surface for admin preview APIs, so hosts do not duplicate private service-provider logic.
  • DetectionReportFormatter::safeArray() converts DetectionReport to an API-ready payload and masks samples by default as [email], [iban], etc.
  • TokenResolutionService::detokeniseString() resolves only [tok:<detector>:<hex>] values referenced in the input through the configured TokenStore; it never loads the whole reverse map.
  • CustomRulePackInspector::configuredPacks() reports configured YAML pack health without mutating the engine or registering detectors.

The companion UI is the Laravel 13.x package padosoft/laravel-pii-redactor-admin, built with Vite, React, TypeScript, and Tailwind CSS and connected to these APIs. The implementation contract, endpoint plan, audit schema, PHPUnit gates, and frontend gates are documented in docs/admin-panel-architecture-plan.md.

Configuration reference

Every key in config/pii-redactor.php is documented inline. Highlights:

  • enabled โ€” master switch. When false, Pii::redact() returns input unchanged. Wired all the way down to the RedactorEngine constructor so PII_REDACTOR_ENABLED=false in .env short-circuits redaction without code changes.
  • strategy โ€” mask | hash | tokenise | drop. Default mask token is [REDACTED].
  • salt โ€” required for hash and tokenise. Treat like APP_KEY.
  • mask_token โ€” override the default mask string.
  • hash_hex_length โ€” between 4 and 64; default 16 (= 64-bit namespace, well above the birthday bound for any realistic corpus). Drop to 8 only if you accept that downstream joins on [hash:...] may collapse unrelated records once the dataset crosses ~30k uniques.
  • token_hex_length โ€” between 8 and 64; default 16 for the [tok:<detector>:<id>] id portion. Same collision argument as hash_hex_length.
  • detectors โ€” whitelist of multi-country detector classes the ServiceProvider auto-registers (EmailDetector, IbanDetector, CreditCardDetector by default). Removing an entry disables the detector. Country-specific detectors are loaded via the packs array, not here. Custom detectors registered via Pii::extend() bypass this list. Misconfigured FQCNs (existing class that does not implement Detector) raise a DetectorException at boot rather than crashing later with a TypeError.
  • packs โ€” array of PackContract FQCNs the ServiceProvider boots into the DetectorPackRegistry. Default ships [ItalyPack::class]. Add GermanyPack::class (German Steuer-ID + USt-IdNr + phone/address) or SpainPack::class (DNI + NIE + CIF + phone/address) for additional EU coverage. Custom packs welcome โ€” see CONTRIBUTING-PACKS.md. Misconfigured FQCNs are caught at boot.
  • custom_rules.auto_register โ€” when true (v1.0+), the SP walks custom_rules.packs and auto-registers each YAML pack at boot. Defaults to false for v0.x parity.
  • custom_rules.packs โ€” array of ['name' => ..., 'path' => ...] entries. The name becomes the Pii::extend() alias AND the CustomRuleDetector::name(). Example: ['name' => 'custom_it_albo', 'path' => storage_path('app/pii-rules/it-albo.yaml')]. Validation errors throw CustomRuleException at boot.
  • audit_trail_enabled (v0.1 BC) and audit_trail.enabled (v0.2 structured) โ€” when true, the engine fires PiiRedactionPerformed after every redact() call. Payload carries counts only (no raw PII or redacted output). The structured key is preferred; the flat key remains as a fallback so v0.1 hosts upgrade transparently.
  • ner.enabled / ner.driver / ner.drivers โ€” opt-in NER. Drivers: stub (no-op default), huggingface (HuggingFace Inference API via Http::, opt-in via PII_REDACTOR_HUGGINGFACE_API_KEY), spacy (generic spaCy HTTP server via PII_REDACTOR_SPACY_SERVER_URL).
  • token_store.driver โ€” memory (default) | database | cache. The database driver requires the shipped migration: php artisan vendor:publish --tag=pii-redactor-migrations && php artisan migrate. The cache driver runs over Illuminate\Contracts\Cache\Repository with optional TTL + maintained index (Redis / Memcached / DynamoDB / array). Switch with PII_REDACTOR_TOKEN_STORE=database or =cache.
  • token_store.database.connection / token_store.database.table โ€” isolate the pii_token_maps table on a dedicated DB connection (recommended for hosts that already partition PII from operational data).
  • token_store.cache.store / token_store.cache.prefix / token_store.cache.ttl โ€” pin the cache backend (redis, memcached, array, etc.), key prefix, and optional TTL for the CacheTokenStore driver.

Architecture

src/
 โ”œโ”€โ”€ PiiRedactorServiceProvider.php        config publish + DI bindings + commands + migrations (v0.2)
 โ”œโ”€โ”€ RedactorEngine.php                    core orchestrator (detectors + strategy + overlap + NER + audit-trail)
 โ”œโ”€โ”€ Facades/Pii.php                       static-method surface for hosts
 โ”œโ”€โ”€ Console/PiiScanCommand.php            php artisan pii:scan
 โ”œโ”€โ”€ Admin/
 โ”‚    โ””โ”€โ”€ RedactorAdminInspector.php       secret-free admin/runtime snapshot
 โ”œโ”€โ”€ Detectors/                            both multi-country (always-on) and Italian (registered via ItalyPack)
 โ”‚    โ”œโ”€โ”€ Detector.php                     interface
 โ”‚    โ”œโ”€โ”€ Detection.php                    immutable value object
 โ”‚    โ”œโ”€โ”€ EmailDetector.php                multi-country โ€” RFC-5321
 โ”‚    โ”œโ”€โ”€ IbanDetector.php                 multi-country โ€” ISO 13616 mod-97 (every EU country)
 โ”‚    โ”œโ”€โ”€ CreditCardDetector.php           multi-country โ€” Luhn
 โ”‚    โ”œโ”€โ”€ CodiceFiscaleDetector.php        Italian โ€” CIN checksum, instantiated by ItalyPack
 โ”‚    โ”œโ”€โ”€ PartitaIvaDetector.php           Italian โ€” Luhn-IT, instantiated by ItalyPack
 โ”‚    โ”œโ”€โ”€ PhoneItalianDetector.php         Italian โ€” instantiated by ItalyPack
 โ”‚    โ””โ”€โ”€ AddressItalianDetector.php       Italian street-address heuristic, instantiated by ItalyPack
 โ”œโ”€โ”€ Packs/                                v1.0+ โ€” opt-in country bundles (aggregators)
 โ”‚    โ”œโ”€โ”€ PackContract.php                 interface (name / countryCode / description / detectors)
 โ”‚    โ”œโ”€โ”€ DetectorPackRegistry.php         resolves config('pii-redactor.packs') into engine detectors
 โ”‚    โ”œโ”€โ”€ Italy/
 โ”‚    โ”‚    โ””โ”€โ”€ ItalyPack.php               aggregates the 4 IT detectors above
 โ”‚    โ”‚                                     (default โ€” listed in config('pii-redactor.packs'))
 โ”‚    โ”œโ”€โ”€ Germany/                         v1.1 โ€” opt-in
 โ”‚    โ”‚    โ””โ”€โ”€ GermanyPack.php             4 DE detectors (steuer_id / ust_idnr / phone_de / address_de)
 โ”‚    โ””โ”€โ”€ Spain/                           v1.1 โ€” opt-in
 โ”‚         โ””โ”€โ”€ SpainPack.php               5 ES detectors (dni / nie / cif / phone_es / address_es)
 โ”œโ”€โ”€ Strategies/
 โ”‚    โ”œโ”€โ”€ RedactionStrategy.php            interface
 โ”‚    โ”œโ”€โ”€ RedactionStrategyFactory.php     public factory for mask/hash/tokenise/drop
 โ”‚    โ”œโ”€โ”€ MaskStrategy.php
 โ”‚    โ”œโ”€โ”€ HashStrategy.php
 โ”‚    โ”œโ”€โ”€ TokeniseStrategy.php             reversible โ€” accepts a TokenStore (v0.2)
 โ”‚    โ””โ”€โ”€ DropStrategy.php
 โ”œโ”€โ”€ TokenStore/                           v0.2 โ€” persistent reverse-map storage
 โ”‚    โ”œโ”€โ”€ TokenStore.php                   interface (put/get/has/clear/dump/load)
 โ”‚    โ”œโ”€โ”€ TokenResolutionService.php       detokenise through TokenStore without dump()
 โ”‚    โ”œโ”€โ”€ DetokeniseResult.php             API-friendly detokenise result VO
 โ”‚    โ”œโ”€โ”€ InMemoryTokenStore.php           default โ€” process-local, zero I/O
 โ”‚    โ”œโ”€โ”€ DatabaseTokenStore.php           Eloquent-backed (chunkById dump, chunked upsert load)
 โ”‚    โ””โ”€โ”€ Eloquent/
 โ”‚         โ””โ”€โ”€ PiiTokenMap.php             model for the pii_token_maps table
 โ”œโ”€โ”€ Events/
 โ”‚    โ””โ”€โ”€ PiiRedactionPerformed.php        v0.2 โ€” Dispatchable, counts-only payload
 โ”œโ”€โ”€ Ner/                                  v0.2 โ€” pluggable named-entity recognition
 โ”‚    โ”œโ”€โ”€ NerDriver.php                    interface (name, detect)
 โ”‚    โ””โ”€โ”€ StubNerDriver.php                no-op default; HuggingFace + spaCy in v0.3
 โ”œโ”€โ”€ Reports/
 โ”‚    โ”œโ”€โ”€ DetectionReport.php              total() / countsByDetector() / samplesByDetector() / toArray()
 โ”‚    โ””โ”€โ”€ DetectionReportFormatter.php     safe API arrays; masks samples by default
 โ””โ”€โ”€ Exceptions/
      โ”œโ”€โ”€ PiiRedactorException.php         non-final base
      โ”œโ”€โ”€ DetectorException.php
      โ””โ”€โ”€ StrategyException.php

database/
 โ””โ”€โ”€ migrations/
      โ””โ”€โ”€ 2026_05_03_000001_create_pii_token_maps_table.php   v0.2 โ€” DatabaseTokenStore schema

src/Ner/                                                       v0.3 โ€” production NER drivers
 โ”œโ”€โ”€ HuggingFaceNerDriver.php                                  HF Inference API via Http::
 โ””โ”€โ”€ SpaCyNerDriver.php                                        spaCy server (Doc.to_json shape)

src/TokenStore/CacheTokenStore.php                             v0.3 โ€” third store driver

src/CustomRules/                                               v0.3 โ€” YAML custom-rule packs
 โ”œโ”€โ”€ CustomRule.php                                            VO: name + pattern + flags
 โ”œโ”€โ”€ CustomRuleSet.php                                         typed list with fromArray()
 โ”œโ”€โ”€ YamlCustomRuleLoader.php                                  symfony/yaml-backed loader
 โ”œโ”€โ”€ CustomRuleDetector.php                                    Detector wrapping a CustomRuleSet
 โ””โ”€โ”€ CustomRulePackInspector.php                               admin diagnostics without registration side effects

src/Exceptions/CustomRuleException.php                         v0.3 โ€” bad YAML / invalid PCRE

tests/Live/                                                    v0.3 โ€” opt-in real-API tests
 โ”œโ”€โ”€ README.md                                                 convention + per-driver env vars
 โ”œโ”€โ”€ HuggingFaceNerDriverLiveTest.php
 โ””โ”€โ”€ SpaCyNerDriverLiveTest.php

The engine itself is stateless with respect to the input. Calls to redact() / scan() are pure functions of (text, registered detectors). Overlap resolution is left-to-right, longer-match-wins on tie โ€” see RedactorEngineTest::test_overlapping_detections_are_resolved_left_to_right.

AI vibe-coding pack

This repository ships with a .claude/ directory containing the Padosoft skills, agents, rules, and commands used to build the package. Drop the directory into a host application that has Claude Code installed and you inherit:

  • R36 โ€” Copilot PR review loop + R37 โ€” branching strategy as project rules.
  • Pre-push review agent (pre-push-self-review) that anticipates Copilot findings.
  • Slash commands (/create-job, /domain-scaffold, /domain-service) tuned for the Padosoft Laravel pattern.
  • Skills covering testid conventions, PHPUnit / Vitest / Playwright authoring, CI failure investigation.

Open the repo in Claude Code and /help lists everything.

Testing โ€” Default + Live

composer install
vendor/bin/phpunit                              # Full Unit suite โ€” default, ~400 tests, offline.
vendor/bin/phpunit --testsuite Architecture     # standalone-agnostic + pack-isolation invariants.

# Robustness scenarios live under tests/Unit/Robustness/ inside the Unit
# testsuite โ€” run them as a path filter:
vendor/bin/phpunit tests/Unit/Robustness/       # Unicode + boundary + 1MB-document regression gate.

# Performance benchmarks (PerfBenchTest) carry the `perf` group and may be
# noisy on shared CI runners. Skip them with --exclude-group perf when you
# need a deterministic green:
vendor/bin/phpunit --exclude-group perf

The Live suite is opt-in and reserved for scenarios that require a real external dependency (HuggingFace Inference API, spaCy HTTP server). Each Live test self-skips unless PII_REDACTOR_LIVE=1 is set AND its driver-specific credentials are configured. CI runs Unit + Architecture only โ€” Live is operator-driven and perf is excluded by default in shared-runner CI.

Performance

Concrete numbers for the synchronous, deterministic path (no NER, no cache hit), measured on PHP 8.4 / Laravel 13 / standard CI hardware:

Input size Time Notes
1 KB Italian text (mixed PII) ~0.4 ms single-pass regex matching against 7 detectors (3 always-on + ItalyPack).
100 KB document ~25 ms linear in input length; no per-detector backtracking explosion.
1 MB document ~280 ms gated by tests/Unit/Robustness/UnicodeAndBoundaryTest::test_engine_handles_1mb_document_in_reasonable_time to keep regressions out of main.
Memory (1 MB / ~1000 detections) < 8 MB total input string + detection list (~32 bytes per detection on 64-bit).

NER drivers add network latency to the synchronous figures above (NER is opt-in and disabled by default):

  • HuggingFace Inference API โ€” cold start 10โ€“30 s (model warm-up); warm requests ~150 ms RTT.
  • spaCy local HTTP server โ€” ~30โ€“80 ms RTT.

Both drivers fail open on HTTP error, so a NER outage cannot block deterministic redaction. The robustness suite exercises Unicode boundaries, multi-byte CAP/civic markers, overlapping ranges across detectors, and the 1 MB regression gate on every CI push.

Roadmap

  • v0.1.0 (W7, shipped 2026-04-30) โ€” 6 deterministic detectors (codice_fiscale, p_iva, iban, email, phone_it, credit_card), 4 strategies, Pii::extend(), pii:scan command (masked samples by default), 80+ PHPUnit tests, standalone-agnostic invariant.
  • v0.2.0 (W4.1, shipped 2026-05-03) โ€” address_it Italian street-address heuristic detector (7th first-party detector). PiiRedactionPerformed Laravel event fired by the engine when audit_trail.enabled = true; carries counts only (no raw PII). Persistent TokenStore interface + InMemoryTokenStore (default) + DatabaseTokenStore (Eloquent + pii_token_maps migration). NER NerDriver scaffold (StubNerDriver ships) with withNerDriver() immutable setter on the engine. 158 PHPUnit tests on the v0.2 surface.
  • v0.3.0 (W4.1, shipped 2026-05-03) โ€” production NER drivers (HuggingFaceNerDriver + SpaCyNerDriver via Http::), Italian custom-rule YAML loader (CustomRule + CustomRuleSet + YamlCustomRuleLoader + CustomRuleDetector + CustomRuleException), cache-backed TokenStore driver (CacheTokenStore over Illuminate\Contracts\Cache\Repository with TTL + index), Live test harness. 320 PHPUnit tests / 658 assertions.
  • v1.0.0 (W4.1, this PR) โ€” EU country pack architecture. PackContract interface + ItalyPack reference implementation. DetectorPackRegistry resolving config-listed packs into engine detectors. SP auto-register loop for custom_rules YAML packs (closes v0.3 deferred TODO). Stable surface lock + semver guarantees + formal compatibility matrix (PHP 8.3/8.4/8.5 ร— Laravel 12/13). Migration guide v0.x โ†’ v1.0 (no breaking changes). CONTRIBUTING-PACKS.md community PR guide. Hardened SECURITY.md. 400+ PHPUnit tests on the v1.0 surface.
  • v1.1.0 (W4.1, this PR) โ€” first community-style packs land alongside ItalyPack:
    • GermanyPack โ€” steuer_id (mod-11 ISO 7064 per ยง139b AO), ust_idnr (BMF Method 30 per ยง27a UStG), phone_de, address_de. 10 valid + 5 invalid + 5 wrong-format checksum fixtures per detector.
    • SpainPack โ€” dni (23-letter table per RD 1553/2005), nie, cif, phone_es, address_es. Same fixture standards.
  • v1.2+ candidates โ€” FrancePack (NIR mod-97 + TVA), NetherlandsPack (BSN), PortugalPack (NIF). Community PRs welcome โ€” see CONTRIBUTING-PACKS.md.

Migration guide v0.x โ†’ v1.0

No breaking changes. v1.0 is a drop-in upgrade from v0.3 / v0.2 / v0.1. Existing import paths, facade calls, config keys, env vars, and the pii_token_maps migration all continue to work unchanged.

What you gain by upgrading:

  • The four Italian detectors continue to be registered automatically (now via ItalyPack instead of the flat pii-redactor.detectors list, but the observable behaviour is identical โ€” same detector names, same DetectionReport shape, same overlap-resolution order).
  • Hosts can now opt-in to additional country packs (v1.1+) by adding their FQCN to config('pii-redactor.packs').
  • YAML custom-rule packs auto-register at boot when pii-redactor.custom_rules.auto_register = true โ€” no more manual Pii::extend() bootstrap code.

What you should consider doing:

  • Move tenant-specific Pii::extend() calls out of bootstrap into the YAML pack format (one yaml file per detector pack); set auto_register = true.
  • If you operate outside Italy and previously stripped Italian detectors via unset(config('pii-redactor.detectors')[...]), switch to the cleaner 'packs' => [] pattern.
  • If you ship custom country detectors, consider proposing them upstream as a community pack โ€” see CONTRIBUTING-PACKS.md.

Contributing

PRs welcome. Please read:

  • CONTRIBUTING.md โ€” general PR workflow.
  • CONTRIBUTING-PACKS.md โ€” how to contribute a country pack (GermanyPack, SpainPack, etc.): checksum source citation, 10 valid + 5 invalid fixtures, R37 standalone-agnostic compliance, pack-isolation architecture test.

Every PR follows the R36 Copilot review + CI green loop before merge. The architecture test gates standalone-agnostic violations on every push.

Security

Found a vulnerability? Email security@padosoft.com โ€” please do not open a public issue. See SECURITY.md for the full disclosure policy.

License

Apache-2.0 โ€” see LICENSE.