weichie-com / blur
PII detection and de-identification SDK for BeNeLux (Belgium, Netherlands, Luxembourg)
Requires
- php: >=8.1
- ext-mbstring: *
- ext-openssl: *
- giggsey/libphonenumber-for-php: ^8.13
Requires (Dev)
- phpunit/phpunit: ^10.0
A data protection and de-identification SDK for BeNeLux (Belgium, Netherlands, Luxembourg) and US identifiers, inspired by Microsoft Presidio.
Features
- Pattern-based PII Detection: Fast and accurate entity recognition using regex patterns
- Full Validation: Checksum validation (Luhn, mod-97, 11-proof, ABA routing) to reduce false positives
- BeNeLux-Specific Recognizers:
  - 🇳🇱 Dutch BSN (Burgerservicenummer) with 11-proof validation
  - 🇧🇪 Belgian National Number with mod-97 validation
  - 🇱🇺 Luxembourg National ID
  - BeNeLux IBAN codes with mod-97 checksum
  - Phone numbers for BE/NL/LU (using libphonenumber)
- US-Specific Recognizers:
  - 🇺🇸 Social Security Number (SSN) with area/group/serial validation
  - 🇺🇸 Individual Taxpayer ID (ITIN)
  - 🇺🇸 Passport Number (traditional + next-gen)
  - 🇺🇸 Driver License (multi-state formats)
  - 🇺🇸 Bank Account Number
  - 🇺🇸 ABA Routing Number with checksum validation
- Generic Recognizers: Email, Credit Card (Luhn), IP Address, URL
- Multiple Anonymization Strategies:
  - Replace with custom values
  - Redact (remove completely)
  - Mask (partial or full)
  - Hash (SHA-256/SHA-512)
  - Encrypt/Decrypt (AES-256-CBC)
- Context Enhancement: Boost detection confidence with contextual keywords
- UTF-8 Support: Full multibyte string handling
- Type-Safe: Built with PHP 8.1+ strict types
Installation
Install via Composer:
```bash
composer require weichie-com/blur
```
Or add to your composer.json:
```json
{
    "require": {
        "weichie-com/blur": "^1.0"
    }
}
```
Requirements
- PHP 8.1+ (the examples use named arguments such as `language: 'nl'`)
- ext-mbstring: Multibyte string support (UTF-8)
- ext-openssl: AES encryption support
- giggsey/libphonenumber-for-php: Phone number validation (auto-installed)
Quick Start
```php
<?php

require_once 'vendor/autoload.php';

use Weichie\Blur\Analyzer\AnalyzerEngine;
use Weichie\Blur\Analyzer\RecognizerRegistry;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BsnRecognizer;
use Weichie\Blur\Anonymizer\AnonymizerEngine;
use Weichie\Blur\Anonymizer\Models\OperatorConfig;
use Weichie\Blur\Anonymizer\Operators\MaskOperator;

// 1. Set up the analyzer
$registry = new RecognizerRegistry();
$registry->addRecognizer(new BsnRecognizer());
$analyzer = new AnalyzerEngine($registry);

// 2. Analyze text
$text = "Het BSN nummer is 111222333 voor deze klant.";
$results = $analyzer->analyze($text, language: 'nl');

// 3. Set up the anonymizer
$anonymizer = new AnonymizerEngine();
$anonymizer->addOperator(new MaskOperator());

// 4. Anonymize: mask the first 6 characters of every NL_BSN match
$operators = [
    'NL_BSN' => OperatorConfig::mask('*', 6),
];
$anonymized = $anonymizer->anonymize($text, $results, $operators);

echo $anonymized->getText();
// Output: "Het BSN nummer is ******333 voor deze klant."
```
Usage Examples
1. Detecting BeNeLux National IDs
```php
use Weichie\Blur\Analyzer\AnalyzerEngine;
use Weichie\Blur\Analyzer\RecognizerRegistry;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BsnRecognizer;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\BelgianNationalNumberRecognizer;
use Weichie\Blur\Analyzer\Recognizers\BeNeLux\LuxembourgNationalIdRecognizer;

$registry = new RecognizerRegistry();
$registry->addRecognizer(new BsnRecognizer());                   // Dutch BSN
$registry->addRecognizer(new BelgianNationalNumberRecognizer()); // Belgian National Number
$registry->addRecognizer(new LuxembourgNationalIdRecognizer());  // Luxembourg National ID
$analyzer = new AnalyzerEngine($registry);

$text = "BSN: 111222333, BE National: 85.07.30-033.28, LU ID: 1990030112345";
$results = $analyzer->analyze($text, language: 'nl');

foreach ($results as $result) {
    echo "{$result->entityType}: score {$result->score}\n";
}
```
2. Detecting IBAN Codes
```php
use Weichie\Blur\Analyzer\AnalyzerEngine;
use Weichie\Blur\Analyzer\RecognizerRegistry;
use Weichie\Blur\Analyzer\Recognizers\Generic\IbanRecognizer;

$registry = new RecognizerRegistry();
$registry->addRecognizer(new IbanRecognizer());
$analyzer = new AnalyzerEngine($registry);

$text = "IBAN: NL91ABNA0417164300 (Netherlands), BE68539007547034 (Belgium)";
$results = $analyzer->analyze($text);
```
3. Multiple Anonymization Strategies
```php
// Strategy 1: Replace with labels
$operators = [
    'NL_BSN' => OperatorConfig::replace('[BSN-REDACTED]'),
    'EMAIL_ADDRESS' => OperatorConfig::replace('[EMAIL]'),
];

// Strategy 2: Partial masking
$operators = [
    'NL_BSN' => OperatorConfig::mask('*', 6, false),       // mask the first 6 characters
    'CREDIT_CARD' => OperatorConfig::mask('*', 12, false), // mask the first 12 characters
];

// Strategy 3: Complete redaction
$operators = [
    'DEFAULT' => OperatorConfig::redact(), // remove all detected entities
];

// Strategy 4: Hashing for consistency (the same input always yields the same hash)
$operators = [
    'NL_BSN' => OperatorConfig::hash('sha256'),
    'IBAN_CODE' => OperatorConfig::hash('sha256'),
];

// Strategy 5: Encryption (reversible with the same key)
$key = 'your-secret-key'; // placeholder: load a real secret from secure configuration
$operators = [
    'NL_BSN' => OperatorConfig::encrypt($key),
    'BE_NATIONAL_NUMBER' => OperatorConfig::encrypt($key),
];
```
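Strategy 5 is the only reversible option. As a rough sketch of what an AES-256-CBC round trip involves, the helpers below use PHP's ext-openssl functions `openssl_encrypt()` and `openssl_decrypt()` with a fresh random IV prepended to each ciphertext. The names `encryptValue()`/`decryptValue()` and the `base64(IV . ciphertext)` storage format are hypothetical, not the package's documented output, and how `EncryptOperator` turns a string such as `'your-secret-key'` into a 32-byte key is not covered here.

```php
<?php

/**
 * Hypothetical AES-256-CBC helpers built on ext-openssl.
 * Stored format (an assumption for this sketch): base64(IV . ciphertext).
 */
function encryptValue(string $plaintext, string $key): string
{
    $cipher = 'aes-256-cbc';
    // A fresh random IV per value, so equal inputs give different ciphertexts.
    $iv = random_bytes(openssl_cipher_iv_length($cipher));
    $ciphertext = openssl_encrypt($plaintext, $cipher, $key, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    return base64_encode($iv . $ciphertext);
}

function decryptValue(string $encoded, string $key): string
{
    $cipher = 'aes-256-cbc';
    $ivLength = openssl_cipher_iv_length($cipher);
    $raw = base64_decode($encoded, true);
    if ($raw === false || strlen($raw) <= $ivLength) {
        throw new InvalidArgumentException('Malformed ciphertext');
    }
    // Split the stored value back into IV and ciphertext, then decrypt.
    $plaintext = openssl_decrypt(
        substr($raw, $ivLength),
        $cipher,
        $key,
        OPENSSL_RAW_DATA,
        substr($raw, 0, $ivLength)
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed (wrong key or corrupted data)');
    }
    return $plaintext;
}

// AES-256 expects a 32-byte key; a random key is used here for illustration.
$key = random_bytes(32);
$token = encryptValue('111222333', $key);
echo decryptValue($token, $key), "\n"; // 111222333
```

Because each call draws a fresh IV, the same BSN encrypts to a different token every time, so unlike Strategy 4 the tokens cannot be used to match records. CBC mode on its own also does not detect tampering; add a MAC (for example with `hash_hmac()`) if that matters.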
4. Context Enhancement
Boost detection confidence when context keywords appear near entities:
```php
$text = "Het BSN nummer is 111222333 voor deze klant.";
$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    context: ['bsn', 'nummer', 'klant'], // boost the score when these words are nearby
    scoreThreshold: 0.3
);
// The BSN receives a higher confidence score because of the nearby context words.
```
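One way to picture context enhancement: look at a window of text on both sides of a match and raise its score when a context word appears there. The function below is a hypothetical sketch; the 30-character window, the +0.35 boost, the cap at 1.0 and the plain substring matching are arbitrary assumptions, not the package's actual parameters.

```php
<?php

/**
 * Hypothetical context boost: raise $score by $boost (capped at 1.0) when any
 * context word appears within $window characters before or after the match.
 * $start and $end are character (not byte) offsets of the match in $text.
 */
function boostScore(
    string $text,
    int $start,
    int $end,
    float $score,
    array $contextWords,
    int $window = 30,
    float $boost = 0.35
): float {
    $from = max(0, $start - $window);
    $before = mb_substr($text, $from, $start - $from, 'UTF-8');
    $after = mb_substr($text, $end, $window, 'UTF-8');
    $nearby = mb_strtolower($before . ' ' . $after, 'UTF-8');

    foreach ($contextWords as $word) {
        // Plain substring match; a real implementation would likely match whole words.
        if (mb_strpos($nearby, mb_strtolower($word, 'UTF-8'), 0, 'UTF-8') !== false) {
            return min(1.0, $score + $boost);
        }
    }
    return $score;
}

$text = "Het BSN nummer is 111222333 voor deze klant.";
$start = mb_strpos($text, '111222333', 0, 'UTF-8');
$end = $start + mb_strlen('111222333', 'UTF-8');
echo boostScore($text, $start, $end, 0.5, ['bsn', 'nummer', 'klant']), "\n";
```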
5. Entity Filtering
Detect only specific entity types:
```php
$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    entities: ['NL_BSN', 'EMAIL_ADDRESS'] // only detect these types
);
```
6. Allow List
Ignore specific values, even when they match a recognizer:
```php
$results = $analyzer->analyze(
    text: $text,
    language: 'nl',
    allowList: ['test@example.com', '111222333'] // never report these values
);
```
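Conceptually, `entities` and `allowList` act as filters over the raw matches. The sketch below assumes a hypothetical array shape `['type' => ..., 'start' => ..., 'end' => ...]` with character offsets; the package's `RecognizerResult` objects may expose this data differently, and `filterMatches()` is not part of its API.

```php
<?php

/**
 * Hypothetical post-filter over raw matches, each described as
 * ['type' => string, 'start' => int, 'end' => int] (character offsets).
 */
function filterMatches(string $text, array $matches, ?array $entities = null, array $allowList = []): array
{
    return array_values(array_filter(
        $matches,
        function (array $m) use ($text, $entities, $allowList): bool {
            // Entity filtering: drop types that were not requested.
            if ($entities !== null && !in_array($m['type'], $entities, true)) {
                return false;
            }
            // Allow list: drop matches whose exact text is allow-listed.
            $value = mb_substr($text, $m['start'], $m['end'] - $m['start'], 'UTF-8');
            return !in_array($value, $allowList, true);
        }
    ));
}

$text = 'Mail test@example.com or jan@example.nl';
$matches = [
    ['type' => 'EMAIL_ADDRESS', 'start' => 5, 'end' => 21],  // test@example.com
    ['type' => 'EMAIL_ADDRESS', 'start' => 25, 'end' => 39], // jan@example.nl
];
print_r(filterMatches($text, $matches, null, ['test@example.com']));
```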
Supported Recognizers
BeNeLux-Specific
| Entity Type | Description | Validation | Country |
|---|---|---|---|
| `NL_BSN` | Dutch Burgerservicenummer | 11-proof checksum | 🇳🇱 NL |
| `BE_NATIONAL_NUMBER` | Belgian National Number | mod-97 checksum | 🇧🇪 BE |
| `LU_NATIONAL_ID` | Luxembourg National ID | Date validation | 🇱🇺 LU |
| `IBAN_CODE` | IBAN (BE/NL/LU) | mod-97 checksum | 🇧🇪 🇳🇱 🇱🇺 |
| `PHONE_NUMBER` | Phone numbers | libphonenumber | 🇧🇪 🇳🇱 🇱🇺 |
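The "Date validation" for `LU_NATIONAL_ID` can be sketched as follows, assuming the 13-digit layout seen in the example above (`1990030112345`): an eight-digit birth date `YYYYMMDD` followed by five more digits. The real format also ends in check digits, which this hypothetical function ignores.

```php
<?php

/**
 * Hypothetical check: is the YYYYMMDD prefix of a 13-digit
 * Luxembourg national ID a real calendar date?
 */
function luxembourgIdHasValidDate(string $id): bool
{
    if (preg_match('/^(\d{4})(\d{2})(\d{2})\d{5}$/', $id, $m) !== 1) {
        return false;
    }
    // checkdate(month, day, year) rejects impossible dates such as 30 February.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(luxembourgIdHasValidDate('1990030112345')); // bool(true)
var_dump(luxembourgIdHasValidDate('1990023012345')); // bool(false): 30 February
```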
US-Specific
| Entity Type | Description | Validation | Country |
|---|---|---|---|
| `US_SSN` | Social Security Number | Area/group/serial rules | 🇺🇸 US |
| `US_ITIN` | Individual Taxpayer ID | Format + digit ranges | 🇺🇸 US |
| `US_PASSPORT` | Passport Number | Pattern (context-boosted) | 🇺🇸 US |
| `US_DRIVER_LICENSE` | Driver License | Multi-state patterns | 🇺🇸 US |
| `US_BANK_NUMBER` | Bank Account Number | Pattern (context-boosted) | 🇺🇸 US |
| `US_ABA_ROUTING` | ABA Routing Number | Weighted checksum (mod 10) | 🇺🇸 US |
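The SSN "area/group/serial rules" can be sketched from the Social Security Administration's numbering constraints: the area (first three digits) is never 000, 666, or 900 to 999 (numbers starting with 9 are used for ITINs instead), the group (middle two digits) is never 00, and the serial (last four digits) is never 0000. The function name is hypothetical, and the package may apply further rules.

```php
<?php

/** Hypothetical structural check for a US SSN such as "123-45-6789". */
function ssnLooksValid(string $ssn): bool
{
    if (preg_match('/^(\d{3})-?(\d{2})-?(\d{4})$/', $ssn, $m) !== 1) {
        return false;
    }
    [, $area, $group, $serial] = $m;

    // Area numbers 000, 666 and 900-999 are never assigned as SSNs.
    if ($area === '000' || $area === '666' || $area[0] === '9') {
        return false;
    }
    // Group 00 and serial 0000 are never assigned either.
    return $group !== '00' && $serial !== '0000';
}

var_dump(ssnLooksValid('123-45-6789')); // bool(true)
var_dump(ssnLooksValid('666-12-3456')); // bool(false)
```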
Generic
| Entity Type | Description | Validation |
|---|---|---|
| `EMAIL_ADDRESS` | Email addresses | RFC validation |
| `CREDIT_CARD` | Credit card numbers | Luhn checksum |
| `IP_ADDRESS` | IPv4/IPv6 addresses | IP validation |
| `URL` | URLs | URL validation |
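PHP's built-in `filter_var()` validators are one plausible way to implement the email, IP and URL checks. Whether the package uses them is an assumption, and `FILTER_VALIDATE_EMAIL` does not implement the full RFC 5322 grammar. The helper name below is hypothetical.

```php
<?php

/** Hypothetical classifier for a candidate string, built on filter_var(). */
function classifyCandidate(string $candidate): ?string
{
    if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
        return 'EMAIL_ADDRESS';
    }
    // Accepts both IPv4 and IPv6 unless restricted with FILTER_FLAG_IPV4/IPV6.
    if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
        return 'IP_ADDRESS';
    }
    // Requires a scheme, so "example.com" alone is rejected.
    if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
        return 'URL';
    }
    return null;
}

var_dump(classifyCandidate('jan@example.nl'));  // string(13) "EMAIL_ADDRESS"
var_dump(classifyCandidate('256.1.1.1'));       // NULL
```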
Supported Operators
| Operator | Description | Parameters |
|---|---|---|
| `replace` | Replace with custom value | `new_value` |
| `redact` | Remove completely | None |
| `mask` | Partial/full masking | `masking_char`, `chars_to_mask`, `from_end` |
| `hash` | SHA-256/SHA-512 hashing | `algorithm` (default: `sha256`) |
| `encrypt` | AES-256-CBC encryption | `key` |
| `decrypt` | AES-256-CBC decryption | `key` |
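The `mask` parameters can be read as: replace `chars_to_mask` characters with `masking_char`, counting from the start of the value, or from the end when `from_end` is true. That reading matches the Quick Start output, where `mask('*', 6)` turns `111222333` into `******333`. The sketch below is a hypothetical standalone function, not `MaskOperator` itself; clamping `chars_to_mask` to the value's length is an assumption.

```php
<?php

/** Hypothetical masking helper using multibyte-safe string functions. */
function maskValue(string $value, string $maskingChar = '*', int $charsToMask = 0, bool $fromEnd = false): string
{
    $length = mb_strlen($value, 'UTF-8');
    // Never mask more characters than the value has (assumption).
    $n = max(0, min($charsToMask, $length));
    $mask = str_repeat($maskingChar, $n);

    return $fromEnd
        ? mb_substr($value, 0, $length - $n, 'UTF-8') . $mask
        : $mask . mb_substr($value, $n, null, 'UTF-8');
}

echo maskValue('111222333', '*', 6), "\n";       // ******333
echo maskValue('111222333', '*', 6, true), "\n"; // 111******
```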
Validation Algorithms
Luhn Checksum (Credit Cards)
Used to validate credit card numbers. The mod-10 check rejects any single mistyped digit and most random digit sequences, which reduces false positives.
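A minimal Luhn sketch in plain PHP. The function name is hypothetical; it illustrates the algorithm, not the package's own code.

```php
<?php

/** Luhn (mod-10) check over the digits of $number; separators are ignored. */
function luhnIsValid(string $number): bool
{
    // Keep digits only, so "4111 1111 1111 1111" and "4111-1111-..." both work.
    $digits = preg_replace('/\D/', '', $number);
    if (strlen($digits) < 2) {
        return false;
    }

    $sum = 0;
    $double = false;
    // Walk from the rightmost digit, doubling every second digit.
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }

    return $sum % 10 === 0;
}

var_dump(luhnIsValid('4111 1111 1111 1111')); // bool(true)
var_dump(luhnIsValid('4111 1111 1111 1112')); // bool(false)
```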
Mod-97 Checksum (IBAN, Belgian National Number)
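IBAN codes use the ISO 7064 MOD 97-10 check: after moving the first four characters to the end and replacing each letter with a number (A = 10, B = 11, up to Z = 35), the result must leave a remainder of 1 when divided by 97. Belgian National Numbers use a related mod-97 rule: the last two digits equal 97 minus the first nine digits modulo 97 (with a 2 prefixed to those nine digits for people born in 2000 or later).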
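Both checks can share a piecewise mod-97 helper, since a rearranged IBAN is far too long for a PHP integer. The function names below are hypothetical; this is a sketch of the two rules, not the package's implementation.

```php
<?php

/** Remainder of an arbitrarily long digit string modulo 97, computed digit by digit. */
function mod97(string $digits): int
{
    $remainder = 0;
    foreach (str_split($digits) as $ch) {
        $remainder = ($remainder * 10 + (int) $ch) % 97;
    }
    return $remainder;
}

/** ISO 7064 MOD 97-10 as used by IBAN: the rearranged number must leave remainder 1. */
function ibanIsValid(string $iban): bool
{
    $iban = strtoupper(preg_replace('/\s+/', '', $iban));
    if (preg_match('/^[A-Z]{2}\d{2}[A-Z0-9]{1,30}$/', $iban) !== 1) {
        return false;
    }
    // Move country code + check digits to the end, then map A=10 ... Z=35.
    $rearranged = substr($iban, 4) . substr($iban, 0, 4);
    $numeric = '';
    foreach (str_split($rearranged) as $ch) {
        $numeric .= ctype_alpha($ch) ? (string) (ord($ch) - 55) : $ch;
    }
    return mod97($numeric) === 1;
}

/** Belgian National Number: check digits = 97 - (first nine digits mod 97). */
function belgianNationalNumberIsValid(string $number): bool
{
    $digits = preg_replace('/\D/', '', $number);
    if (strlen($digits) !== 11) {
        return false;
    }
    $base = substr($digits, 0, 9);
    $check = (int) substr($digits, 9, 2);
    // Born in 2000 or later: the check is computed over "2" . $base instead.
    return $check === 97 - mod97($base) || $check === 97 - mod97('2' . $base);
}

var_dump(ibanIsValid('NL91ABNA0417164300'));               // bool(true)
var_dump(belgianNationalNumberIsValid('93.05.18-223.61')); // bool(true)
```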
11-Proof Checksum (Dutch BSN)
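Dutch "elfproef" (11-check) algorithm for validating BSN numbers: the first eight digits are multiplied by the weights 9 down to 2, the ninth digit by -1, and the weighted sum must be divisible by 11.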
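A sketch of the 11-proof in plain PHP. The function name is hypothetical, and rejecting the all-zero number (whose weighted sum is trivially 0) is an added guard rather than something the README states.

```php
<?php

/** Dutch BSN "elfproef": weights 9..2 for digits 1-8, -1 for digit 9. */
function bsnIsValid(string $bsn): bool
{
    $digits = preg_replace('/\D/', '', $bsn);
    if (strlen($digits) !== 9 || $digits === '000000000') {
        return false;
    }

    $sum = 0;
    for ($i = 0; $i < 8; $i++) {
        $sum += (9 - $i) * (int) $digits[$i];
    }
    $sum -= (int) $digits[8]; // the last digit has weight -1

    return $sum % 11 === 0;
}

var_dump(bsnIsValid('111222333')); // bool(true): weighted sum 66 = 6 * 11
var_dump(bsnIsValid('111222334')); // bool(false)
```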
ABA Routing Checksum (US ABA Routing)
Weighted-sum mod-10 algorithm for validating US ABA routing numbers: the nine digits are multiplied by the repeating weights 3, 7, 1, and the total must be a multiple of 10.
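The ABA rule as a short PHP sketch (hypothetical function name, not the package's `AbaRoutingRecognizer`):

```php
<?php

/** ABA routing checksum: weights 3,7,1 repeated over nine digits, sum % 10 === 0. */
function abaRoutingIsValid(string $routing): bool
{
    if (preg_match('/^\d{9}$/', $routing) !== 1) {
        return false;
    }

    $weights = [3, 7, 1, 3, 7, 1, 3, 7, 1];
    $sum = 0;
    foreach ($weights as $i => $weight) {
        $sum += $weight * (int) $routing[$i];
    }

    return $sum % 10 === 0;
}

var_dump(abaRoutingIsValid('021000021')); // bool(true): weighted sum is 30
var_dump(abaRoutingIsValid('021000022')); // bool(false): weighted sum is 31
```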
Architecture
```text
Weichie\Blur\
├── Analyzer/
│   ├── AnalyzerEngine.php          # Main detection orchestrator
│   ├── EntityRecognizer.php        # Base recognizer interface
│   ├── PatternRecognizer.php       # Pattern-based recognition
│   ├── RecognizerRegistry.php      # Recognizer management
│   ├── Recognizers/
│   │   ├── Generic/                # Universal recognizers
│   │   │   ├── EmailRecognizer.php
│   │   │   ├── CreditCardRecognizer.php
│   │   │   ├── IpRecognizer.php
│   │   │   ├── UrlRecognizer.php
│   │   │   ├── IbanRecognizer.php
│   │   │   └── PhoneRecognizer.php
│   │   ├── BeNeLux/                # BeNeLux-specific
│   │   │   ├── BsnRecognizer.php
│   │   │   ├── BelgianNationalNumberRecognizer.php
│   │   │   └── LuxembourgNationalIdRecognizer.php
│   │   └── US/                     # US-specific
│   │       ├── UsSsnRecognizer.php
│   │       ├── UsItinRecognizer.php
│   │       ├── UsPassportRecognizer.php
│   │       ├── UsDriverLicenseRecognizer.php
│   │       ├── UsBankRecognizer.php
│   │       └── AbaRoutingRecognizer.php
│   └── Models/
│       ├── RecognizerResult.php
│       └── Pattern.php
└── Anonymizer/
    ├── AnonymizerEngine.php        # Main anonymization orchestrator
    ├── Operator.php                # Base operator interface
    ├── TextReplaceBuilder.php      # Text manipulation
    ├── Operators/
    │   ├── ReplaceOperator.php
    │   ├── RedactOperator.php
    │   ├── MaskOperator.php
    │   ├── HashOperator.php
    │   ├── EncryptOperator.php
    │   └── DecryptOperator.php
    └── Models/
        ├── OperatorConfig.php
        ├── OperatorResult.php
        └── EngineResult.php
```
Design Principles
- Simple but Complete: Focus on core functionality without ML/NLP complexity
- Pattern-Based: Fast regex matching with validation for accuracy
- Type-Safe: PHP 8.1+ with strict types throughout
- UTF-8 First: Proper multibyte string handling everywhere
- Extensible: Easy to add custom recognizers and operators
- Immutable Results: Result objects are not modified after creation, so they can be passed around safely
Performance
- Fast Pattern Matching: No ML model overhead
- Efficient Validation: Checksum algorithms run in O(n) time
- UTF-8 Optimized: Uses `mb_*` functions for correct character offsets
- Minimal Dependencies: Only essential libraries (libphonenumber)
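Why character offsets matter: byte-based functions such as `strpos()` count a multibyte character like "ë" as two positions, so offsets drift as soon as the text contains accented letters. The snippet below is a small standalone illustration using only core `mbstring` functions; it is not taken from the package.

```php
<?php

$text = 'Cliënt BSN 111222333';

// strpos() counts bytes: "ë" is two bytes in UTF-8, so the offset is one too high.
$byteOffset = strpos($text, '111222333');
// mb_strpos() counts characters, which is what entity offsets should use.
$charOffset = mb_strpos($text, '111222333', 0, 'UTF-8');

echo "byte offset: $byteOffset, character offset: $charOffset\n"; // byte offset: 12, character offset: 11

// Slicing with the character offset recovers the BSN; the byte offset does not.
var_dump(mb_substr($text, $charOffset, 9, 'UTF-8')); // string(9) "111222333"
var_dump(mb_substr($text, $byteOffset, 9, 'UTF-8')); // string(8) "11222333"
```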
Examples
See examples/benelux_example.php for a comprehensive demonstration including:
- All BeNeLux recognizers in action
- Multiple anonymization strategies
- Context enhancement
- Different operator configurations
Run it:
```bash
php examples/benelux_example.php
```
Contributing
Contributions are welcome! To add support for additional countries:
- Create a recognizer in `src/Analyzer/Recognizers/CountryName/`
- Extend `PatternRecognizer` or implement `EntityRecognizer`
- Add validation logic (checksum, format, etc.)
- Include context words in local language(s)
- Add tests and examples
License
MIT License - See LICENSE file for details
Credits
This project is inspired by Microsoft Presidio. Special thanks to the Presidio team for their excellent work on PII detection and de-identification.
Roadmap
- US-specific recognizers (SSN, ITIN, Passport, Driver License, Bank Account, ABA Routing): implemented
- Additional country-specific recognizers (Germany, France, Spain, etc.)
- Custom recognizer builder API
- Batch processing support
- Performance benchmarks
- Integration with popular PHP frameworks (Laravel, Symfony)
Support
For issues, questions, or contributions, please visit the GitHub repository.