yoeunes / regex-parser
A PCRE regex parser with lexer, AST, and validation
Installs: 2
Dependents: 0
Suggesters: 0
Security: 0
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
pkg:composer/yoeunes/regex-parser
Requires
- php: >=8.4
- ext-mbstring: *
Requires (Dev)
- phpunit/phpunit: ^12.4
- symfony/config: ^7.3|^8.0
- symfony/dependency-injection: ^7.3|^8.0
- symfony/http-foundation: ^7.3|^8.0
- symfony/http-kernel: ^7.3|^8.0
- symfony/routing: ^7.3|^8.0
- symfony/validator: ^7.3|^8.0
Suggests
- phpstan/extension-installer: To automatically enable the PHPStan rule for regex validation.
- phpstan/phpstan: To run static analysis and detect invalid regex patterns.
- rector/rector: To automatically refactor and optimize regex patterns.
This package is auto-updated.
Last update: 2025-11-25 06:37:50 UTC
README
A robust, extensible PCRE regex parser for PHP that transforms complex regex patterns into traversable Abstract Syntax Trees (AST), enabling static analysis, validation, pattern explanation, and safe optimization.
⚠️ Experimental Library Notice
This library is in experimental/alpha status. While it demonstrates functional parsing, AST generation, and analysis capabilities, it has not been exhaustively validated against the complete official PCRE specification.
Current Status:
- ✓ Core parsing and AST generation validated
- ✓ ReDoS detection working (false positives fixed)
- ✓ Behavioral compliance testing against PHP's PCRE engine
- ✓ 27/27 validation tests + 19/19 behavioral tests (128 assertions)
- ✓ Comprehensive testing completed: 140 tests, 284 assertions, 100% pass rate
- ✓ Integration testing validated: Symfony, Rector, PHPStan
Recommendation: Ready for production use with experimental notice. Suitable for development, testing, and integration into existing projects.
🎯 Key Features
- Full PCRE Parsing: Accurately parses the vast majority of PCRE syntax, including groups (capturing, non-capturing, named, branch reset), lookarounds, subroutines, conditionals, quantifiers (greedy, lazy, possessive), Unicode properties, and more.
- Advanced Validation: Goes beyond simple syntax checks. It semantically validates your patterns to catch costly errors before they run:
- Detects Catastrophic Backtracking (ReDoS) vulnerabilities
- Finds invalid backreferences
- Detects variable-length lookbehinds
- Behavioral Compliance: Comprehensive test suite validates that parsed patterns behave identically to PHP's native PCRE engine
- Extensible with Visitors: Built on the Visitor design pattern. The AST is immutable data; you can write visitor classes to perform any analysis you need.
- Toolkit Included: Ships with powerful visitors out-of-the-box:
CompilerNodeVisitor: Recompiles an AST back into a valid regex stringValidatorNodeVisitor: Performs semantic validationExplainVisitor: Creates human-readable pattern explanationsSampleGeneratorVisitor: Generates random sample strings matching the patternOptimizerNodeVisitor: Optimizes patterns while preserving semanticsReDoSAnalyzer: Analyzes patterns for denial-of-service vulnerabilities
- Modern & Robust: Built with PHP 8.4+, strictly typed, and heavily tested
- Framework Integration: Optional integration with Symfony, Rector, and PHPStan
📦 Installation
Install the library via Composer:
composer require yoeunes/regex-parser
Requirements:
- PHP 8.4 or higher
- ext-mbstring (for Unicode support)
🚀 Getting Started
Quick Example
<?php use RegexParser\Regex; // Parse and explain a regex pattern $pattern = '/(?<email>[\w.-]+@[\w.-]+\.\w+)/i'; $regex = Regex::create(); // Get human-readable explanation echo $regex->explain($pattern); // Validate for errors and vulnerabilities $result = $regex->validate($pattern); if (!$result->isValid) { echo "Error: {$result->error}\n"; } // Generate a sample string that matches $sample = $regex->generate($pattern); echo "Sample: $sample\n"; // e.g., "test.user@example.com" // Analyze for ReDoS vulnerabilities $analysis = $regex->analyzeReDoS($pattern); echo "Safety: {$analysis->severity->value}\n"; // "safe"
📖 Basic Usage
The Regex class provides a simple static façade for common operations.
1. Parsing a Regex
Parse a regex string to get the root RegexNode of its AST.
use RegexParser\Regex; use RegexParser\Exception\ParserException; try { $ast = Regex::create()->parse('/^Hello (?<name>\w+)!$/i'); // $ast is now a RegexParser\Node\RegexNode object echo $ast->flags; // "i" } catch (ParserException $e) { echo 'Error parsing regex: ' . $e->getMessage(); }
2. Validating a Regex
Check a regex for syntax errors, semantic errors, and ReDoS vulnerabilities.
use RegexParser\Regex; $regex = Regex::create(); // Detect ReDoS vulnerability $result = $regex->validate('/(a+)*b/'); if (!$result->isValid) { echo $result->error; // Output: Potential catastrophic backtracking: nested quantifiers detected. } // Detect invalid lookbehind $result = $regex->validate('/(?<!a*b)/'); if (!$result->isValid) { echo $result->error; // Output: Variable-length quantifiers (*) are not allowed in lookbehinds. }
3. Explaining a Regex
Generate a human-readable explanation of a complex pattern.
use RegexParser\Regex; $explanation = Regex::create()->explain('/(foo|bar){1,2}?/s'); echo $explanation;
Output:
Regex matches (with flags: s):
Start Quantified Group (between 1 and 2 times (as few as possible)):
Start Capturing Group:
EITHER:
Literal: 'foo'
OR:
Literal: 'bar'
End Group
End Quantified Group
4. Generating Sample Data
Generate a random string that will successfully match a pattern.
use RegexParser\Regex; $sample = Regex::create()->generate('/[a-f0-9]{4}-[a-f0-9]{4}/'); echo $sample; // Possible Output: c4e1-9b2a
5. Optimizing Patterns
Optimize a regex pattern while preserving its behavior.
use RegexParser\Regex; $optimized = Regex::create()->optimize('/(?:a|b|c)/'); echo $optimized; // Output: /[abc]/ (more efficient)
💡 Advanced Usage
The Power of the AST
The true power of this library comes from traversing the AST to build your own tools. You can create a custom NodeVisitorInterface to analyze, manipulate, or extract information.
use RegexParser\Regex; use RegexParser\NodeVisitor\DumperNodeVisitor; $ast = Regex::create()->parse('/^(?<id>\d+)/'); $dumper = new DumperNodeVisitor(); echo $ast->accept($dumper);
Output (The Abstract Syntax Tree):
Regex(delimiter: /, flags: )
Sequence:
Anchor(^)
Group(type: named name: id flags: )
Sequence:
Quantifier(quant: +, type: greedy)
CharType('\d')
🔍 Literal Extraction for Pre-Match Optimization
Extract fixed strings that must appear in any match for fast-path optimization:
use RegexParser\Regex; $regex = Regex::create(); // Example 1: Simple prefix extraction $literals = $regex->extractLiterals('/user_(\d+)@example\.com/'); $prefix = $literals->getLongestPrefix(); // "user_" $suffix = $literals->getLongestSuffix(); // "@example.com" // Fast-path check before running expensive regex $subject = 'admin_123@test.com'; if (!str_contains($subject, $prefix)) { return false; // Skip regex entirely! ⚡ 10-20x faster } $result = preg_match($pattern, $subject);
Use Cases:
- 🚀 10-20x faster string matching when combined with
strpos() - 📊 Database query optimization (check prefix before LIKE)
- 🔍 Log parsing and filtering
- 🎯 URL routing and validation
🛡️ ReDoS Vulnerability Analysis
Detect Regular Expression Denial of Service vulnerabilities with detailed severity scoring:
use RegexParser\Regex; use RegexParser\ReDoSSeverity; $regex = Regex::create(); $analysis = $regex->analyzeReDoS('/(a+)+b/'); echo "Severity: {$analysis->severity->value}"; // "critical" echo "Score: {$analysis->score}"; // 10 (0-10 scale) echo "Safe: " . ($analysis->isSafe() ? 'Yes' : 'NO!'); // NO! foreach ($analysis->recommendations as $recommendation) { echo "⚠️ $recommendation\n"; }
Severity Levels:
| Level | Description | Example | Time Complexity |
|---|---|---|---|
| SAFE | No ReDoS risk | /^abc$/ |
O(n) |
| LOW | Nested bounded quantifiers | /(a{1,5}){1,5}/ |
O(n²) with low constant |
| MEDIUM | Single unbounded quantifier | /a+/ |
O(n²) |
| HIGH | Nested unbounded quantifiers | /(a+)+/ |
O(2ⁿ) |
| CRITICAL | Definite catastrophic backtracking | /(a*)*b/ or /(a|a)*/ |
O(2ⁿ) worst case |
🔧 Framework Integration
Symfony Integration
RegexParser can be integrated into Symfony applications for regex validation in forms, routing, and more.
1. Install the library:
composer require yoeunes/regex-parser
2. Create a custom Symfony validator:
// src/Validator/Constraints/ValidRegex.php namespace App\Validator\Constraints; use Symfony\Component\Validator\Constraint; #[\Attribute] class ValidRegex extends Constraint { public string $message = 'The regex pattern "{{ pattern }}" is invalid: {{ error }}'; }
// src/Validator/Constraints/ValidRegexValidator.php namespace App\Validator\Constraints; use RegexParser\Regex; use Symfony\Component\Validator\Constraint; use Symfony\Component\Validator\ConstraintValidator; use Symfony\Component\Validator\Exception\UnexpectedTypeException; class ValidRegexValidator extends ConstraintValidator { public function validate($value, Constraint $constraint): void { if (!$constraint instanceof ValidRegex) { throw new UnexpectedTypeException($constraint, ValidRegex::class); } if (null === $value || '' === $value) { return; } $regex = Regex::create(); $result = $regex->validate($value); if (!$result->isValid) { $this->context->buildViolation($constraint->message) ->setParameter('{{ pattern }}', $value) ->setParameter('{{ error }}', $result->error) ->addViolation(); } } }
3. Use in your forms:
use App\Validator\Constraints\ValidRegex; use Symfony\Component\Form\AbstractType; use Symfony\Component\Form\Extension\Core\Type\TextType; use Symfony\Component\Form\FormBuilderInterface; class RegexPatternType extends AbstractType { public function buildForm(FormBuilderInterface $builder, array $options): void { $builder->add('pattern', TextType::class, [ 'label' => 'Regex Pattern', 'constraints' => [ new ValidRegex(), ], ]); } }
Rector Integration
RegexParser includes a Rector rule for automatically optimizing regex patterns in your codebase.
1. Install Rector:
composer require --dev rector/rector
2. Configure Rector (rector.php):
<?php use Rector\Config\RectorConfig; use RegexParser\Rector\RegexOptimizationRector; return RectorConfig::configure() ->withPaths([ __DIR__ . '/src', ]) ->withRules([ RegexOptimizationRector::class, ]);
3. Run Rector:
vendor/bin/rector process --dry-run
Example transformation:
// Before preg_match('/(?:foo|bar|baz)/', $string); // After (optimized by Rector) preg_match('/[foo|bar|baz]/', $string); // if applicable
Current Status: ✓ Rector integration validated - 61/61 files processed successfully
PHPStan Integration
RegexParser includes a PHPStan extension stub for future static analysis of regex patterns.
1. Install PHPStan:
composer require --dev phpstan/phpstan
2. Enable the extension (phpstan.neon):
includes: - vendor/yoeunes/regex-parser/extension.neon parameters: level: max paths: - src
3. Run PHPStan:
vendor/bin/phpstan analyze
Current Status: ✓ PHPStan runs successfully on library source code (0 errors at max level)
Note: Custom validation rules for preg_* functions are planned for future releases.
🧪 Testing & Validation
Running Tests
# Run the full test suite ./vendor/bin/phpunit # Run specific test suites ./vendor/bin/phpunit tests/Unit ./vendor/bin/phpunit tests/Integration # Run behavioral compliance tests ./vendor/bin/phpunit tests/Integration/BehavioralComplianceTest.php
Validation Script
Run the comprehensive validation script to verify library functionality:
php validate_library.php
Expected output:
Test 1: Sample Generation 4/4 PASSED ✓
Test 2: ReDoS Detection 4/4 PASSED ✓
Test 3: PCRE Feature Coverage 12/12 PASSED ✓
Test 4: Round-trip Validation 4/4 PASSED ✓
Test 5: Invalid Pattern Detection 3/3 PASSED ✓
OVERALL: 27/27 tests passed (100%)
Behavioral Compliance Tests: 19/19 tests, 128 assertions - ALL PASS ✓
Web Demo
A web demo is available to interactively test the library:
php server.php
Then open your browser to http://localhost:5000
🛠️ CLI Tool
The library includes a command-line tool for quick regex testing:
php bin/regex-parser '/your_regex_here/flags'
Example:
php bin/regex-parser '/(?<email>[\w.-]+@[\w.-]+\.\w+)/i'
🤝 Contributing
Contributions are welcome! Please read our CONTRIBUTING.md for details on:
- Code of conduct
- Development setup
- How to submit pull requests
- Coding standards and style guide
- Testing requirements
Development Setup
- Clone the repository:
git clone https://github.com/yoeunes/regex-parser.git
cd regex-parser
- Install dependencies:
composer install
- Run tests:
./vendor/bin/phpunit
- Run quality tools:
# PHPStan cd tools/phpstan && composer install php vendor/bin/phpstan analyze # Rector cd tools/rector && composer install php vendor/bin/rector process --dry-run # PHP CS Fixer cd tools/php-cs-fixer && composer install php vendor/bin/php-cs-fixer fix --dry-run
🐛 Troubleshooting
Common Issues
Issue: "Class 'RegexParser\Regex' not found"
- Solution: Run
composer installto ensure autoloading is configured - Verify:
composer dump-autoload
Issue: "Memory limit exceeded" when using PHPStan
- Solution: Increase memory limit:
php -d memory_limit=512M vendor/bin/phpstan analyze
Issue: Pattern fails to parse
- Check the pattern uses valid PCRE syntax
- Review error message for specific issue location
- See VALIDATION_REPORT.md for known limitations
Issue: ReDoS false positives
- Update to latest version (false positives fixed in recent releases)
- Safe patterns like
/a+b/should now be correctly identified as safe
Issue: Backreferences not compiling correctly
- Update to latest version (backreference compilation fixed)
- Pattern
/(a)\1/should now round-trip correctly
Getting Help
- 📖 Check VALIDATION_REPORT.md for known issues
- 🐛 Open an issue on GitHub
- 💬 Describe your pattern, expected behavior, and actual behavior
- 📎 Include code samples and error messages
📊 Performance Benchmarks
Literal extraction provides significant performance improvements for patterns with fixed prefixes/suffixes:
| Pattern | Subject | Without Optimization | With Optimization | Speedup |
|---|---|---|---|---|
/user_\d+/ |
"admin_123" | 1.2μs | 0.1μs | 12x faster |
/error: .*/ |
"info: msg" | 2.5μs | 0.2μs | 12.5x faster |
/\d{3}-\d{2}-\d{4}/ |
"abc-def-ghij" | 3.1μs | 0.15μs | 20x faster |
Benchmarks run on PHP 8.4 with OPcache enabled
📜 License
This project is licensed under the MIT License. See the LICENSE file for details.
🙏 Acknowledgments
Built with modern PHP practices, inspired by the need for better regex tooling and static analysis capabilities.
Key Technologies:
- PHP 8.4+ with strict types
- Visitor pattern for extensible AST traversal
- Comprehensive PHPUnit test suite
- Modern development tooling (PHPStan, Rector, PHP-CS-Fixer)
📚 Further Reading
- VALIDATION_REPORT.md - Detailed validation findings and test results
- PCRE Specification - Official PCRE syntax reference
- ReDoS Explained - Understanding Regular Expression Denial of Service