yoeunes/regex-parser

A powerful PCRE regex parser with lexer, AST builder, validation, ReDoS analysis, and syntax highlighting. Zero dependencies, blazing fast, and production-ready.

v1.2.0 2026-01-08 17:36 UTC

RegexParser

RegexParser: Static Analysis, Linter & Logic Solver

RegexParser is a PHP 8.2+ library that treats regular expressions as code.

Unlike simple wrappers around preg_match, RegexParser implements a complete compiler pipeline (Lexer → Parser → AST) and an Automata-based Logic Solver (AST → NFA → DFA).

This architecture allows for advanced static analysis:

  • Linting: Detect redundancy, useless flags, and optimization opportunities.
  • Safety: Statically detect catastrophic backtracking (ReDoS).
  • Logic: Mathematically compare patterns (Intersection, Equivalence, Subset).

Built for learning, validation, and robust tooling in PHP projects.

If you are new to regex, start with the Regex Tutorial. If you want a short overview, see the Quick Start Guide.

Getting started

# Install the library
composer require yoeunes/regex-parser

# Try the CLI
vendor/bin/regex explain '/\d{4}-\d{2}-\d{2}/'

What RegexParser provides

  • 🏗️ Deep Parsing: Parse /pattern/flags into a structured, typed AST.
  • 🧠 Logic Solver: Mathematically compare two regexes using NFA/DFA transformation. Detect route conflicts and validate security subsets.
  • 🛡️ ReDoS Analysis: Analyze potential catastrophic backtracking risks structure-wise.
  • 🧹 Linter: Clean up legacy code (useless flags, redundant groups) via the CLI.
  • 📖 Explanation: Explain patterns in plain English.
  • 🔧 Visitor API: A flexible API for building custom regex tooling.
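
To make the Logic Solver idea concrete, here is a minimal sketch of one automata technique it alludes to: searching the product of two DFAs for a string that both accept, which is how an overlap such as a route conflict can be detected. This is not RegexParser's implementation or API. The function name sharedWitness, the array layout of a DFA, and the hand-built automata are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Breadth-first search over the product of two DFAs (hypothetical helper).
 * Returns a shortest string accepted by both, or null if the languages are disjoint.
 * A DFA is ['start' => int, 'accept' => int[], 'delta' => [state => [char => state]]];
 * a missing transition means the input is rejected.
 */
function sharedWitness(array $a, array $b): ?string
{
    $queue = [[$a['start'], $b['start'], '']];
    $seen = [$a['start'] . ',' . $b['start'] => true];

    while ($queue !== []) {
        [$sa, $sb, $word] = array_shift($queue);
        if (in_array($sa, $a['accept'], true) && in_array($sb, $b['accept'], true)) {
            return $word; // both automata accept this string
        }
        foreach ($a['delta'][$sa] ?? [] as $char => $nextA) {
            $nextB = $b['delta'][$sb][$char] ?? null;
            if ($nextB === null) {
                continue; // the second DFA rejects this character here
            }
            $key = $nextA . ',' . $nextB;
            if (!isset($seen[$key])) {
                $seen[$key] = true;
                $queue[] = [$nextA, $nextB, $word . $char];
            }
        }
    }

    return null;
}

// Hand-built DFAs; a real solver would derive them from the parsed patterns.
$aPlus  = ['start' => 0, 'accept' => [1], 'delta' => [0 => ['a' => 1], 1 => ['a' => 1]]]; // a+
$aBStar = ['start' => 0, 'accept' => [1], 'delta' => [0 => ['a' => 1], 1 => ['b' => 1]]]; // ab*
$bPlus  = ['start' => 0, 'accept' => [1], 'delta' => [0 => ['b' => 1], 1 => ['b' => 1]]]; // b+

var_dump(sharedWitness($aPlus, $aBStar)); // string(1) "a"
var_dump(sharedWitness($aPlus, $bPlus));  // NULL
```

A non-null result is a concrete input that both patterns match. Subset and equivalence checks can be built on the same search by complementing one of the automata first.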

Philosophy & Accuracy

RegexParser separates what it can guarantee from what is heuristic:

  • Guaranteed: parsing, AST structure, error offsets, and syntax validation for the targeted PHP/PCRE version.
  • Heuristic: ReDoS analysis is structural and conservative; treat each finding as a potential risk until it is confirmed.
  • Context matters: PCRE version, JIT, and backtrack/recursion limits change practical impact.
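
Because the practical impact depends on the runtime, it helps to record the PCRE context alongside any finding. The sketch below uses only PHP built-ins (the PCRE_VERSION constant and ini_get); the helper name pcreContext is an assumption for illustration and is not part of RegexParser.

```php
<?php
declare(strict_types=1);

/**
 * Snapshot of the settings that influence how a risky pattern behaves in practice.
 * Hypothetical helper, not part of RegexParser.
 */
function pcreContext(): array
{
    return [
        'php_version'     => PHP_VERSION,
        'pcre_version'    => PCRE_VERSION,                     // version of the PCRE library PHP was built with
        'jit'             => ini_get('pcre.jit'),              // "1" or "0"; false if the setting is unavailable
        'backtrack_limit' => ini_get('pcre.backtrack_limit'),
        'recursion_limit' => ini_get('pcre.recursion_limit'),
    ];
}

print_r(pcreContext());
```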

How to report a vulnerability responsibly

If you believe a pattern is exploitable:

  1. Run confirmed mode and capture a bounded, reproducible PoC.
  2. Include the pattern, input lengths, timings, JIT setting, and PCRE limits.
  3. Verify impact in the real code path before filing a security issue.

See SECURITY.md for reporting channels.
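
For the timings and input lengths requested in step 2, a small bounded harness is usually enough. The sketch below is independent of RegexParser's confirmed mode and relies only on PHP built-ins (preg_match, hrtime, ini_set). The helper name timePattern, the chosen lengths, and the lowered backtrack limit are assumptions; adjust them to the code path under test.

```php
<?php
declare(strict_types=1);

/**
 * Time one pattern against inputs of growing length under a lowered backtrack limit.
 * Hypothetical helper for collecting PoC data; not part of RegexParser.
 */
function timePattern(string $pattern, callable $makeInput, array $lengths, int $backtrackLimit = 100000): array
{
    // Keep the run bounded: lower the limit and turn JIT off so the limit is enforced.
    $previousLimit = ini_set('pcre.backtrack_limit', (string) $backtrackLimit);
    $previousJit = ini_set('pcre.jit', '0');
    $rows = [];

    try {
        foreach ($lengths as $n) {
            $input = $makeInput($n);
            $start = hrtime(true);
            $result = @preg_match($pattern, $input);
            $elapsedMs = (hrtime(true) - $start) / 1e6;

            $rows[] = [
                'length' => strlen($input),
                'ms'     => round($elapsedMs, 3),
                'result' => $result,                 // 1, 0, or false when PCRE gave up
                'error'  => preg_last_error_msg(),   // error reported for this call
            ];
        }
    } finally {
        // Restore the previous settings so the rest of the process is unaffected.
        if ($previousLimit !== false) {
            ini_set('pcre.backtrack_limit', $previousLimit);
        }
        if ($previousJit !== false) {
            ini_set('pcre.jit', $previousJit);
        }
    }

    return $rows;
}

$rows = timePattern(
    '/(a+)+$/',
    fn (int $n): string => str_repeat('a', $n) . '!', // long run of "a" followed by a non-matching tail
    [8, 12, 16, 20]
);

foreach ($rows as $row) {
    printf("len=%d  %.3f ms  result=%s  (%s)\n", $row['length'], $row['ms'], var_export($row['result'], true), $row['error']);
}
```

Timings that grow sharply with input length, or a backtrack-limit error on short inputs, are the kind of evidence worth including in a report.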

Safer rewrites (verify behavior)

These techniques reduce backtracking but can change matching behavior. Always validate with tests.

/(a+)+$/     -> /a+$/      (semantics often preserved, but verify captures)
/(a+)+$/     -> /a++$/     (possessive, no backtracking)
/(a|aa)+/    -> /a+/       (only if alternation is redundant)
/(a|aa)+/    -> /(?>a|aa)+/ (atomic, avoids backtracking)
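
One way to validate a rewrite is to run both patterns over a set of sample inputs and list the inputs where they disagree. The sketch below uses only preg_match; the helper name diffPatterns and the sample inputs are assumptions for illustration, and a real test suite should use inputs taken from the application.

```php
<?php
declare(strict_types=1);

/**
 * Return the inputs on which two patterns behave differently.
 * Hypothetical helper, not part of RegexParser.
 * With $compareCaptures = false only the overall match is compared;
 * with true, capture groups are compared as well.
 */
function diffPatterns(string $original, string $rewrite, array $inputs, bool $compareCaptures = false): array
{
    $diffs = [];
    foreach ($inputs as $input) {
        $a = preg_match($original, $input, $matchesA);
        $b = preg_match($rewrite, $input, $matchesB);
        if ($a === false || $b === false) {
            throw new RuntimeException(preg_last_error_msg());
        }
        if (!$compareCaptures) {
            // Keep only index 0, the full match.
            $matchesA = array_slice($matchesA, 0, 1);
            $matchesB = array_slice($matchesB, 0, 1);
        }
        if ($a !== $b || $matchesA !== $matchesB) {
            $diffs[] = $input;
        }
    }

    return $diffs;
}

$inputs = ['', 'a', 'aaa', 'baaa', 'aab'];

// Same overall match on every sample input.
var_dump(diffPatterns('/(a+)+$/', '/a++$/', $inputs));       // empty array

// Captures differ wherever the original matched, because the rewrite has no group.
var_dump(diffPatterns('/(a+)+$/', '/a++$/', $inputs, true)); // 'a', 'aaa', 'baaa'
```

If calling code reads capture group 1, the second check shows why the rewrite is not a drop-in replacement even though the overall match is unchanged on these inputs.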

How it works

  • Regex::parse() splits the literal into pattern and flags.
  • The lexer produces a token stream.
  • The parser builds an AST (RegexNode).
  • Visitors walk the AST to validate, explain, analyze, or transform.
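
The pipeline above can be illustrated with a toy version. This is a sketch of the general technique only: the function splitLiteral and the classes Literal, Repeat, Sequence, and Explainer are hypothetical and do not mirror RegexParser's real node or visitor classes. The lexer and parser steps are skipped, and the AST is built by hand.

```php
<?php
declare(strict_types=1);

// Step 1 (simplified): split '/ab+/i' into pattern and flags.
// Assumes the same character opens and closes the literal.
function splitLiteral(string $literal): array
{
    if ($literal === '') {
        throw new InvalidArgumentException('Empty literal');
    }
    $end = strrpos($literal, $literal[0]);
    if ($end === false || $end === 0) {
        throw new InvalidArgumentException('Missing closing delimiter');
    }

    return [substr($literal, 1, $end - 1), substr($literal, $end + 1)];
}

interface Visitor
{
    public function visitLiteral(Literal $node): string;
    public function visitRepeat(Repeat $node): string;
    public function visitSequence(Sequence $node): string;
}

interface Node
{
    public function accept(Visitor $visitor): string;
}

final class Literal implements Node
{
    public function __construct(public readonly string $char) {}
    public function accept(Visitor $visitor): string { return $visitor->visitLiteral($this); }
}

final class Repeat implements Node // models the "+" quantifier only
{
    public function __construct(public readonly Node $child) {}
    public function accept(Visitor $visitor): string { return $visitor->visitRepeat($this); }
}

final class Sequence implements Node
{
    /** @param Node[] $children */
    public function __construct(public readonly array $children) {}
    public function accept(Visitor $visitor): string { return $visitor->visitSequence($this); }
}

// One visitor per task: this one explains; others could validate or transform.
final class Explainer implements Visitor
{
    public function visitLiteral(Literal $node): string { return "'{$node->char}'"; }
    public function visitRepeat(Repeat $node): string { return $node->child->accept($this) . ' (one or more times)'; }
    public function visitSequence(Sequence $node): string
    {
        return implode(', then ', array_map(fn (Node $child): string => $child->accept($this), $node->children));
    }
}

[$pattern, $flags] = splitLiteral('/ab+/i'); // 'ab+' and 'i'

// Hand-built AST for 'ab+'; a real parser would derive it from the token stream.
$ast = new Sequence([new Literal('a'), new Repeat(new Literal('b'))]);

echo $ast->accept(new Explainer()), PHP_EOL; // 'a', then 'b' (one or more times)
```

Keeping each analysis in its own visitor means new tooling can be added without changing the node classes.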

For the full architecture, see docs/ARCHITECTURE.md.

CLI quick tour

# Parse and validate a pattern
vendor/bin/regex parse '/^hello world$/'

# Get plain English explanation
vendor/bin/regex explain '/\d{4}-\d{2}-\d{2}/'

# Check for potential ReDoS risk (theoretical by default)
vendor/bin/regex analyze '/(a+)+$/'

# Colorize pattern for better readability
vendor/bin/regex highlight '/\d+/'

# Lint your entire codebase
vendor/bin/regex lint src/

PHP API at a glance

use RegexParser\Regex;
use RegexParser\ReDoS\ReDoSMode;

$regex = Regex::create([
    'runtime_pcre_validation' => true,
]);

// Parse a pattern into AST
$ast = $regex->parse('/^hello world$/i');

// Validate pattern safety
$result = $regex->validate('/(?<=test)foo/');
if (!$result->isValid()) {
    echo $result->getErrorMessage();
}

// Check for ReDoS risk (theoretical by default)
$analysis = $regex->redos('/(a+)+$/');
echo $analysis->severity->value; // 'critical', 'safe', etc.

// Optional: attempt bounded confirmation
$confirmed = $regex->redos('/(a+)+$/', mode: ReDoSMode::CONFIRMED);
echo $confirmed->isConfirmed() ? 'confirmed' : 'theoretical';

// Get human-readable explanation
echo $regex->explain('/\d{4}-\d{2}-\d{2}/');

Integrations

RegexParser integrates with common PHP tooling:

  • Symfony bundle: docs/guides/cli.md
  • PHPStan: vendor/yoeunes/regex-parser/extension.neon
  • GitHub Actions: vendor/bin/regex lint in your CI pipeline

Performance

RegexParser ships lightweight benchmark scripts in benchmarks/ to track parser, compiler, and formatter throughput.

  • Run formatter benchmarks: php benchmarks/benchmark_formatters.php
  • Run all benchmarks: for file in benchmarks/benchmark_*.php; do echo "Running $file"; php "$file"; echo; done

Contributing

Contributions are welcome! See CONTRIBUTING.md to get started.

# Set up development environment
composer install

# Run tests
composer phpunit

# Check code style
composer phpcs

# Run static analysis
composer phpstan

License

Released under the MIT License.

Support

If you run into issues or have questions, please open an issue on GitHub: https://github.com/yoeunes/regex-parser/issues.