nimayneb / yawl
YAWL - Yet another wildcard library - Match pattern for strings with asterisk and query token
Requires
- php: ^7.4.0
- ext-mbstring: *
- nimayneb/benchmark: 1.0.1
- phpmd/phpmd: ^2.8
Requires (Dev)
- nimayneb/phpqg: 1.0.0
- phpunit/phpunit: ^9.1.0
This package is auto-updated.
Last update: 2024-10-25 21:01:45 UTC
README
This is a library with classes for any wildcard implementation that finds pattern with * (asterisk) and ? (query) token.
Problem
Without regular expression extension (named ext-pcre
- before PHP 5.3) you have no wildcard support within PHP.
See https://www.php.net/manual/en/pcre.installation.php.
The known wildcard behavior (see Wildcard character (@Wikipedia)) is limited for a good implementation.
A small remedy?
We need a kind of "compiled" and "cached" pattern. The implementation of regular expression like preg_match
has a
huge performance. We lost these advantage as of PHP 7.0, because of just-in-time compiled pcre pattern!
Table of content
- Benchmark
- Wildcard variants
- Possible valid pattern
- Invalid pattern
- Escaping
- Repeating phrases
- Caching
- Wish list
- Appendix
API:
- Matcher (for use of single calls)
- Performer (for use of multiple calls)
- Converter (for use of regular expression)
Benchmark
When we benchmark several methods to match a fitting phrase (1000 random strings), we have a good results without "regular expressions":
Single Byte:
Multi Byte (like Unicode):
Internal PHP functions comparison
Wildcard variants
Possible valid pattern
?? (2 characters)
???* (2-3 characters)
Invalid pattern
***
?**
?*?
*?
Escaping
\\
\?
\*
Please note of this escaping scenarios:
Further explanation:
Repeating phrases
The asterisk (*
) has a problem finding the right position. If several identical phrases can be found in a string,
the search must take place in all positions to find the pattern.
Search: *is?ue (where is "*is")
Subject: this is an asterisk issue
Founds: ^ ^ ^ ^
The simplest solution is to break down the pattern recursively, but it should be not recursively.
Caching
The regular expression extension has a caching and compiling strategy to improve performance. The second time the same pattern is called, a tremendous increase in performance is achieved.
To help us to improve also performance, we use a simple key-value caching.
protected array $cachedResults = [];
public function match(string $subject): bool
{
return $this->cachedResults[$subject] ?? $this->cachedResults[$subject] = $this->computePhrases($subject, 0);
}
This is not a preferred solution.
Wish list
- Remove recursive call of
WildcardPerformer::computePhrases
(see Wikipedia - Matching wildcards) - Remove of
StringFunctionMapper
(too slow) - Combine Matcher and Performer (two advantages in one)
- Caching interface
- New pattern
??**
(0 or 2 characters) or?????**
(0 or 5 characters) - Glob support
- Example:
[abcdef0123456789]
,[0-9a-f]
,[!a-z]
- Use in YAWL:
[0-9a-f]?
(one times)[0-9a-f]?*
(zero or one times)[0-9a-f]*
(zero or N times)[0-9a-f]**
(one or N times)[0-9a-f]x
(one times then followsx
)
- Example:
Appendix
Common Algorithms:
Non-recursive Algorithms:
Recursive Algorithms:
- Compares a filename or pathname to a pattern by "Apple"
- Stackoverflow - Recursive solutions for glob pattern matching by "Siler"
PDepend
- NOP - Number Of Packages
- NOC - Number Of Classes
- NOM - Number Of Methods
- LOC – Lines of Code
- CYCLO - Cyclomatic complexity
- CALLS - number of distinct function- and method-calls
- ANDC - Average Number of Derived Classes
- AHH - Average Hierarchy Height