rafaelnajera / matcher
A regexp-like matcher for bespoke tokens.
Requires
- php: >=7.0
Matcher implements a regexp-like matching system that can be used with user-devised tokens.
Installation
Install the latest version with:

    $ composer require rafaelnajera/matcher
Usage
The main class is Matcher, which allows you to match a sequence of tokens against a pattern specified in a regexp-like manner.
Matcher works on a Pattern object, which stands for a regexp-like pattern that can be matched.
The following sets up a Pattern object to match '^ab(cd)*e':

    $pattern = (new Pattern())->withTokenSeries(['a', 'b'])
        ->withAddedPatternZeroOrMore((new Pattern())->withTokenSeries(['c', 'd']))
        ->withTokenSeries(['e']);
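For intuition, when every token is a single character the pattern above behaves much like an ordinary regular expression over a string. The sketch below uses only PHP's built-in preg_match(), not the Matcher library, and the sample strings are made up for illustration.

```php
<?php
// Plain PCRE analogue of the token pattern '^ab(cd)*e'.
// This does not use the Matcher library; it only illustrates which
// sequences the pattern is meant to accept.
$regex = '/^ab(cd)*e/';

// Illustrative inputs (not taken from the library's documentation).
$samples = ['abe', 'abcde', 'abcdcde', 'abce', 'xabe'];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches, 0 otherwise.
    $results[$sample] = preg_match($regex, $sample) === 1;
}
```

The first three samples match; 'abce' fails because 'c' is not followed by 'd', and 'xabe' fails because of the '^' anchor.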
The tokens used to set up the pattern can be of any type. Matching is done by strict comparison with the input tokens. Tokens can also be objects that implement the Token interface, in which case the token's matches($someInput) method will be called. The input in this case can be anything, as long as the token's matches() method knows how to determine a match.
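The comparison rule described above can be sketched without the library. In the code below, TokenLike is a hypothetical stand-in for the library's Token interface (only matches() is modelled), and DigitsToken and tokenMatches() are illustrative names, not part of the package.

```php
<?php
// Hypothetical stand-in for the library's Token interface.
// Only matches() is modelled here; the real interface may require more.
interface TokenLike
{
    public function matches($input): bool;
}

// Example token object: matches any non-empty string made up of digits.
final class DigitsToken implements TokenLike
{
    public function matches($input): bool
    {
        return is_string($input) && ctype_digit($input);
    }
}

// The rule described above: token objects decide for themselves via
// matches(); every other pattern token is compared strictly (===).
function tokenMatches($patternToken, $input): bool
{
    if ($patternToken instanceof TokenLike) {
        return $patternToken->matches($input);
    }
    return $patternToken === $input;
}
```

With strict comparison, the pattern token '1' does not match the integer input 1, whereas a DigitsToken accepts any digit string such as '42'.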
Once the pattern is set up, a Matcher object can be created and input tokens can be fed to it one by one with the match() method:

    $matcher = new Matcher($pattern);
    $r = $matcher->match('a');
    $r = $matcher->match('b');
    ...
Here $r will be false if the input does not match the pattern. $r will be true if the sequence is still "alive", that is, if the tokens fed so far still match the pattern in $matcher. When a full match is found, the matchFound() method returns true:

    $m = $matcher->matchFound();
The public variable $matcher->matched will at this point contain the actual sequence of matched tokens or, if tokens implement the Token interface, whatever the token's matched($someInput) method returns. This array of matched token information can be manipulated during the matching process with callbacks, as explained below.
The reset() method resets the internal state of the pattern matcher as if no token had been fed to it:

    $matcher->reset();
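To make the match() / matchFound() / reset() protocol concrete, here is a toy stand-in that handles only a fixed series of tokens. SeriesMatcherSketch is a hypothetical class written for this illustration, not the library's Matcher, and two of its behaviours are assumptions: a failed sequence stays failed until reset(), and input after a full match is ignored.

```php
<?php
// Toy stand-in, NOT the library's Matcher: it matches one fixed series of
// tokens but follows the same match()/matchFound()/reset() protocol.
final class SeriesMatcherSketch
{
    private $series;
    private $pos = 0;
    private $alive = true;
    public $matched = [];

    public function __construct(array $series)
    {
        $this->series = $series;
    }

    // Returns true while the sequence is still "alive", false otherwise.
    public function match($token): bool
    {
        if (!$this->alive) {
            return false; // assumption: failure persists until reset()
        }
        if ($this->matchFound()) {
            return true; // assumption: input after a full match is ignored
        }
        if ($this->series[$this->pos] !== $token) {
            $this->alive = false;
            return false;
        }
        $this->matched[] = $token;
        $this->pos++;
        return true;
    }

    // True only once the whole series has been consumed.
    public function matchFound(): bool
    {
        return $this->alive && $this->pos === count($this->series);
    }

    // Back to the initial state, as if no token had been fed.
    public function reset()
    {
        $this->pos = 0;
        $this->alive = true;
        $this->matched = [];
    }
}

$m = new SeriesMatcherSketch(['a', 'b']);
$first = $m->match('a');          // alive, but not a full match yet
$foundEarly = $m->matchFound();
$second = $m->match('b');         // completes the series
$found = $m->matchFound();
$matchedTokens = $m->matched;
$m->reset();
$bad = $m->match('x');            // does not match the first token
```

The point of the sketch is the distinction between the two return values: match() reports whether the input is still compatible with the pattern, while matchFound() reports whether the pattern has been matched in full.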
Input tokens can also be given in an array:

    $r = $matcher->matchArray(['a', 'b', 'c']);

By default this method resets the matcher before starting to match the elements of the given array. An optional flag can be given to change this behaviour:

    $r = $matcher->matchArray(['a', 'b', 'c'], false);
Callbacks
A callback can be provided that will be called when a full match occurs. The callback function is called with $matcher->matched as its only argument and its output will overwrite $matcher->matched.
The following code, for example, will cause $matcher->matched to be 'abc' instead of the array ['a', 'b', 'c']:

    $pattern = (new Pattern())->withTokenSeries(['a', 'b', 'c'])
        ->withCallback(
            function ($m) {
                return implode($m);
            }
        );
    $matcher = new Matcher($pattern);
    $matcher->matchArray(['a', 'b', 'c', 'e']);
    $matcher->matchFound(); // true
    $matcher->matched; // 'abc'
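The effect of that callback can be checked in isolation. The snippet below involves no library code; it simply applies the same closure to an array like the one the matcher would have collected, to show how the return value replaces the array.

```php
<?php
// Stand-alone check of the callback from the example above.
$callback = function ($m) {
    // implode() with a single array argument joins with an empty separator.
    return implode($m);
};

$matched = ['a', 'b', 'c'];     // what the matcher would have collected
$matched = $callback($matched); // the callback's return value replaces it
```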
Callbacks are retained in their proper places when patterns are added. This allows sub-patterns with specific callbacks to be created. For example:
    $subPattern = (new Pattern())->withTokenSeries(['c', 'd'])
        ->withCallback(
            function ($m) {
                ...
            }
        );
    $pattern = (new Pattern())->withTokenSeries(['a', 'b'])
        ->withAddedPatternZeroOrMore($subPattern)
        ->withTokenSeries(['e']);
    $matcher = new Matcher($pattern);
In this case, every time the 'cd' subpattern is matched, the callback will be called.
End Token
The special constant Token::EOF stands for the end of input. It can be used to set up patterns and also to signal the end of the input to the matcher.

    $pattern = (new Pattern())->withTokenSeries(['a', 'b', Token::EOF]);
    $matcher = new Matcher($pattern);
    $matcher->matchArray(['a', 'b']); // no match
    $matcher->matchArray(['a', 'b', Token::EOF]); // match found!
Parallel Matching
The class ParallelMatcher matches input tokens against a set of patterns. Once a match is found in one of the patterns, the matcher saves that pattern's match sequence and starts matching again. This effectively considers a valid input stream as a sequence of matched patterns.
The constructor simply takes an array of Pattern objects. Then the match() method can be called on input tokens as with the Matcher class.
Example:

    $pmatcher = new ParallelMatcher(
        [
            (new Pattern())->withTokenSeries(['(', 'a', ')']),
            (new Pattern())->withTokenSeries(['(', 'b', ')']),
            (new Pattern())->withTokenSeries(['(', 'c', ')'])
        ]
    );
    $r = $pmatcher->match('('); // false if the input token does not match any pattern
    ...
The method matchArray() can be used to match an array of tokens against the patterns. It returns true if the input sequence was matched perfectly and false if there was any error. The number of matched patterns can be found with numMatches().
Example:

    $result = $pmatcher->matchArray(['(', 'c', ')', '(', 'c', ')']); // true
    $pmatcher->numMatches(); // 2
    $result = $pmatcher->matchArray(['(', 'a', ')', '(', 'x', ')']); // false
    $pmatcher->numMatches(); // 1
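The idea of treating the input as a sequence of matched patterns can be sketched in plain PHP. The function below is a toy illustration under simplifying assumptions, not the library's ParallelMatcher: each "pattern" is just an array of tokens, comparison is strict, and segmentIntoMatches() is a hypothetical name.

```php
<?php
// Toy illustration only: each "pattern" is a plain array of tokens.
// Returns whether the whole input was consumed as consecutive matches,
// together with the number of complete matches seen.
function segmentIntoMatches(array $patterns, array $input): array
{
    $candidates = $patterns; // patterns still compatible with the current run
    $pos = 0;                // position inside the current run
    $numMatches = 0;

    foreach ($input as $token) {
        $next = [];
        $completed = false;
        foreach ($candidates as $series) {
            if ($pos < count($series) && $series[$pos] === $token) {
                $next[] = $series;
                if (count($series) === $pos + 1) {
                    $completed = true; // this token finishes a pattern
                }
            }
        }
        if ($next === []) {
            // No pattern accepts this token: stop and report the error.
            return ['ok' => false, 'numMatches' => $numMatches];
        }
        if ($completed) {
            // Save the match and start over with all patterns.
            $numMatches++;
            $candidates = $patterns;
            $pos = 0;
        } else {
            $candidates = $next;
            $pos++;
        }
    }

    // The input is valid only if it does not end in the middle of a pattern.
    return ['ok' => $pos === 0, 'numMatches' => $numMatches];
}

$patterns = [['(', 'a', ')'], ['(', 'b', ')'], ['(', 'c', ')']];
$good = segmentIntoMatches($patterns, ['(', 'c', ')', '(', 'c', ')']);
$bad = segmentIntoMatches($patterns, ['(', 'a', ')', '(', 'x', ')']);
```

Run on the two inputs from the example above, the sketch reports two matches for the first input and one match followed by an error for the second, in line with the results shown there.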