rafaelnajera/matcher

A regexp-like matcher for bespoke tokens.

0.5.0 2017-02-19 18:48 UTC

Last update: 2024-11-19 02:23:59 UTC


Matcher implements a regexp-like matching system that can be used with user-devised tokens.

Installation

Install the latest version with:

$ composer require rafaelnajera/matcher

Usage

The main class is Matcher, which allows you to match a sequence of tokens against a pattern specified in a regexp-like manner.

Matcher works on a Pattern object, which stands for a regexp-like pattern that can be matched.

The following sets up a Pattern object to match '^ab(cd)*e':

$pattern = (new Pattern())->withTokenSeries(['a', 'b'])
           ->withAddedPatternZeroOrMore((new Pattern())->withTokenSeries(['c', 'd']))
           ->withTokenSeries(['e']);

The tokens used to set up the pattern can be of any type. Matching is done by strict comparison with the input tokens. Tokens can also be objects that implement the Token interface, in which case the token's matches($someInput) method will be called. The input in this case can be anything as long as the token's matches() method knows how to determine a match.
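
As an illustration of the token mechanism described above, here is a minimal sketch of a custom token. The `TokenSketch` interface is a local stand-in, declared only so the example runs on its own; the package's real Token interface is not reproduced here, and only the method names matches() and matched() come from this README. `IntRangeToken` and the shape of its return value are hypothetical.

```php
<?php

// Stand-in for the package's Token interface (assumption: the real
// interface may declare these methods with different signatures).
interface TokenSketch
{
    public function matches($input);
    public function matched($input);
}

// Hypothetical token that accepts any integer within a closed range.
class IntRangeToken implements TokenSketch
{
    private $min;
    private $max;

    public function __construct(int $min, int $max)
    {
        $this->min = $min;
        $this->max = $max;
    }

    // Decide whether the given input counts as a match for this token.
    public function matches($input)
    {
        return is_int($input) && $input >= $this->min && $input <= $this->max;
    }

    // Describe the matched input; per this README, the return value is what
    // ends up in the matcher's list of matched tokens.
    public function matched($input)
    {
        return ['type' => 'int', 'value' => $input];
    }
}

$digit = new IntRangeToken(0, 9);
var_dump($digit->matches(7));    // bool(true)
var_dump($digit->matches('7'));  // bool(false): a string, not an integer
var_dump($digit->matches(42));   // bool(false): out of range
```

A token like this lets one pattern position accept a whole class of inputs, which plain strict comparison cannot do.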

Once the pattern is set up, a Matcher object can be created and input tokens can be fed to it one by one with the match() method:

$matcher = new Matcher($pattern);
$r = $matcher->match('a');
$r = $matcher->match('b');
...

Here $r will be false if the input does not match the pattern. $r will be true if the sequence is still "alive", that is, if the sequence still matches the pattern in $matcher. When a full match is found the matchFound() method returns true:

$m = $matcher->matchFound();
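
The feed-and-check cycle described above can be wrapped in a small helper. This is only a sketch: `feedUntilMatch` is a hypothetical function, not part of the package, and it relies solely on the match() and matchFound() methods as this README describes them. It stops feeding at the first full match, so any remaining tokens are left unconsumed.

```php
<?php

/**
 * Feeds tokens one by one to a matcher-like object (hypothetical helper).
 *
 * $matcher is assumed to expose match($token) and matchFound() with the
 * semantics described in this README.
 *
 * Returns true as soon as a full match is found, false if a token is
 * rejected or the input runs out before a full match.
 */
function feedUntilMatch($matcher, iterable $tokens): bool
{
    foreach ($tokens as $token) {
        // A false result means the sequence is no longer "alive".
        if (!$matcher->match($token)) {
            return false;
        }
        // Stop at the first full match.
        if ($matcher->matchFound()) {
            return true;
        }
    }

    // Input exhausted: report whatever state the matcher is in.
    return (bool) $matcher->matchFound();
}
```

With the '^ab(cd)*e' pattern set up earlier, a call such as feedUntilMatch(new Matcher($pattern), ['a', 'b', 'e']) would be expected to return true.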

The public variable $matcher->matched at this point will contain the actual sequence of matched tokens or, if tokens implement the Token interface, whatever the token's matched($someInput) method returns. This array of matched token information can be manipulated during the matching process with callbacks as explained below.

The reset() method resets the internal state of the pattern matcher as if no token had been fed to it:

$matcher->reset();

Input tokens can also be given in an array:

$r = $matcher->matchArray(['a', 'b', 'c']);

By default this method resets the matcher before starting to match the elements of the given array. An optional flag can be given to change this behaviour:

$r = $matcher->matchArray(['a', 'b', 'c'], false);

Callbacks

A callback can be provided that will be called when a full match occurs. The callback function is called with $matcher->matched as its only argument and its output will overwrite $matcher->matched.

The following code, for example, will cause $matcher->matched to be 'abc' instead of the array ['a', 'b', 'c']:

$pattern = (new Pattern())->withTokenSeries(['a', 'b', 'c'])
    ->withCallback(
        function ($m) {
            return implode($m);
        }
    );

$matcher = new Matcher($pattern);
$matcher->matchArray(['a', 'b', 'c', 'e']);

$matcher->matchFound();  // true
$matcher->matched;  // 'abc'
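
Since a callback is an ordinary callable that receives the matched array and returns its replacement, it can be tried in isolation before being attached with withCallback(). The sketch below does that with a hypothetical callback, `$summarize`, which returns a small associative array rather than a string; the key names are illustrative only.

```php
<?php

// Hypothetical callback: collapses the matched tokens into a summary record.
$summarize = function (array $m) {
    return [
        'text'  => implode('', $m),  // tokens joined into one string
        'count' => count($m),        // number of matched tokens
    ];
};

// Called directly here with the array that a full match of 'abc' would produce.
$result = $summarize(['a', 'b', 'c']);
var_dump($result === ['text' => 'abc', 'count' => 3]);  // bool(true)
```

Testing the callable on its own like this keeps the transformation logic separate from the pattern definition.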

Callbacks are retained in their proper places when patterns are added. This allows sub-patterns with specific callbacks to be created. For example:

$subPattern = (new Pattern())->withTokenSeries(['c', 'd'])
     ->withCallback( function($m) { ... });

$pattern = (new Pattern())->withTokenSeries(['a', 'b'])
        ->withAddedPatternZeroOrMore($subPattern)
        ->withTokenSeries(['e']);

$matcher = new Matcher($pattern);

In this case, every time the 'cd' subpattern is matched, the callback will be called.

End Token

The special constant Token::EOF stands for the end of input. It can be used when setting up patterns and also to signal the end of the input to the matcher.

$pattern = (new Pattern())->withTokenSeries(['a', 'b', Token::EOF]);

$matcher = new Matcher($pattern);

$matcher->matchArray(['a', 'b']);  // no match
$matcher->matchArray(['a', 'b', Token::EOF]); // match found!

Parallel Matching

The class ParallelMatcher matches input tokens against a set of patterns. Once a match is found in one of the patterns, the matcher saves that pattern's match sequence and starts matching again. This effectively considers a valid input stream as a sequence of matched patterns.

The constructor simply takes an array of Pattern objects. Then the match() method can be called on input tokens as with the Matcher class.

Example:

$pmatcher = new ParallelMatcher(
    [
        (new Pattern())->withTokenSeries(['(', 'a', ')']),
        (new Pattern())->withTokenSeries(['(', 'b', ')']),
        (new Pattern())->withTokenSeries(['(', 'c', ')'])
    ]
);

$r = $pmatcher->match('('); // false if the input token does not match any pattern
...

The matchArray() method can be used to match an array of tokens against the patterns. It returns true if the input sequence was matched perfectly and false otherwise. The number of matched patterns is available from numMatches().

Example:

$result = $pmatcher->matchArray(['(', 'c', ')', '(', 'c', ')']); // true
$pmatcher->numMatches();  // 2

$result = $pmatcher->matchArray(['(', 'a', ')', '(', 'x', ')']); // false
$pmatcher->numMatches();  // 1