bishopb/pattern

A string-matching PHP library sporting a consistent, fluent API.

dev-master 2020-12-29 02:48 UTC

This package is auto-updated.

Last update: 2024-08-29 04:14:28 UTC


README


Warning: This API does not exist. After implementing it and measuring the performance, I decided the overhead did not meet my goals and abandoned that code. The situation may be different in PHP 8. If you like this API and want to see it come alive, please implement it and submit a PR.

Pattern is a string-matching PHP library sporting a consistent, fluent API. Pattern unifies the API for strcmp and family, fnmatch, preg_match, and version_compare, while also offering convenience methods for common string-matching operations.

Pattern might be for you if:

  • You want readable string-matching code that clearly describes your intent.
  • You're frustrated that there's no simple, built-in way to check whether one string ends with another.
  • You want to avoid silly off-by-one errors when doing simple string checks.
  • You want the best performing algorithm for a particular kind of string check, regardless of whether that's strstr or strpos.
  • You're tired of referring to the PHP user manual for the argument order of strpos and friends.
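For comparison, here is a minimal sketch of an "ends with" check written with built-ins available since PHP 5 (PHP 8.0 later added str_ends_with()). The helper name endsWith is hypothetical and not part of this library. Notice that the explicit length arithmetic is exactly where off-by-one errors tend to creep in.

```php
<?php
// Hypothetical helper (not part of this library): does $subject end with $suffix?
// Uses substr_compare(), which has been available since PHP 5.
function endsWith($subject, $suffix)
{
    $length = strlen($suffix);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    if ($length > strlen($subject)) {
        return false; // a suffix longer than the subject can never match
    }
    // Compare the last $length bytes of $subject with $suffix (case-sensitive).
    return 0 === substr_compare($subject, $suffix, -$length, $length);
}
```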

Quickstart

Install with Composer: composer require bishopb/pattern:dev-master

Use:

use BishopB\Pattern;

// common matching API regardless of pattern language
$subjects = array ( 'Capable', 'Enabler', 'Able', );
$patterns = array (
    new Pattern\Literal('Able'),
    new Pattern\Wildcard('*able*')
);
foreach ($subjects as $subject) {
    foreach ($patterns as $pattern) {
        if ($pattern->matches($subject)) {
            echo "$pattern matches $subject\n";
        }
    }
}

// literal matching sugar
$able = (new Pattern\Literal('Able'))->fold();
$able->foundIn('tablet');
$able->begins('Abletic');
$able->ends('Parable');
$able->sorts->before('active');
$able->sorts->after('aardvark');

// version matching sugar
$stable = new Pattern\Version('1.0.0');
$stable->matches('1.0.0');
$stable->before('1.0.1');
$stable->after('0.9.9');
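Because the API above was never released, here is a sketch of roughly what the same checks look like with PHP built-ins. It assumes fold() means ASCII case-insensitive comparison and that before/after on versions follow version_compare() ordering; both are assumptions about the intended semantics.

```php
<?php
// Literal sugar, assuming fold() means ASCII case-insensitive matching.
$needle = 'Able';
$length = strlen($needle);
$literal = array(
    // foundIn('tablet'): substring search; stripos() returns false when absent
    'foundIn' => false !== stripos('tablet', $needle),
    // begins('Abletic'): compare only the first $length bytes
    'begins'  => 0 === strncasecmp('Abletic', $needle, $length),
    // ends('Parable'): compare only the last $length bytes, case-insensitively
    'ends'    => 0 === substr_compare('Parable', $needle, -$length, $length, true),
    // sorts->before('active') / sorts->after('aardvark'): ordering, not equality
    'before'  => strcasecmp($needle, 'active') < 0,
    'after'   => strcasecmp($needle, 'aardvark') > 0,
);

// Version sugar: version_compare() returns a bool when given an operator.
$version = array(
    'matches' => version_compare('1.0.0', '1.0.0', '=='),
    'before'  => version_compare('1.0.0', '1.0.1', '<'),
    'after'   => version_compare('1.0.0', '0.9.9', '>'),
);
```

Each check needs a different function with its own argument order and return convention, which is the inconsistency the library set out to hide.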

Motivation

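In PHP, developers have four common ways to match strings to patterns:

  • strcmp and family (strcasecmp, strncmp, and so on) for literal comparisons
  • fnmatch for shell-style wildcard matches
  • preg_match for regular expressions
  • version_compare for version strings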

There are a few problems with these APIs:

  • $pattern is a plain old string, which means you can easily make mistakes like fnmatch('^foo.*bar', $input), passing regular-expression syntax to a wildcard matcher.
  • strcmp and family return an orderable result that doesn't encourage intentional programming. Consider: if (! strcasecmp('foo', $input)) { echo 'pop quiz: matches foo?'; }
  • Functions to perform literal comparisons are scattered all over the place: strcmp, strcasecmp, strpos, stripos, etc.
  • Both strcasecmp and == are dangerous ways to compare strings: == compares numeric strings as numbers (so '1e3' == '1000' is true), and in PHP 5 and 7 strcasecmp returns null on non-string input (see the FAQ below).
  • It can be difficult to remember which argument is the pattern and which is the subject (compare strpos and preg_match).
  • How one specifies "case-insensitive" varies widely among the comparison functions.
  • If your code initially accepts only literal matches and you later want to support regular expressions, you have to rewrite your code.
  • Not every platform supports fnmatch.

This library provides a fast, thin abstraction over the built-in pattern matching functions to mitigate these problems.
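Several of the problems listed above can be reproduced with the built-ins alone. This sketch uses only standard PHP functions and operators; the variable names are illustrative.

```php
<?php
$input = 'foobar';

// Argument order differs: strpos() is subject-first, preg_match() is pattern-first.
$viaStrpos = strpos($input, 'foo') === 0;        // strpos($haystack, $needle)
$viaPreg   = preg_match('/^foo/', $input) === 1; // preg_match($pattern, $subject)

// A regular expression passed to fnmatch() is read as a glob: '^' and '.'
// are literal characters there, so this silently fails to match.
$globMistake = fnmatch('^foo.*bar', $input);

// Loose equality compares numeric strings as numbers.
$looseEqual  = ('1e3' == '1000');
$strictEqual = ('1e3' === '1000');
```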

Performance

This package's philosophy is simple: to deliver syntactic sugar with minimal run-time fat. API calls are a thin facade over the fastest implementation of the requested match. Space is conserved as much as possible.
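To make "thin facade" concrete, here is a hypothetical sketch of classes that keep only the pattern string and forward matches() to a single built-in. The names MatcherSketch, LiteralSketch, WildcardSketch, and PcreSketch are invented for illustration; they are not the library's classes, whose code was abandoned as the warning above explains.

```php
<?php
// Hypothetical facade sketch: one built-in per pattern language, no extra state.
interface MatcherSketch
{
    public function matches($subject);
}

final class LiteralSketch implements MatcherSketch
{
    private $pattern;
    public function __construct($pattern) { $this->pattern = $pattern; }

    // Exact, case-sensitive comparison.
    public function matches($subject) { return 0 === strcmp($this->pattern, $subject); }
}

final class WildcardSketch implements MatcherSketch
{
    private $pattern;
    public function __construct($pattern) { $this->pattern = $pattern; }

    // Shell-style glob; fnmatch() is not available on every platform.
    public function matches($subject) { return fnmatch($this->pattern, $subject); }
}

final class PcreSketch implements MatcherSketch
{
    private $pattern;
    public function __construct($pattern) { $this->pattern = $pattern; }

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    public function matches($subject) { return 1 === preg_match($this->pattern, $subject); }
}
```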

Run-time benchmarks

Measurements for different tests, in operations per second.

Peak-memory consumption benchmarks

Note: All benchmarks run a minimum of 1,000 times on a small, unloaded EC2 instance using PHP 5.3. Refer to tests/*Event.php for actual code. Refer to the Travis CI builds for run times on different PHP versions.
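The benchmark code in tests/*Event.php is not reproduced here. As a rough illustration of how operations per second and peak memory can be measured, here is a hypothetical minimal harness built on microtime() and memory_get_peak_usage(). It is not the project's benchmark code, and timings from a bare loop like this are noisy.

```php
<?php
// Hypothetical micro-benchmark helper: runs $operation $iterations times
// and returns the observed operations per second.
function opsPerSecond($operation, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($operation);
    }
    $elapsed = microtime(true) - $start;
    // Guard against a zero reading from a very fast loop.
    return $elapsed > 0 ? $iterations / $elapsed : INF;
}

$rate = opsPerSecond(function () {
    return strpos('Microsoft Tablet running Windows', 'Windows');
});
$peakBytes = memory_get_peak_usage(); // peak memory allocated by PHP so far
```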

Advanced usage

Manipulating the search subjects

Typically, methods in the pattern classes (Literal, Wildcard, and Pcre) take strings. However, you can also pass instances of Subject, a lightweight string class equipped with methods common to string comparison:

use BishopB\Pattern;

$device  = (new Pattern\Literal('Tablet'))->fold();
$version = new Pattern\Version('8.1');
$subject = (new Pattern\Subject('    Microsoft Tablet running Windows 8.1.0RC242.'))->trim();

$device->matches(
    $subject->
    column(' ', 1) // explode at spaces and take index 1 (0-based): "Tablet"
);
$version->after(
    $subject->
    column(' ', -1)-> // explode at spaces and take the last element (negative indexes count from the end)
    substring(0, 5)   // only the first 5 characters: "8.1.0"
);
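Subject was never released either. For reference, here is a sketch of the same extraction with built-ins (trim, explode, substr, strcasecmp), assuming column() splits on the delimiter and indexes the resulting pieces as the comments above describe.

```php
<?php
// Built-in version of the Subject example (a sketch under the assumptions above).
$subject = trim('    Microsoft Tablet running Windows 8.1.0RC242.');
$columns = explode(' ', $subject);

// column(' ', 1): the element at index 1, compared case-insensitively
$device = $columns[1];
$deviceMatches = 0 === strcasecmp('Tablet', $device);

// column(' ', -1): the last element, then its first 5 characters
$last = $columns[count($columns) - 1];
$versionText = substr($last, 0, 5);
```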

Faster searching of big text or with repeated searches

When your subject text is long, or you expect to compare your literal pattern to many different subjects, it's worth it to "study" the literal pattern for improved performance.

// notice the use of study()
// without this, searching would be much slower
$zebra = (new Pattern\Literal('zebra'))->fold()->study();
$words = file_get_contents('/usr/share/dict/words') or die('No dictionary');
$zebra->foundIn($words);

You may be wondering: how many characters is "long"? Or, how many iterations is "many"? Well, I suppose it depends. But, a long time ago, some PHP internals benchmarking suggested that a length of 5,000 characters or more would make studying worth it.
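The README does not say what study() did internally. One well-known way to "study" a literal pattern is the Boyer-Moore-Horspool algorithm: build a shift table from the pattern once, then reuse it for every search so that mismatches can skip ahead. The sketch below is a swapped-in illustration of that technique with hypothetical function names; for case folding it lower-cases both strings first. In practice, PHP's built-in strpos() and stripos() run in C and will usually beat a userland loop like this one.

```php
<?php
// Hypothetical "study" step: precompute Horspool shifts for each byte of the
// needle except the last (later occurrences overwrite earlier ones).
function horspoolStudy($needle)
{
    $length = strlen($needle);
    $shift = array();
    for ($i = 0; $i < $length - 1; $i++) {
        $shift[$needle[$i]] = $length - 1 - $i;
    }
    return array('needle' => $needle, 'length' => $length, 'shift' => $shift);
}

// Reuse the studied table for a search; returns the match offset or false.
function horspoolFind(array $study, $haystack)
{
    $m = $study['length'];
    $n = strlen($haystack);
    if ($m === 0) {
        return 0;
    }
    $pos = 0;
    while ($pos <= $n - $m) {
        // Compare the current window right to left.
        $j = $m - 1;
        while ($j >= 0 && $haystack[$pos + $j] === $study['needle'][$j]) {
            $j--;
        }
        if ($j < 0) {
            return $pos; // full match at $pos
        }
        // Shift by the table entry for the window's last byte (default: $m).
        $last = $haystack[$pos + $m - 1];
        $pos += isset($study['shift'][$last]) ? $study['shift'][$last] : $m;
    }
    return false;
}

// Study once, then search case-insensitively by lower-casing both sides.
$study = horspoolStudy(strtolower('zebra'));
$position = horspoolFind($study, strtolower('A Zebra Crossing'));
```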

FAQ

Why not just use the built-ins?

For the reasons mentioned above. Personally, I wrote this library because I kept referring to the official docs on the argument order for the built-ins and because common use cases aren't handled concisely. In summary, this library lets me write less code and be more clear in meaning.

For example, I see a lot of code following this pattern:

if (! strcmp($actual, $expected)) {
    $this->doSomething();
} else {
    throw new \RuntimeException('Actual does not match expected');
}

It's technically right. But, to me, it looks wrong. I find this much easier to read:

if ($actual->matches($expected)) {
    $this->doSomething();
} else {
    throw new \RuntimeException('Actual does not match expected');
}

There is a related side benefit. In PHP 5 and 7 (without strict types), built-in functions that receive a parameter of an invalid type emit a warning and return null; PHP 8 throws a TypeError instead. Since null is falsey, the example above runs doSomething unexpectedly when given such input. Consider:

// ?password[]= makes $_GET['password'] an array
if (! strcmp($_GET['password'], $user->password)) {
    $this->login($user);
} else {
    throw new \RuntimeException('Invalid password');
}

Why? Because strcmp(array (), '******') returns null, and true === (! null). In this library, an exception is raised if you try to match against an array.
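A minimal sketch of that kind of guard, written with built-ins only. The function name strictEquals is hypothetical. It rejects non-string input with an exception instead of letting a null slip through, and uses hash_equals() (PHP 5.6+) for a constant-time comparison. For real password checks, password_hash() and password_verify() are the appropriate built-ins.

```php
<?php
// Hypothetical guard (not this library's code): fail loudly on non-string
// input instead of letting strcmp() return null on PHP 5 and 7.
function strictEquals($actual, $expected)
{
    if (!is_string($actual) || !is_string($expected)) {
        throw new InvalidArgumentException('Both arguments must be strings');
    }
    // hash_equals($known, $user) compares in constant time.
    return hash_equals($expected, $actual);
}

// ?password[]= arrives as an array, so this now throws instead of logging in.
try {
    strictEquals(array(), '******');
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```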

Why not add more stringy methods, like length(), to Subject?

The package overall aims to support pattern matching in the lightest-weight way possible. Bulking up Subject with methods unrelated to pattern matches conflicts with this goal.