peckrob/search-parser

A parser that converts a freeform query into an intermediate object that can be converted to query many backends (SQL, ElasticSearch, etc).

v0.2 2020-05-14 20:33 UTC

Last update: 2024-05-01 22:50:13 UTC


README

SearchParser is a parser that converts a freeform query into an intermediate object, which can then be converted to query many backends (SQL, ElasticSearch, etc). It includes translators for SQL (using PDO) and the Laravel Eloquent ORM. It supports the kind of faceted search language commonly found on many sites across the web.

For example, consider the following query:

from:foo@example.com "bar baz" !meef date:2018/01/01-2018/08/01 #hashtag

Using SearchParser, it is tokenized into a SearchQuery object containing a series of SearchQueryComponents that represent each logical component of the search query:

$query = 'from:foo@example.com "bar baz" !meef date:2018/01/01-2018/08/01 #hashtag';

$q = new \peckrob\SearchParser\SearchParser();
$x = $q->parse($query);
print_r($x);

Returns:

peckrob\SearchParser\SearchQuery Object
(
    [position:peckrob\SearchParser\SearchQuery:private] => 0
    [data:protected] => Array
        (
            [0] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => field
                    [field] => from
                    [value] => foo@example.com
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

            [1] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => text
                    [field] =>
                    [value] => bar baz
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

            [2] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => text
                    [field] =>
                    [value] => meef
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] => 1
                )

            [3] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => range
                    [field] => date
                    [value] =>
                    [firstRangeValue] => 2018/01/01
                    [secondRangeValue] => 2018/08/01
                    [negate] =>
                )

            [4] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => text
                    [field] =>
                    [value] => #hashtag
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

        )

)

Install

Install using composer:

composer require peckrob/search-parser

SearchParser has no external dependencies. It has only been tested on PHP 7.2+. It may work on PHP 5, but you should not be using PHP 5. :)

Parsing

To parse a string into component tokens, create a SearchParser instance and call parse() on it.

$q = new \peckrob\SearchParser\SearchParser();
$x = $q->parse($query);

This will return a SearchQuery object that contains a series of SearchQueryComponents. The SearchQuery object is iterable, so you can loop over it with a foreach loop.
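As a rough sketch of what such a loop might look like, the example below walks a list of components and builds a one-line summary of each. So that it runs without the library installed, it uses plain stand-in objects that carry the same public property names shown in the print_r() output above. The component() and describe() helpers are hypothetical and not part of SearchParser, and the sketch assumes empty properties are null.

```php
<?php
// Stand-in data: plain objects with the property names that the print_r()
// output shows on SearchQueryComponent. With the library installed,
// $components would instead be the SearchQuery returned by parse().
function component(string $type, ?string $field, ?string $value, bool $negate = false): stdClass
{
    return (object) ['type' => $type, 'field' => $field, 'value' => $value, 'negate' => $negate];
}

$components = [
    component('field', 'from', 'foo@example.com'),
    component('text', null, 'bar baz'),
    component('text', null, 'meef', true),
];

// describe() is a hypothetical helper, not part of the library.
function describe(iterable $components): array
{
    $lines = [];
    foreach ($components as $c) {
        $prefix = $c->negate ? 'NOT ' : '';
        $target = $c->field ?? '(default field)';
        $lines[] = sprintf('%s%s %s: %s', $prefix, $c->type, $target, $c->value);
    }
    return $lines;
}

print_r(describe($components));
```

Because describe() only requires an iterable, the same loop should work on a real SearchQuery, provided the properties are readable as the print_r() output suggests.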

Defining Custom Parsers

The built-in parser will parse the string above fine and supports a nice baseline of functionality. But if you need to extend the parser to parse additional data, you can do so trivially. Create a class that implements the \peckrob\SearchParser\Parsers\Parser interface, with a parsePart() method that returns a SearchQueryComponent object. That component will be added to the SearchQuery object generated by the parser before it is returned.

Then, just add the custom parser to SearchParser by calling addParser().

$custom = new \peckrob\SearchParser\Parsers\Hashtag();
$q = new \peckrob\SearchParser\SearchParser();
$q->addParser($custom);
$q->parse($query);

Returns:

...

            [4] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => hashtag
                    [field] =>
                    [value] => hashtag
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

You can look at the Hashtag parser in Parsers for an example. Of course, you will need to provide a matching custom transform to handle your new custom component type (see below). Note that parsers do not "fall through." Once a parser handles a part, no other parser is tried for that part, and parsing moves on to the next part.
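The "no fall through" rule can be pictured as a first-match-wins loop. The sketch below is a standalone illustration and does not use the library's Parser interface: the closures, the parsePartWith() helper, and the array shape are all hypothetical, and the order in which SearchParser consults custom and built-in parsers is an assumption here.

```php
<?php
// Hypothetical part parsers: each returns an array describing the component,
// or null when it does not handle the part. These are NOT the library's classes.
$hashtagParser = function (string $part): ?array {
    if (preg_match('/^#(\w+)$/', $part, $m) === 1) {
        return ['type' => 'hashtag', 'value' => $m[1]];
    }
    return null;
};

$textParser = function (string $part): ?array {
    // Catch-all: treats anything it is given as plain text.
    return ['type' => 'text', 'value' => $part];
};

// The first parser that handles the part wins; later parsers never see it.
function parsePartWith(array $parsers, string $part): ?array
{
    foreach ($parsers as $parser) {
        $component = $parser($part);
        if ($component !== null) {
            return $component;
        }
    }
    return null;
}

$parsers = [$hashtagParser, $textParser];
print_r(parsePartWith($parsers, '#hashtag'));
print_r(parsePartWith($parsers, 'meef'));
```

In this sketch the hashtag parser claims "#hashtag" before the catch-all text parser is reached, which mirrors why the component type changes from text to hashtag in the output above.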

Transforms

Included in the package are a couple of example transforms. These take the SearchQuery output from parse() and transform it into a format suitable for querying a backend. Included are a SQL backend and an Eloquent backend suitable for directly querying a Laravel Eloquent model object.

To use a transform, create an instance of a Transformer, passing in an optional default field and context object depending on the transformer.

$pdo = new PDO("sqlite:/tmp/foo.sql");

$transform = new \peckrob\SearchParser\Transforms\SQL\SQL("default_field", $pdo);
$where = $transform->transform($x);

Returns:

`from` = 'foo@example.com' and `default_field` = 'bar baz' and `default_field` != 'meef' and (`date` between '2018/01/01' and '2018/08/01') and `default_field` = '#hashtag'

With Laravel/Lumen

SearchParser natively supports Laravel/Lumen Eloquent ORM queries. You can use the Eloquent transform.

$user = User::take(100);
$transform = new \peckrob\SearchParser\Transforms\Eloquent\Eloquent("default_field", $user);
$user = $transform->transform($x);

This will return the $user object with all the where()'s, etc. ready for a query.

$users = $user->get();

Loose Mode

Both built-in transforms support looseMode, which treats every text query as a like query. If you have defined custom parsers (above) but have not defined custom transforms (below), custom SearchQueryComponent types are treated as text.

$pdo = new PDO("sqlite:/tmp/foo.sql");

$transform = new \peckrob\SearchParser\Transforms\SQL\SQL("default_field", $pdo);
$transform->looseMode = true;
$where = $transform->transform($x);

Returns:

`from` = 'foo@example.com' and `default_field` like '%bar baz%' and `default_field` not like '%meef%' and (`date` between '2018/01/01' and '2018/08/01') and `default_field` like '%#hashtag%'

Defining Custom Component Transforms

In general you are free to transform the data however you like, and you do not need to use any of the built-in transforms if you don't want to. However, the built-in transforms also support custom component transforms, which they call before running their own transforms. If you do not define a custom transform, custom parse types are treated as text in the standard transformer.

To create your own Transform, implement the \peckrob\SearchParser\Transforms\TransformsComponents interface and implement the transformPart() method. See the Hashtag transformer for an example.

$pdo = new PDO("sqlite:/tmp/foo.sql");
$transform = new \peckrob\SearchParser\Transforms\SQL\SQL("default_field", $pdo);
$transform->addComponentTransform(new \peckrob\SearchParser\Transforms\SQL\Hashtag("default_field", $pdo));
$where = $transform->transform($x);

Returns:

`from` = 'foo@example.com' and `default_field` = 'bar baz' and `default_field` != 'meef' and (`date` between '2018/01/01' and '2018/08/01') and hashtag = 'hashtag'
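If you would rather transform the data yourself, one option is to build a parameterized WHERE clause and pass the values as bindings to a prepared statement. The sketch below is not how the built-in SQL transform works (that one escapes values through the PDO object). It uses stand-in objects with the property names from the print_r() output, assumes empty properties are null, and toWhere() is a hypothetical helper.

```php
<?php
// Stand-in components with the property names shown in the print_r() output.
// With the library, iterate the SearchQuery returned by parse() instead.
$components = [
    (object) ['type' => 'field', 'field' => 'from', 'value' => 'foo@example.com',
              'firstRangeValue' => null, 'secondRangeValue' => null, 'negate' => false],
    (object) ['type' => 'text', 'field' => null, 'value' => 'meef',
              'firstRangeValue' => null, 'secondRangeValue' => null, 'negate' => true],
    (object) ['type' => 'range', 'field' => 'date', 'value' => null,
              'firstRangeValue' => '2018/01/01', 'secondRangeValue' => '2018/08/01', 'negate' => false],
];

// toWhere() is a hypothetical helper. Field names are interpolated into the
// SQL, so they must already have been checked against a whitelist.
function toWhere(iterable $components, string $defaultField): array
{
    $clauses = [];
    $bindings = [];
    foreach ($components as $c) {
        $column = '`' . ($c->field ?? $defaultField) . '`';
        if ($c->type === 'range') {
            $clauses[] = "($column between ? and ?)";
            $bindings[] = $c->firstRangeValue;
            $bindings[] = $c->secondRangeValue;
        } else {
            $clauses[] = $column . ($c->negate ? ' != ?' : ' = ?');
            $bindings[] = $c->value;
        }
    }
    return [implode(' and ', $clauses), $bindings];
}

[$where, $bindings] = toWhere($components, 'default_field');
echo $where, "\n";
print_r($bindings);
```

The returned string can be appended to a query and the bindings passed to PDOStatement::execute(), so user-supplied values never appear in the SQL text itself.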

Filters

A Note About Security

The SQL transform will escape the values passed as arguments (that is why you pass a PDO object as the context), but it does not escape field names. The Eloquent transform very likely works the same way under the hood.

The suggested approach is to filter the fields based on a whitelist and throw out things that aren't valid. Don't just pass the SearchQuery directly back to the SQL transform without filtering the fields.

Built In Filters

SearchParser has a couple of filters available in the package: FieldFilter and FieldNameMapper. Filters are executed in the order that they are added to the Filter object.

FieldFilter

FieldFilter is a simple whitelist of valid fields. Any SearchQueryComponent that has a field that is not on the whitelist is removed from the SearchQuery.

$filter = new \peckrob\SearchParser\Filters\Filter();
$field_filter = new \peckrob\SearchParser\Filters\FieldFilter();
$field_filter->validFields = ['from'];
$filter->addFilter($field_filter);
$filter->filter($x);

Returns:

peckrob\SearchParser\SearchQuery Object
(
    [position:peckrob\SearchParser\SearchQuery:private] => 5
    [data:protected] => Array
        (
            [0] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => field
                    [field] => from
                    [value] => foo@example.com
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

            [1] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => text
                    [field] =>
                    [value] => bar baz
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

            [2] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => text
                    [field] =>
                    [value] => meef
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] => 1
                )

            [4] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => hashtag
                    [field] =>
                    [value] => hashtag
                    [firstRangeValue] =>
                    [secondRangeValue] =>
                    [negate] =>
                )

        )

)

Note that every component with a field other than from (here, the date range) has been removed from the SearchQuery object. Components without a field are kept.

FieldNameMapper

Suppose you have a field that you want to expose to your users under a different name than the one used on your backend. For instance, date to your users might be created_on on your backend. This is where the FieldNameMapper filter comes into play.

$filter = new \peckrob\SearchParser\Filters\Filter();
$field_filter = new \peckrob\SearchParser\Filters\FieldNameMapper();
$field_filter->mappingFields = [
    'date' => 'created_on'
];
$filter->addFilter($field_filter);
$filter->filter($x);

Returns:

...

            [3] => peckrob\SearchParser\SearchQueryComponent Object
                (
                    [type] => range
                    [field] => created_on
                    [value] =>
                    [firstRangeValue] => 2018/01/01
                    [secondRangeValue] => 2018/08/01
                    [negate] =>
                )

Defining Custom Filters

As with Parsers and Transformers, it is trivial to define custom filters. Simply create a class that implements the FiltersQueries interface and define a filter() method. The method is passed the SearchQuery object.

There are some convenience methods on SearchQuery that make writing filters a bit easier. Namely, they are:

  • remove(SearchQueryComponent $item) - Removes an item from the SearchQuery.
  • replace(SearchQueryComponent $old, SearchQueryComponent $new) - Replaces an item with a new item.
  • merge(SearchQuery $query) - Merges two SearchQuery objects together.

Once you have defined your custom filter, simply call addFilter() on the Filter instance. Again, filters are executed in the order they are added.
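Execution order matters when a whitelist and a field-name mapping are combined. The standalone sketch below illustrates this with plain arrays and closures. The whitelist(), renameFields(), and runFilters() helpers are hypothetical stand-ins for FieldFilter, FieldNameMapper, and the Filter object, and do not use the library's classes.

```php
<?php
// Components are modelled as plain arrays with a 'field' key; null means no field.
// Hypothetical stand-in for FieldFilter: drops components whose field is not allowed.
function whitelist(array $validFields): callable
{
    return function (array $components) use ($validFields): array {
        return array_values(array_filter($components, function (array $c) use ($validFields): bool {
            return $c['field'] === null || in_array($c['field'], $validFields, true);
        }));
    };
}

// Hypothetical stand-in for FieldNameMapper: renames user-facing fields.
function renameFields(array $map): callable
{
    return function (array $components) use ($map): array {
        return array_map(function (array $c) use ($map): array {
            if ($c['field'] !== null && isset($map[$c['field']])) {
                $c['field'] = $map[$c['field']];
            }
            return $c;
        }, $components);
    };
}

// Apply filters in the order given, like a chain built with addFilter().
function runFilters(array $filters, array $components): array
{
    foreach ($filters as $filter) {
        $components = $filter($components);
    }
    return $components;
}

$query  = [['field' => 'date'], ['field' => 'from'], ['field' => null]];
$allow  = whitelist(['date', 'from']);            // user-facing names
$rename = renameFields(['date' => 'created_on']);

$good = runFilters([$allow, $rename], $query);    // whitelist, then rename
$bad  = runFilters([$rename, $allow], $query);    // created_on is not on the whitelist

print_r($good);
print_r($bad);
```

If the whitelist is written in user-facing names, it should run before the mapping. If it is written in backend names, it should run after.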

Tests

Tests are included. PHPUnit is listed under require-dev in this project, so you will need to run composer install with dev dependencies included. Then just run phpunit from the project root. Some tests may be skipped if optional components (such as Eloquent) are not installed.

Author

Rebecca Peck

License

MIT