stratadox/parser

Parser combinator library

v0.1.1 2023-11-13 12:26 UTC


Last update: 2024-04-13 13:25:46 UTC




Simple Yet Powerful Parsing Library.

What is this

A library for creating custom parsers. Based on the "ancient" concept of parser combinators, it provides a wide variety of base parsers, decorators, combinators and helpers.

Why use this

Parsing is transforming text into a usable structure, and parsers made with this library can be used in many ways.

This can be used for various purposes, whether it be transforming json / csv / xml / yaml / etc. into some kind of data structure, or parsing a custom DSL or expression language into an abstract syntax tree.

Whether you wish to create your own file format, your own programming language, interpret existing file formats or languages... This library is here to help.

How to use this

For hands-on how-tos, see the guide.

Installation

Using Composer:

composer require stratadox/parser

Overview

There are three base parsers: any, text and pattern.

  • Any matches any single character.
  • Text matches a predefined string.
  • Pattern matches a regular expression.
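To make the idea behind the three base parsers concrete, here is a minimal, self-contained sketch in plain PHP. It does not use this library: the names anyChar, literal and regex and the [ok, data, unparsed] result shape are hypothetical, chosen only for illustration, and the library's own types and results will differ.

```php
<?php
declare(strict_types=1);

// Hypothetical mini-parsers, NOT the stratadox/parser API.
// Each parser is a closure: string -> [ok, data, unparsed].

// Matches any single character (byte-based, for simplicity).
function anyChar(): Closure
{
    return fn(string $in) => $in === ''
        ? [false, null, $in]
        : [true, $in[0], substr($in, 1)];
}

// Matches a predefined string at the start of the input.
function literal(string $text): Closure
{
    return fn(string $in) => str_starts_with($in, $text)
        ? [true, $text, substr($in, strlen($text))]
        : [false, null, $in];
}

// Matches a regular expression anchored at the start of the input.
// Assumes the pattern does not contain the "/" delimiter.
function regex(string $pattern): Closure
{
    return function (string $in) use ($pattern) {
        if (preg_match('/^(?:' . $pattern . ')/', $in, $m) === 1) {
            return [true, $m[0], substr($in, strlen($m[0]))];
        }
        return [false, null, $in];
    };
}

var_dump(regex('\d+')('42x')); // [true, '42', 'x']
```

Each parser consumes a prefix of the input and hands back the remainder, which is what allows parsers to be chained.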

These can be upgraded by a fair number of add-ons ("decorators"), which can be combined as needed:

  • Repeatable applies the parser any number of times, yielding a list.
  • Map modifies successful results based on a function.
  • Full Map modifies all results based on a function.
  • Ignore requires the thing to be there, and then ignores it. (Miauw)
  • Maybe does not require it, but uses it if it's there.
  • Optional combines the above two.
  • Except "un-matches" if another parser succeeds.
  • End returns an error state if there's unparsed content.
  • All or Nothing fiddles with the parse error.
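As a rough illustration of what a decorator does, the sketch below wraps a parser in plain PHP closures. It is not this library's implementation: digit, repeatable, mapped and maybe are hypothetical names, the [ok, data, unparsed] result shape is an assumption, and the sketch's choice to let zero repetitions succeed is an interpretation of "any number of times".

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, NOT the stratadox/parser API.
// A parser is a closure: string -> [ok, data, unparsed].

// Base parser for the demo: matches one ASCII digit.
function digit(): Closure
{
    return fn(string $in) => preg_match('/^\d/', $in) === 1
        ? [true, $in[0], substr($in, 1)]
        : [false, null, $in];
}

// "Repeatable": apply the parser until it fails, collecting a list.
function repeatable(Closure $parser): Closure
{
    return function (string $in) use ($parser) {
        $items = [];
        while (true) {
            [$ok, $data, $rest] = $parser($in);
            if (!$ok || $rest === $in) {
                break; // stop on failure, or when nothing was consumed
            }
            $items[] = $data;
            $in = $rest;
        }
        return [true, $items, $in];
    };
}

// "Map": transform the data of successful results only.
function mapped(Closure $parser, callable $fn): Closure
{
    return function (string $in) use ($parser, $fn) {
        [$ok, $data, $rest] = $parser($in);
        return $ok ? [true, $fn($data), $rest] : [false, $data, $rest];
    };
}

// "Maybe": turn a failure into a successful, empty result.
function maybe(Closure $parser): Closure
{
    return function (string $in) use ($parser) {
        [$ok, $data, $rest] = $parser($in);
        return $ok ? [true, $data, $rest] : [true, null, $in];
    };
}

// Decorators stack: repeat digits, then map the list to an integer.
$number = mapped(repeatable(digit()), fn(array $d) => (int) implode('', $d));
var_dump($number('123abc')); // [true, 123, 'abc']
```

Because every decorator returns another parser of the same shape, they can be stacked in any order.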

Parsers can be combined using combinators, for instance the "or" and "andThen" methods used in the examples below.

All the above can be mixed and combined at will. To make life easier, there's a bunch of combinator shortcuts for "everyday tasks":

  • Between matches the parser's content between start and end.
  • Between Escaped matches unescaped content between start and end.
  • Split yields one or more results, split by a delimiter.
  • Must Split yields two or more results, split by a delimiter.
  • Keep Split yields a structure like {delimiter: [left, right]}.

There are several additional helpers, which are essentially mapping shortcuts:

  • Join implodes the array result into a string.
  • Non-Empty refuses empty results.
  • At Least refuses arrays with fewer than x entries.
  • At Most refuses arrays with more than x entries.
  • First transforms an array result into its first item.
  • Item transforms an array result into its nth item.

To enable lazy parsers (and/or to provide a structure), different containers are available, such as the Lazy and Grammar containers used in Example 2.

Example 1: CSV

For a basic "real life" example, here's a simple CSV parser:

<?php
use Stratadox\Parser\Helpers\Between;
use Stratadox\Parser\Parser;
use function Stratadox\Parser\any;
use function Stratadox\Parser\pattern;

function csvParser(
    Parser|string $sep = ',',
    Parser|string $esc = '"',
): Parser {
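    // A line break in any of the common styles.
    $newline = pattern('\r\n|\r|\n');
    // A field is either quoted content (with $esc as the escape sequence)...
    return Between::escaped('"', '"', $esc)
        // ...or a run of characters that are not a newline, separator or escape.
        ->or(any()->except($newline->or($sep)->or($esc))->repeatableString())
        // A row is two or more fields split by the separator, if present.
        ->mustSplit($sep)->maybe()
        // The document is one or more rows split by newlines.
        ->split($newline)
        // Unparsed leftovers are an error.
        ->end();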
}

(For associative result mapping, see the CSV example)
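As a rough idea of what associative mapping could look like, here is a plain-PHP sketch that works on already parsed rows. It assumes the parse result is a list of rows, each a list of strings, with the first row holding the column names. That shape, and the function name associateRows, are assumptions for illustration, not something this README confirms.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of stratadox/parser.
 * Turns [[header...], [field...], ...] into a list of header => field maps.
 */
function associateRows(array $rows): array
{
    $headers = array_shift($rows);
    if ($headers === null) {
        return []; // no rows at all
    }
    $records = [];
    foreach ($rows as $row) {
        // Skip rows that cannot be paired with the headers.
        if (!is_array($row) || count($row) !== count($headers)) {
            continue;
        }
        $records[] = array_combine($headers, $row);
    }
    return $records;
}

$rows = [['name', 'age'], ['Ann', '31'], ['Bob', '27']];
print_r(associateRows($rows));
```

Silently skipping ragged rows is a design choice made for brevity here; throwing an exception may suit stricter use cases better.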

Example 2: Calculator AST

This next example parses basic arithmetic strings (e.g. 1 + -3 * 3 ^ 2) into an abstract syntax tree:

<?php
use Stratadox\Parser\Containers\Grammar;
use Stratadox\Parser\Containers\Lazy;
use Stratadox\Parser\Parser;
use function Stratadox\Parser\pattern;
use function Stratadox\Parser\text;

function calculationsParser(): Parser
{
    $grammar = Grammar::with($lazy = Lazy::container());

    $sign = text('+')->or('-')->maybe();
    $digits = pattern('\d+');
    $map = fn($op, $l, $r) => [
        'op' => $op,
        'arg' => [$l, $r],
    ];

    // Prio 0: a signed float or integer, optionally surrounded by whitespace.
    $grammar['prio 0'] = $sign->andThen($digits, '.', $digits)->join()->map(fn($x) => (float) $x)
        ->or($sign->andThen($digits)->join()->map(fn($x) => (int) $x))
        ->between(text(' ')->or("\t", "\n", "\r")->repeatable()->optional());

    // Prio 1: exponentiation, falling back to a plain number.
    $lazy['prio 1'] = $grammar['prio 0']->andThen('^', $grammar['prio 0'])->map(fn($a) => [
        'op' => '^',
        'arg' => [$a[0], $a[2]],
    ])->or($grammar['prio 0']);

    // Prio 2: multiplication and division.
    $grammar['prio 2'] = $grammar['prio 1']->keepSplit(['*', '/'], $map)->or($grammar['prio 1']);

    // Prio 3: addition and subtraction.
    $grammar['prio 3'] = $grammar['prio 2']->keepSplit(['+', '-'], $map)->or($grammar['prio 2']);

    return $grammar['prio 3']->end();
}

(For a working example, see the Calculator example)
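To show how such a tree might be consumed, the sketch below evaluates an AST in the ['op' => ..., 'arg' => [left, right]] shape that $map produces. The evaluate function is hypothetical and not part of this library, and the tree is written out by hand as an assumption of what 1 + -3 * 3 ^ 2 would yield; the parser's actual output may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical evaluator, not part of stratadox/parser.
// Leaves are ints or floats; inner nodes are ['op' => ..., 'arg' => [l, r]].
function evaluate(array|int|float $node): int|float
{
    if (!is_array($node)) {
        return $node; // a plain number
    }
    // Evaluate both operands first, then apply the operator.
    [$left, $right] = array_map('evaluate', $node['arg']);
    return match ($node['op']) {
        '+' => $left + $right,
        '-' => $left - $right,
        '*' => $left * $right,
        '/' => $left / $right,
        '^' => $left ** $right,
        default => throw new InvalidArgumentException('Unknown operator: ' . $node['op']),
    };
}

// Hand-written tree for: 1 + -3 * 3 ^ 2 (assumed shape).
$ast = [
    'op' => '+',
    'arg' => [1, [
        'op' => '*',
        'arg' => [-3, ['op' => '^', 'arg' => [3, 2]]],
    ]],
];

var_dump(evaluate($ast)); // int(-26)
```

Operator precedence needs no handling in the evaluator, since the nesting of the tree already encodes it.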

Documentation

Additional documentation is available through the guide, the reference and/or the tests.