kuria / parser
Character-by-character string parsing library
Requires
- php: >=7.1
Requires (Dev)
- kuria/dev-meta: ^0.4.0
Features
- line number tracking (can be disabled for performance)
- supports CR, LF and CRLF line endings
- verbose exceptions
- many methods to navigate and operate the parser
- forward / backward peeking and seeking
- forward / backward character consumption
- state stack
- character types
- expectations
Requirements
- PHP 7.1+
Usage
Creating a parser
Create a new parser instance with string input.
The parser begins at the first character.
<?php

use Kuria\Parser\Parser;

$input = 'foo bar baz';

$parser = new Parser($input);
Parser properties
The parser has several public properties that can be used to inspect its current state:
- $parser->i - current position
- $parser->char - current character (or NULL at the end of input)
- $parser->lastChar - last character (or NULL at the start of input)
- $parser->line - current line (or NULL if line tracking is disabled)
- $parser->end - end of input indicator (TRUE at the end, FALSE otherwise)
- $parser->vars - user-defined variables attached to the current state
Warning
All of the public properties (with the exception of $parser->vars) are read-only and must not be modified directly by the calling code.
Use the built-in parser methods to mutate the parser state. See Parser method overview.
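As a rough illustration of reading these properties, the sketch below advances the parser by one character and prints the state before and after. It assumes the library is installed and autoloaded through Composer. describeState() is a hypothetical helper written for this example and is not part of the library.

```php
<?php

use Kuria\Parser\Parser;

/**
 * Format the public state of a parser for debugging (hypothetical helper).
 * The properties are only read here, never written.
 */
function describeState(Parser $parser): string
{
    return sprintf(
        'i=%s char=%s lastChar=%s line=%s end=%s',
        var_export($parser->i, true),
        var_export($parser->char, true),
        var_export($parser->lastChar, true),
        var_export($parser->line, true),
        var_export($parser->end, true)
    );
}

$parser = new Parser('ab');

echo describeState($parser), "\n"; // state at the first character

$parser->eat(); // move to the next character using a parser method

echo describeState($parser), "\n"; // state at the second character
```

The position is changed only through eat(), in line with the warning above about not assigning to the properties directly.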
Parser method overview
Refer to doc comments of the respective methods for more information.
Also see Character types.
Static methods
- getCharType($char): int - determine character type
- getCharTypeName($charType): string - get human-readable character type name
Instance methods
- getInput(): string - get the input string
- setInput($input): void - replace the input string (this also resets the parser)
- getLength(): int - get length of the input string
- isTrackingLineNumbers(): bool - see if line number tracking is enabled
- type(): int - get type of the current character
- is(...$types): bool - check whether the current character is of one of the specified types
- atNewline(): bool - see if the parser is at the start of a newline sequence
- eat(): ?string - go to the next character and return the current one (returns NULL at the end)
- spit(): ?string - go to the previous character and return the current one (returns NULL at the beginning)
- shift(): ?string - go to the next character and return it (returns NULL at the end)
- unshift(): ?string - go to the previous character and return it (returns NULL at the beginning)
- peek($offset, $absolute = false): ?string - get character at the given offset or absolute position (does not affect state)
- seek($offset, $absolute = false): void - alter current position
- reset(): void - reset states, vars and rewind to the beginning
- rewind(): void - rewind to the beginning
- eatChar($char): ?string - consume specific character and return the next character
- tryEatChar(): bool - attempt to consume specific character and return success state
- eatType($type): string - consume all characters of the specified type
- eatTypes($typeMap): string - consume all characters of the specified types
- eatWs(): string - consume whitespace, if any
- eatUntil($delimiterMap, $skipDelimiter = true, $allowEnd = false): string - consume all characters until the specified delimiters
- eatUntilEol($skip = true): string - consume all characters until end of line or input
- eatEol(): string - consume end of line sequence
- eatRest(): string - consume remaining characters
- getChunk($start, $end): string - get chunk of the input (does not affect state)
- detectEol(): ?string - find and return the next end of line sequence (does not affect state)
- countStates(): int - get number of stored states
- pushState(): void - store the current state
- revertState(): void - revert to the last stored state and pop it
- popState(): void - pop the last stored state without reverting to it
- clearStates(): void - throw away all stored states
- expectEnd(): void - ensure that the parser is at the end
- expectNotEnd(): void - ensure that the parser is not at the end
- expectChar($expectedChar): void - ensure that the current character matches the expectation
- expectCharType($expectedType): void - ensure that the current character is of the given type
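The state stack methods listed above can be combined for speculative parsing. The following is a minimal sketch, assuming the library is autoloaded through Composer and that eatType() returns an empty string when nothing matches. tryParsePercentage() is a hypothetical helper written for this example, not a library function.

```php
<?php

use Kuria\Parser\Parser;

/**
 * Try to parse an integer percentage such as "50%" (hypothetical helper).
 *
 * Returns the number on success. On failure the parser is restored
 * to the state it had before the attempt and NULL is returned.
 */
function tryParsePercentage(Parser $parser): ?int
{
    // remember the current state so the attempt can be undone
    $parser->pushState();

    // consume a run of digits (assumed to be '' if there are none)
    $digits = $parser->eatType(Parser::C_NUM);

    if ($digits !== '' && $parser->char === '%') {
        $parser->eat();      // consume the % sign
        $parser->popState(); // keep the new position, discard the stored state

        return (int) $digits;
    }

    // not a percentage: go back to the stored state
    $parser->revertState();

    return null;
}

$parser = new Parser('50px');

var_dump(tryParsePercentage($parser)); // no % sign follows the digits, so the attempt is reverted
var_dump($parser->i);                  // position after the reverted attempt
```

Every pushState() call in the sketch is matched by either popState() or revertState(), so the stack does not grow across repeated attempts.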
Example INI parser implementation
<?php

use Kuria\Parser\Parser;

/**
 * INI parser (example)
 */
class IniParser
{
    /**
     * Parse an INI string
     */
    public function parse(string $string): array
    {
        // create parser
        $parser = new Parser($string);

        // prepare variables
        $data = [];
        $currentSection = null;

        // parse
        while (!$parser->end) {
            // skip whitespace
            $parser->eatWs();

            if ($parser->end) {
                break;
            }

            // parse the current thing
            if ($parser->char === '[') {
                // a section
                $currentSection = $this->parseSection($parser);
            } elseif ($parser->char === ';') {
                // a comment
                $this->skipComment($parser);
            } else {
                // a key=value pair
                [$key, $value] = $this->parseKeyValue($parser);

                // add to output
                if ($currentSection === null) {
                    $data[$key] = $value;
                } else {
                    $data[$currentSection][$key] = $value;
                }
            }
        }

        return $data;
    }

    /**
     * Parse a section and return its name
     */
    private function parseSection(Parser $parser): string
    {
        // we should be at the [ character now, eat it
        $parser->eatChar('[');

        // eat everything until ]
        $sectionName = $parser->eatUntil(']');

        return $sectionName;
    }

    /**
     * Skip a commented-out line
     */
    private function skipComment(Parser $parser): void
    {
        // we should be at the ; character now, eat it
        $parser->eatChar(';');

        // eat everything until the end of line
        $parser->eatUntilEol();
    }

    /**
     * Parse a key=value pair
     */
    private function parseKeyValue(Parser $parser): array
    {
        // we should be at the first character of the key
        // eat characters until = is found
        $key = $parser->eatUntil('=');

        // eat everything until the end of line
        // that is our value
        $value = trim($parser->eatUntilEol());

        return [$key, $value];
    }
}
Using the parser
<?php

$iniParser = new IniParser();

$iniString = <<<INI
; An example comment
name=Foo
type=Bar
[options]
size=150x100
onload=
INI;

$data = $iniParser->parse($iniString);

print_r($data);
Output:
Array
(
    [name] => Foo
    [type] => Bar
    [options] => Array
        (
            [size] => 150x100
            [onload] =>
        )

)
Character types
The default character types are listed below.
These types are available as constants on the Parser class:
- Parser::C_NONE - no character (NULL)
- Parser::C_WS - whitespace (tab, linefeed, vertical tab, form feed, carriage return and space)
- Parser::C_NUM - numeric character (0-9)
- Parser::C_STR - string character (a-z, A-Z, _ and any 8-bit char)
- Parser::C_CTRL - control character (ASCII 127 and ASCII < 32 except whitespace)
- Parser::C_SPECIAL - !"#$%&'()*+,-./:;<=>?@[\]^`{|}~
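To make the classification above more concrete, here is a small sketch that looks up the type of a few sample characters with the static helpers and checks the current character of a parser instance with is(). It assumes Composer autoloading. The exact names returned by getCharTypeName() are not documented here, so they are printed rather than assumed.

```php
<?php

use Kuria\Parser\Parser;

// classify a few sample characters using the static helpers
foreach (['a', '7', ' ', '-'] as $char) {
    $type = Parser::getCharType($char);

    printf("%s is %s\n", var_export($char, true), Parser::getCharTypeName($type));
}

// check the current character of a parser instance against several types
$parser = new Parser('42 apples');

// the first character is a digit, which is one of the two accepted types
$startsWithWord = $parser->is(Parser::C_NUM, Parser::C_STR);

var_dump($startsWithWord);
```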
Customizing character types
Character types can be customized by extending the base Parser class.
The following example changes "-" and "." from C_SPECIAL to C_STR and inherits everything else.
<?php

use Kuria\Parser\Parser;

class CustomParser extends Parser
{
    const CHAR_TYPE_MAP = [
        '-' => self::C_STR,
        '.' => self::C_STR,
    ] + parent::CHAR_TYPE_MAP; // inherit everything else
}

// usage example
$parser = new CustomParser('foo-bar.baz');

var_dump($parser->eatType(CustomParser::C_STR));
Output:
string(11) "foo-bar.baz"