yui-ezic / z99-lexer
PHP lexer for Z99, the author's own Pascal-like programming language.
Requires
- ext-ctype: *
- ext-json: *
- graphp/graphviz: ^0.2.2
Requires (Dev)
- roave/security-advisories: dev-master
This package is auto-updated.
Last update: 2024-10-27 03:10:57 UTC
README
Z99 is a Pascal-like programming language developed for educational purposes. Example of a program written in Z99:
```
program first
var i: int;
    sum, value : real;
begin
    sum = 0.0;
    i = 1;
    repeat
        read (value);
        sum = sum + value;
        write(i, sum);
        i = i + 1;
    until i <= 100;
    sum = sum / 100;
    write(sum);
end.
```
Initialization of grammar
The lexer is built on a finite state machine (FSM), represented by the `Z99Lexer\FSM\FSM` class:
```php
require 'vendor/autoload.php';

$fsm = new Z99Lexer\FSM\FSM();
```
The state graph is initialized in the file `create_fsm.php`. You can call the `visualize()` method to see the state graph as a picture.
Abbreviations used in the graph:

- dgt - digit
- chr - character
- def - default
- WS - white space

For convenience, the names of all final states begin with a minus sign, and final states are highlighted in blue. State 0 is the start state.
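To make the abbreviations concrete, here is a small sketch of how a single character could be mapped to these classes with the `ctype_*` functions from ext-ctype, which the package already requires. The function `triggerClass()` is hypothetical and not part of the package, and it treats `chr` as a letter, as the `TriggerTypes::LETTER` triggers later in this README suggest. In the real graph `def` is a per-state fallback, so a character follows the `def` edge whenever the current state has no edge for its class.

```php
<?php
// Hypothetical helper, not part of the package: maps one character to the
// abbreviations used in the state graph (uses ext-ctype).
function triggerClass(string $char): string
{
    if (ctype_digit($char)) {
        return 'dgt'; // digit
    }
    if (ctype_alpha($char)) {
        return 'chr'; // letter
    }
    if (ctype_space($char)) {
        return 'WS'; // white space
    }
    return 'def'; // matches no class: only a default edge can take it
}

foreach (['7', 'x', ' ', ';'] as $char) {
    echo json_encode($char), ' => ', triggerClass($char), PHP_EOL;
}
```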
Creating your own grammar
First, create the start state:
```php
$fsm->addStart(0);
```
Then add several intermediate states:
```php
$fsm->addState(1);
$fsm->addState(2);
```
Next, add the final states. Each final state has a callback function that handles the matched substring and adds a token to the token table. The last argument tells the lexer whether to take the next character when it moves back to the start state.
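```php
// Assumes `use` imports for LexerWriterInterface and LexerException
// (their namespaces are not shown in this README).
$keywords = ['program', 'var', 'begin', 'read', 'write', 'repeat', 'until', 'if', 'then', 'fi'];
$types = ['int', 'real', 'bool'];
$boolConstants = ['true', 'false'];

// Final state -2: a word (a letter followed by letters/digits) has been read.
$fsm->addFinalState(-2, static function (LexerWriterInterface $writer, string $string, int $line) use ($keywords, $types, $boolConstants) {
    $index = null;
    $string = substr($string, 0, -1); // drop the last character, which is not part of the word
    if (in_array($string, $keywords, true)) {
        $token = 'Keyword';
    } elseif (in_array($string, $types, true)) {
        $token = 'Type';
    } elseif (in_array($string, $boolConstants, true)) {
        $token = 'BoolConst';
    } else {
        // Anything else is an identifier and goes into the identifier table.
        $token = 'Ident';
        $index = $writer->addIdentifier($string);
    }
    $writer->addToken($line, $string, $token, $index);
}, false); // false: do not take the next character when returning to the start state

// Error state: reached when the start state sees a character no other trigger accepts.
$fsm->addFinalState('error', static function (LexerWriterInterface $writer, string $string, int $line) {
    throw new LexerException('Unknown char.', $string, $line);
});
```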
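The callback logic can be tried in isolation. The sketch below assumes, as the `substr($string, 0, -1)` call suggests, that the matched string still ends with the character that moved the FSM into state -2. `MemoryWriter` is a hypothetical stand-in for `LexerWriterInterface` that implements only the two methods the callback calls, and its identifier table (a repeated name gets its existing index back) is an assumption about how `addIdentifier()` behaves.

```php
<?php
// Hypothetical stand-in for LexerWriterInterface: records tokens and keeps
// an identifier table without duplicates.
final class MemoryWriter
{
    public array $tokens = [];
    public array $identifiers = [];

    // Returns the identifier's position in the table, adding it on first use.
    public function addIdentifier(string $name): int
    {
        $index = array_search($name, $this->identifiers, true);
        if ($index === false) {
            $this->identifiers[] = $name;
            $index = count($this->identifiers) - 1;
        }
        return $index;
    }

    public function addToken(int $line, string $lexeme, string $token, ?int $index): void
    {
        $this->tokens[] = ['line' => $line, 'lexeme' => $lexeme, 'token' => $token, 'index' => $index];
    }
}

// Same classification order as the callback above: keywords, types,
// boolean constants, and everything else is an identifier.
function classifyWord(MemoryWriter $writer, string $string, int $line): void
{
    $keywords = ['program', 'var', 'begin', 'read', 'write', 'repeat', 'until', 'if', 'then', 'fi'];
    $types = ['int', 'real', 'bool'];
    $boolConstants = ['true', 'false'];

    $index = null;
    $string = substr($string, 0, -1); // strip the character that ended the word
    if (in_array($string, $keywords, true)) {
        $token = 'Keyword';
    } elseif (in_array($string, $types, true)) {
        $token = 'Type';
    } elseif (in_array($string, $boolConstants, true)) {
        $token = 'BoolConst';
    } else {
        $token = 'Ident';
        $index = $writer->addIdentifier($string);
    }
    $writer->addToken($line, $string, $token, $index);
}

$writer = new MemoryWriter();
foreach (['repeat ', 'sum,', 'int;', 'sum ', 'true)'] as $matched) {
    classifyWord($writer, $matched, 1);
}
foreach ($writer->tokens as $t) {
    echo $t['token'], ' ', $t['lexeme'], ' ', var_export($t['index'], true), PHP_EOL;
}
```

Both occurrences of `sum` end up with the same identifier index, so the identifier table holds each name once.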
Then add the triggers (the edges of the graph):
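```php
// State 0: a letter starts a word; any other character is an error.
$fsm->addTrigger(TriggerTypes::LETTER, 0, 1);
$fsm->addTrigger(FSM::DEFAULT_STATE, 0, 'error');
// State 1: letters stay in 1, a digit moves to 2, anything else ends the word.
$fsm->addTrigger(TriggerTypes::LETTER, 1, 1);
$fsm->addTrigger(FSM::DEFAULT_STATE, 1, -2);
$fsm->addTrigger(TriggerTypes::DIGIT, 1, 2);
// State 2: letters and digits stay in 2, anything else ends the word.
$fsm->addTrigger(FSM::DEFAULT_STATE, 2, -2);
$fsm->addTrigger(TriggerTypes::LETTER, 2, 2);
$fsm->addTrigger(TriggerTypes::DIGIT, 2, 2);
```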
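These eight triggers accept a letter followed by any mix of letters and digits. A minimal table-driven sketch can check this: the hypothetical `$edges` array copies the triggers, with `chr`, `dgt` and `def` standing for `TriggerTypes::LETTER`, `TriggerTypes::DIGIT` and `FSM::DEFAULT_STATE`, and `trace()` follows them one character at a time. It assumes LETTER and DIGIT mean ASCII letters and digits, and it is not the package's FSM implementation; it only compares the automaton with the regular expression `[A-Za-z][A-Za-z0-9]*`.

```php
<?php
// Hypothetical copy of the triggers: $edges[state][class] => next state.
// 'def' is taken when no other edge of the current state matches.
$edges = [
    0 => ['chr' => 1, 'def' => 'error'],
    1 => ['chr' => 1, 'dgt' => 2, 'def' => -2],
    2 => ['chr' => 2, 'dgt' => 2, 'def' => -2],
];

// Assumes LETTER = ASCII letter and DIGIT = ASCII digit.
function classOf(string $char): string
{
    if (ctype_digit($char)) {
        return 'dgt';
    }
    return ctype_alpha($char) ? 'chr' : 'def';
}

// Follows the edges one character at a time and returns every visited state,
// stopping at the first final state ('error' or a negative number).
function trace(array $edges, string $input): array
{
    $state = 0;
    $path = [$state];
    foreach (str_split($input) as $char) {
        $row = $edges[$state];
        $state = $row[classOf($char)] ?? $row['def'];
        $path[] = $state;
        if ($state === 'error' || (is_int($state) && $state < 0)) {
            break;
        }
    }
    return $path;
}

// A word is accepted if the trailing blank, and only it, drives the FSM into -2.
function accepts(array $edges, string $word): bool
{
    $path = trace($edges, $word . ' ');
    return count($path) === strlen($word) + 2 && end($path) === -2;
}

foreach (['sum', 'x1', 'a1b2', '1abc', 'a_b'] as $word) {
    $byFsm = accepts($edges, $word);
    $byRegex = preg_match('/^[A-Za-z][A-Za-z0-9]*$/', $word) === 1;
    printf("%-5s fsm=%s regex=%s\n", $word, var_export($byFsm, true), var_export($byRegex, true));
}
```

The table form also shows why the final state needs the look-ahead character: the FSM only knows a word has ended after reading a character that follows the `def` edge.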
Finally, display the state graph:
```php
$fsm->visualize();
```
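`visualize()` presumably draws its picture through the graphp/graphviz dependency. As an illustration of the kind of graph it produces, not of that library's API, the sketch below writes Graphviz DOT text by hand from the same edges as the triggers above, labels the edges with the legend abbreviations, and colors states whose names begin with a minus sign in blue. The function `toDot()` is hypothetical.

```php
<?php
// Hypothetical: builds Graphviz DOT text by hand; this is not the graphp/graphviz API.
// Same edges as the addTrigger() calls, labelled with the legend abbreviations.
$edges = [
    0 => ['chr' => 1, 'def' => 'error'],
    1 => ['chr' => 1, 'dgt' => 2, 'def' => -2],
    2 => ['chr' => 2, 'dgt' => 2, 'def' => -2],
];

function toDot(array $edges): string
{
    $lines = ['digraph fsm {'];
    $nodes = []; // every state seen, used once per state below
    foreach ($edges as $from => $row) {
        $nodes[(string) $from] = true;
        foreach ($row as $label => $to) {
            $nodes[(string) $to] = true;
            $lines[] = sprintf('  "%s" -> "%s" [label="%s"];', $from, $to, $label);
        }
    }
    // States whose names begin with a minus sign are final and drawn in blue.
    foreach (array_keys($nodes) as $node) {
        if (substr((string) $node, 0, 1) === '-') {
            $lines[] = sprintf('  "%s" [color=blue];', $node);
        }
    }
    $lines[] = '}';
    return implode("\n", $lines);
}

echo toDot($edges), PHP_EOL;
```

Piping the output into the Graphviz `dot` command (for example `dot -Tpng -o fsm.png`) renders it as an image.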
Lexer
To build the tables of tokens, constants and identifiers, create a `Lexer` instance. It receives a `CharStreamInterface` implementation and the FSM with your grammar:
```php
$stream = new FileStream('example.z99'); // implements CharStreamInterface
$lexer = new Lexer($stream, $fsm);
```
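The README does not list the methods of `CharStreamInterface`, so the interface below, `SimpleCharStream`, is a hypothetical stand-in. It sketches what a string-backed stream could look like, for example to feed source code to a lexer in tests without a file. It tracks line numbers because the final-state callbacks receive a `$line` argument.

```php
<?php
// Hypothetical interface; the package's CharStreamInterface may declare other methods.
interface SimpleCharStream
{
    // Returns the next character, or null at the end of the input.
    public function read(): ?string;

    // Returns the 1-based line of the character returned by the last read().
    public function getLine(): int;
}

// Reads characters from an in-memory string instead of a file.
final class StringCharStream implements SimpleCharStream
{
    private string $source;
    private int $pos = 0;
    private int $line = 1;
    private bool $afterNewline = false;

    public function __construct(string $source)
    {
        $this->source = $source;
    }

    public function read(): ?string
    {
        if ($this->pos >= strlen($this->source)) {
            return null;
        }
        // A "\n" belongs to the line it ends; the next character starts a new line.
        if ($this->afterNewline) {
            $this->line++;
            $this->afterNewline = false;
        }
        $char = $this->source[$this->pos];
        $this->pos++;
        $this->afterNewline = ($char === "\n");
        return $char;
    }

    public function getLine(): int
    {
        return $this->line;
    }
}

$stream = new StringCharStream("i = 1;\nsum");
while (($char = $stream->read()) !== null) {
    echo json_encode($char), ' on line ', $stream->getLine(), PHP_EOL;
}
```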
Then call the `tokenize()` method and print the resulting tables:
```php
try {
    $lexer->tokenize();

    // Token table
    foreach ($lexer->getTokens() as $token) {
        echo $token . PHP_EOL;
    }
    // Constant table
    foreach ($lexer->getConstants() as $const) {
        echo $const . PHP_EOL;
    }
    // Identifier table
    foreach ($lexer->getIdentifiers() as $identifier) {
        echo $identifier . PHP_EOL;
    }
} catch (LexerException $e) {
    // Report the message, the offending string and the line where it occurred.
    echo $e->getMessage() . "\n With string: '" . $e->getString() . "'\n in line " . $e->getLine();
}
```
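In the `catch` block above, `getLine()` is meant to report a line of the Z99 program. PHP declares `Exception::getLine()` as `final`, so a subclass cannot override it, but it can assign the inherited protected `$line` property. The sketch below shows that technique with a hypothetical `DemoLexerException`; whether the package's `LexerException` works this way is an assumption.

```php
<?php
// Hypothetical exception shaped like the README's use of LexerException:
// constructed with (message, offending string, line), read via getString()/getLine().
final class DemoLexerException extends RuntimeException
{
    private string $offending;

    public function __construct(string $message, string $offending, int $line)
    {
        parent::__construct($message);
        $this->offending = $offending;
        // getLine() is final in Exception, but the protected $line property it
        // returns can be overwritten with the line of the Z99 source.
        $this->line = $line;
    }

    public function getString(): string
    {
        return $this->offending;
    }
}

try {
    throw new DemoLexerException('Unknown char.', '#', 7);
} catch (DemoLexerException $e) {
    echo $e->getMessage() . "\n With string: '" . $e->getString() . "'\n in line " . $e->getLine() . PHP_EOL;
}
```

Without the `$this->line = $line;` assignment, `getLine()` would return the line of the PHP file where the exception was created, which is rarely what a lexer user wants.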