undercoder/magenizer

Implements the Iterator interface over a string to extract the data delimited by an array of tokens. The name comes from Mage and tokenizer :)

v0.1.9 2020-01-04 00:27 UTC

Last update: 2025-06-04 12:52:07 UTC


README

Given an array of tokens, it lets you iterate over a string; each iteration returns the data found between the corresponding pair of tokens.

Example:

$raw = <<<RAW
STRANGE Document With Weird LAYOUT
THIS document is ELECTRONIC
ANDRES REYES
N°3
WORK AT: HEROICAL SERVICES FOR FREE,
ADDRESS  :  666, Where The Braves Dies Street, Santiago, Chile.
...Some more of weird strings...
RAW;
$tokens = array(
  "name"        => array("from" => "ELECTRONIC",   "to" => "N\s*\°"),
  "doc-number"  => array("from" => "N\s*\°",       "to" => "WORK"),
  "job"         => array("from" => "K AT\:",       "to" => ",\nADDRESS"),
  "address"     => array("from" => "ADDRESS\s*\:", "to" => "\...Some")
);

$mage = new Magenizer($raw, $tokens);
foreach ($mage as $next) {
  echo "$next <br>";
}

//OUTPUT:
//------
//ANDRES REYES
//3
// HEROICAL SERVICES FOR FREE
// 666, Where The Braves Dies Street, Santiago, Chile.

As the example shows, regular expressions are supported in the tokens. Write them without the delimiting / characters, because those are added internally.
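To illustrate what adding the delimiters internally could look like, here is a minimal sketch that wraps a from/to pair in / and captures the text in between with preg_match. The helper name extractBetween, the s modifier, and the trim() call are assumptions made for this example; they are not taken from Magenizer's source.

```php
<?php
// Hypothetical helper, not part of Magenizer: shows one way a from/to
// token pair could be turned into a delimited PCRE pattern.
function extractBetween(string $raw, string $from, string $to): ?string
{
    // Escape any literal "/" so it cannot end the pattern early.
    $from = str_replace('/', '\/', $from);
    $to   = str_replace('/', '\/', $to);

    // Add the delimiters; "s" lets "." span newlines and "(.*?)" lazily
    // captures the data between the two tokens.
    $pattern = '/' . $from . '(.*?)' . $to . '/s';

    // Return the trimmed capture, or null when the tokens are not found.
    return preg_match($pattern, $raw, $m) === 1 ? trim($m[1]) : null;
}

$raw = "N°3\nWORK AT: HEROICAL SERVICES FOR FREE,\nADDRESS  :  666";
echo extractBetween($raw, 'K AT\:', ",\nADDRESS"), PHP_EOL;
// prints "HEROICAL SERVICES FOR FREE"
```

Unlike the output shown in the example above, this sketch trims the surrounding whitespace from each result.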

@author Andrés Reyes

Short Story

I wrote this class at work, when I had to extract data from a raw string produced by pdftotext. The output was very irregular and its layout changed from case to case, so I thought of each piece of data as a target surrounded by a starting token and an ending token. E.g. (the <> are only for the example): <name> Ermistuligius Of Bermelloauis <number> 12 <height> 145 ...etc... That way, I could fetch the data by pointing to the string between the tokens. At the time I was studying design patterns, so I implemented this solution using the iterator pattern.
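The idea can be sketched as a small class that implements PHP's built-in Iterator interface and yields one extracted value per token pair. The class name BetweenTokens and its internals are hypothetical, written for this example under the assumption of PHP 8.0 or later; this is not Magenizer's actual code.

```php
<?php
// Hypothetical sketch of the idea, not Magenizer's actual code (assumes PHP 8.0+).
final class BetweenTokens implements Iterator
{
    private array $keys;
    private int $pos = 0;

    // $tokens maps a field name to ['from' => regex, 'to' => regex], without delimiters.
    public function __construct(private string $raw, private array $tokens)
    {
        $this->keys = array_keys($tokens);
    }

    public function rewind(): void { $this->pos = 0; }
    public function valid(): bool  { return $this->pos < count($this->keys); }
    public function key(): mixed   { return $this->keys[$this->pos]; }
    public function next(): void   { $this->pos++; }

    // Returns the text between the current pair of tokens, or null when nothing matches.
    public function current(): mixed
    {
        $t = $this->tokens[$this->keys[$this->pos]];
        $pattern = '/' . $t['from'] . '(.*?)' . $t['to'] . '/s';
        return preg_match($pattern, $this->raw, $m) === 1 ? trim($m[1]) : null;
    }
}

$raw = '<name> Ermistuligius Of Bermelloauis <number> 12 <height> 145';
$fields = new BetweenTokens($raw, [
    'name'   => ['from' => '<name>',   'to' => '<number>'],
    'number' => ['from' => '<number>', 'to' => '<height>'],
]);

foreach ($fields as $field => $value) {
    echo "$field: $value", PHP_EOL;
}
// prints "name: Ermistuligius Of Bermelloauis" and then "number: 12"
```

Because the class implements Iterator, it can be used directly in foreach, which is the same way the Magenizer object is consumed in the README example.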

Disclaimer

  • I'm a novice programmer, so feedback is always welcome.
  • Some coworkers asked me to submit this to Packagist.
  • I'm a native Spanish speaker.