gdianov / araneus
Araneus is a PHP library for flexible parsing of data from different sources.
v1.0.0
2019-02-25 14:37 UTC
Requires
- php: >=7.0.0
- rmccue/requests: >=1.0
This package is auto-updated.
Last update: 2025-04-25 06:21:09 UTC
README
Araneus is a PHP library for flexible parsing of data from different sources.
Supported sources: docx, txt, and HTTP resources.
- The minimum required PHP version is 7.0.
To install, run: composer require gdianov/araneus
How to use?
- Create Rule
    <?php

    require_once 'vendor/autoload.php';

    // Create a new rule
    class TitleRule extends \Araneus\Rules\BaseRule implements \Araneus\Interfaces\RuleInterface
    {
        public function getPattern(): string
        {
            // The deprecated "e" modifier (removed in PHP 7) is dropped; "s" and "i" remain.
            return '|<title[^>]*?>(.*?)</title>|si';
        }
    }
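To see what a rule's pattern captures before wiring it into a parser, it can help to try the expression on its own with plain PCRE. The sketch below does not use Araneus at all. It runs the TitleRule expression with only the `s` (dot matches newlines) and `i` (case-insensitive) modifiers against a made-up HTML string. The `$html` sample and the trimming step are assumptions for illustration, not part of the library.

```php
<?php
// Standalone check of the title pattern, without Araneus.
$pattern = '|<title[^>]*?>(.*?)</title>|si';

// Made-up sample input: upper-case tag, an attribute, and line breaks inside the title.
$html = "<html><head><TITLE lang=\"en\">\n  Example Domain\n</TITLE></head><body></body></html>";

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured title text.
$titles = array_map('trim', $matches[1]);

print_r($titles);
```

How Araneus itself post-processes the captured groups is not stated here, so the shape of `$titles` should not be read as the shape of the parser's result.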
- Create Http Parser
    $parseHttp = new \Araneus\Parser(
        new \Araneus\Http\Http('https://google.com')
    );

    // Attach the created rule; you can attach many rules
    $parseHttp->attachRules(new TitleRule());

    $result = $parseHttp->run()->fetch();      // array: key = regexp, value = found values
    $result = $parseHttp->run()->fetchRules(); // array of Rule objects
    ...
- Create Plain Text Parser
    $parseTxt = new \Araneus\Parser(
        new \Araneus\File\FilePlainText(__DIR__.'/dst/txt/demo.txt')
    );

    $parseTxt->attachRules(
        new NumberRule(),
        new DirtyWordsRule(),
        new UidRule()
    );

    $result = $parseTxt->run()->fetch();
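The rules attached above (NumberRule, DirtyWordsRule, UidRule) are not defined in this README. As a rough sketch, a rule such as NumberRule could follow the same shape as TitleRule and return a different pattern. The expression `NUMBER_PATTERN` and the sample text below are hypothetical. The class is declared only when Araneus is autoloaded, so the pattern check at the bottom also runs without the library.

```php
<?php
// Hypothetical pattern: the README does not show NumberRule's real expression.
const NUMBER_PATTERN = '/\b\d+\b/';

// Declared only when Araneus is installed and autoloaded,
// so this file also runs without the library.
if (class_exists('Araneus\Rules\BaseRule')) {
    class NumberRule extends \Araneus\Rules\BaseRule implements \Araneus\Interfaces\RuleInterface
    {
        public function getPattern(): string
        {
            return NUMBER_PATTERN;
        }
    }
}

// Check the pattern on its own with plain PCRE (made-up sample text).
$text = 'Order 42 shipped 3 items on day 7.';
preg_match_all(NUMBER_PATTERN, $text, $matches);
$numbers = $matches[0];

print_r($numbers);
```

Keeping the expression in one place makes it easy to test the pattern separately from the source it will eventually be applied to.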
- Create Microsoft Word Document Parser
    $parseDocx = new \Araneus\Parser(
        new \Araneus\File\FileDocument(__DIR__.'/dst/documents/demo.docx')
    );

    $parseDocx->attachRules(
        new UsersRule(),
        new LinksToBooksRule()
    );

    $result = $parseDocx->run()->fetch();
You can extend the library by adding your own sources, or modify existing ones, by implementing the interfaces SourceInterface, ContentInterface, and RuleInterface.