gdianov/araneus

Araneus is php library for flexible parsing of data from different sources

v1.0.0 2019-02-25 14:37 UTC

This package is auto-updated.

Last update: 2024-04-25 04:09:06 UTC


README

Araneus is php library for flexible parsing of data from different sources

Supported Sources: docx, txt, http resources

  • The minimum required PHP version >= PHP 7.0.

For install use command: composer require gdianov/araneus

How to use?

  1. Create Rule
<?php

require_once 'vendor/autoload.php';

//Create new Rule
class TitleRule extends \Araneus\Rules\BaseRule implements  \Araneus\Interfaces\RuleInterface
{
    public function getPattern(): string
    {
        return '|<title[^>]*?>(.*?)</title>|sei';
    }
}
  1. Create Http Parser
$parseHttp = new \Araneus\Parser(
  new \Araneus\Http\Http('https://google.com')
);

//Attach created rule
$parseHttp->attachRules(new TitleRule()); //You can attach many rules

$result = $parser->run()->fetch(); //array key = regexp, value = found values     
$result = $parser->run()->fetchRules(); //array of Rule objects

              ...
  1. Create Plain Text Parser
$parseTxt = new \Araneus\Parser(
    new \Araneus\File\FilePlainText(__DIR__.'/dst/txt/demo.txt')
);

$parseTxt->attachRules(
  new NumberRule(), 
  new DirtyWordsRule(),
  new UidRule()
);

$result = $parseTxt->run()->fetch();
  1. Create Microsoft Word Document Parser
$parseDocx = new \Araneus\Parser(
    new \Araneus\File\FileDocument(__DIR__.'/dst/documents/demo.docx')
);

$parseDocx->attachRules(
  new UsersRule(),
  new LinksToBooksRule()
);

$result = $parseDocx->run()->fetch();

You can expand the possibilities by adding your sources or modify existing ones by implementing the interfaces: SourceInterface, ContentInterface, RuleInterface