userforce/scraper

Scrape web pages and structure results.

pkg:composer/userforce/scraper

v1.1.2 2019-07-09 22:45 UTC

This package is not auto-updated.

Last update: 2025-10-23 05:27:56 UTC


README

Scrape web pages and structure results using regular expressions.

Installation

Require this package with Composer:

composer require userforce/scraper

Register Scraper with Laravel: open config/app.php and add UserForce\Scraper\ScraperServiceProvider at the end of the providers list:

'providers' => [
    ...
    UserForce\Scraper\ScraperServiceProvider::class,
],

Then, at the end of the aliases list in the same config/app.php, add UserForce\Scraper\Facade\Scraper:

'aliases' => [
    ...
    'Scraper' => UserForce\Scraper\Facade\Scraper::class,
],

You can now begin using Scraper.

Usage

use Scraper;

Scraper has a single method, find, which accepts one parameter, a configuration array:

$result = Scraper::find($config);

Example

Each config option must have two keys: url and regex.
You can define multiple config options in a tree (the structure is preserved in the result).
The regex value can be a string or an associative array, and it cannot be empty. Each string is interpreted as a regular expression.

$config = [
    'ibmachine' => [
        'url' => 'https://ibmachine.com/machine',
        'regex' => [
            'name' => 'machine\/view\/[0-9]{1,7}" itemprop="name">\s*(<span.*\/span>)?\s*(.*)\s*<\/a>',
            'links' => [
                'url' => 'href=\"(http.*machine\/view\/[\d]{1,7})\"\sitemprop'
            ]
        ]
    ]
];

$result = Scraper::find($config);

$result->get();

Result

array:1 [▼
  "ibmachine" => array:2 [▼
    "name" => array:3 [▼
      0 => array:20 [▶]
      1 => array:20 [▶]
      2 => array:20 [▼
        0 => "Alesatrice TOS Whn q 13 anno 2001"
        1 => "CURVATRICE TAURING mod. DELTA 60 CNC"
        2 => "Calandre idrauliche 3 rulli"
        ...
      ]
    ]
    "links" => array:1 [▼
      "url" => array:2 [▼
        0 => array:20 [▶]
        1 => array:20 [▶]
      ]
    ]
  ]
]
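
The shape of the result above (three sub-arrays under name, two under links.url) appears consistent with what PHP's preg_match_all returns in its default PREG_PATTERN_ORDER mode: index 0 holds the full matches, and each following index holds one capture group. The sketch below illustrates that shape with plain PHP. It is not the package's implementation: the helper applyRegexTree, the sample HTML, the simplified patterns, and the choice of / as the delimiter are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, NOT part of userforce/scraper.
// Walks a regex tree (string or nested associative array) and applies each
// pattern to $html, keeping the tree's keys in the output.
function applyRegexTree(string $html, $regex): array
{
    if (is_string($regex)) {
        // Assumption: patterns are wrapped in "/" delimiters, so any "/"
        // inside a pattern must already be escaped as "\/".
        preg_match_all('/' . $regex . '/', $html, $matches, PREG_PATTERN_ORDER);
        return $matches; // [0] => full matches, [1..n] => capture groups
    }

    $out = [];
    foreach ($regex as $key => $child) {
        $out[$key] = applyRegexTree($html, $child);
    }
    return $out;
}

// Made-up sample markup, one anchor per line.
$html = '<a href="http://example.test/machine/view/12" itemprop="name">Lathe A</a>' . "\n"
      . '<a href="http://example.test/machine/view/345" itemprop="name">Press B</a>' . "\n";

// Simplified patterns (assumed for this sketch, not the ones from the README).
$regex = [
    'name'  => 'itemprop="name">([^<]+)<\/a>',
    'links' => [
        'url' => 'href="([^"]+)"',
    ],
];

$data = applyRegexTree($html, $regex);

// The last index of each leaf holds the innermost captured values.
$names = $data['name'][1];          // captured link texts
$urls  = $data['links']['url'][1];  // captured href values
```

Read this way, the machine names in the result above sit at index 2 of name because the README pattern has two capture groups, and the second one captures the link text.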