riodevnet/elephscraper

ElephScraper is a lightweight and PHP-native web scraping toolkit built using Guzzle and Symfony DomCrawler. It provides a clean and powerful interface to extract HTML content, metadata, and structured data from any website.

v1.0.0 2025-07-03 17:07 UTC


Last update: 2025-09-15 07:36:39 UTC


Fast. Clean. Eleph-style scraping. 🐘⚡

🚀 Features

  • ✅ Extract metadata: title, description, keywords, author, charset, canonical, and more
  • ✅ Supports Open Graph, Twitter Card, CSRF tokens, and HTTP-equiv headers
  • ✅ Extract headings, paragraphs, images, lists, and links
  • ✅ Powerful filter() method with support for class/ID/tag-based selectors
  • ✅ Return raw HTML or clean plain text
  • ✅ Clean return types: string, array, or associative array
  • ✅ Built with Guzzle + Symfony DomCrawler + CssSelector

📦 Installation

Install via Composer:

composer require riodevnet/elephscraper

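Requires PHP 7.4 or newer. Note that the filter() examples in this README use named arguments, a syntax that is only available in PHP 8.0 or newer.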

๐Ÿ› ๏ธ Basic Usage

<?php

require_once __DIR__ . '/vendor/autoload.php';

use Riodevnet\Elephscraper\ElephScraper;

$scraper = new ElephScraper("https://example.com");

echo $scraper->title(); // "Welcome to Example.com"
echo $scraper->description(); // "Example site for testing"
print_r($scraper->h1()); // ["Main Title", "News"]
print_r($scraper->openGraph());

🧪 Available Methods

🔹 Page Metadata

$scraper->title();
$scraper->description();
$scraper->keywords();
$scraper->keywordString();
$scraper->charset();
$scraper->canonical();
$scraper->contentType();
$scraper->author();
$scraper->csrfToken();
$scraper->image();
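
To make the metadata methods more concrete, here is a minimal sketch of the kind of lookup they perform. It uses PHP's bundled DOM extension on an inline HTML string, not ElephScraper itself. The function name sketchPageMetadata and the sample markup are hypothetical, and the library's actual implementation and return values may differ.

```php
<?php

// Hypothetical helper, not part of ElephScraper: reads a few metadata
// fields from an HTML string using PHP's bundled DOM extension.
function sketchPageMetadata(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml emits for HTML5 or malformed markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Return the content attribute of <meta name="...">, or null if absent.
    $meta = function (string $name) use ($xpath): ?string {
        $node = $xpath->query("//meta[@name='" . $name . "']/@content")->item(0);
        return $node !== null ? trim($node->nodeValue) : null;
    };

    $titleNode = $xpath->query('//title')->item(0);
    $canonicalNode = $xpath->query("//link[@rel='canonical']/@href")->item(0);

    return [
        'title' => $titleNode !== null ? trim($titleNode->textContent) : null,
        'description' => $meta('description'),
        'author' => $meta('author'),
        'canonical' => $canonicalNode !== null ? $canonicalNode->nodeValue : null,
    ];
}

$html = '<html><head><title>Example</title>'
    . '<meta name="description" content="Example site for testing">'
    . '<link rel="canonical" href="https://example.com/">'
    . '</head><body></body></html>';

print_r(sketchPageMetadata($html));
```

In this sketch a missing tag yields null. That is an assumption made for illustration; check what ElephScraper returns for absent metadata before relying on it.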

🔹 Open Graph & Twitter Card

$scraper->openGraph();                 // All OG meta
$scraper->openGraph("og:title");      // Specific OG tag

$scraper->twitterCard();              // All Twitter tags
$scraper->twitterCard("twitter:title");
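
The "all tags" versus "specific tag" behavior can be illustrated with a short sketch. It is written against PHP's bundled DOM extension rather than ElephScraper, the name sketchOpenGraph is hypothetical, and returning null for a missing key is an assumption, not documented library behavior.

```php
<?php

// Hypothetical helper, not part of ElephScraper: collects
// <meta property="og:*"> tags into an associative array, or returns a
// single value when a key is given.
function sketchOpenGraph(string $html, ?string $key = null)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $tags = [];
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $node) {
        $tags[$node->getAttribute('property')] = $node->getAttribute('content');
    }

    if ($key === null) {
        return $tags; // all Open Graph tags, keyed by property name
    }

    return $tags[$key] ?? null; // assumption: null when the tag is missing
}

$html = '<html><head>'
    . '<meta property="og:title" content="Example">'
    . '<meta property="og:type" content="website">'
    . '</head><body></body></html>';

print_r(sketchOpenGraph($html));
var_dump(sketchOpenGraph($html, 'og:title'));
```

The same pattern would apply to Twitter Card tags, which use the name attribute with a twitter: prefix instead of property.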

🔹 Headings & Text

$scraper->h1();
$scraper->h2();
$scraper->h3();
$scraper->h4();
$scraper->h5();
$scraper->h6();
$scraper->p();

🔹 Lists

$scraper->ul(); // all <ul><li> text
$scraper->ol(); // all <ol><li> text
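
As a rough illustration of what "all <ul><li> text" means, the sketch below gathers the text of list items that sit directly inside a given list tag. It uses PHP's bundled DOM extension, the name sketchListItems is hypothetical, and ElephScraper may handle nested lists or whitespace differently.

```php
<?php

// Hypothetical helper, not part of ElephScraper: returns the trimmed text
// of every <li> that is a direct child of the given list tag.
function sketchListItems(string $html, string $listTag): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $items = [];
    foreach ($xpath->query('//' . $listTag . '/li') as $li) {
        $items[] = trim($li->textContent);
    }

    return $items;
}

$html = '<ul><li>One</li><li>Two</li></ul><ol><li>First</li></ol>';

print_r(sketchListItems($html, 'ul'));
print_r(sketchListItems($html, 'ol'));
```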

🔹 Images

$scraper->images();         // just src URLs
$scraper->imageDetails();   // src, alt, title
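
The difference between the two image methods can be sketched as follows: one shape carries src, alt, and title per image, the other only the src values. The code uses PHP's bundled DOM extension, the name sketchImageDetails is hypothetical, and the array keys are assumed from the comment above rather than confirmed against the library.

```php
<?php

// Hypothetical helper, not part of ElephScraper: returns one associative
// array per <img>, with src, alt, and title.
function sketchImageDetails(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    $details = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns an empty string for a missing attribute.
        $details[] = [
            'src' => $img->getAttribute('src'),
            'alt' => $img->getAttribute('alt'),
            'title' => $img->getAttribute('title'),
        ];
    }

    return $details;
}

$html = '<div><img src="/a.png" alt="A"><img src="/b.png" alt="B" title="Bee"></div>';

$details = sketchImageDetails($html);
$srcOnly = array_column($details, 'src'); // the "just src URLs" shape

print_r($details);
print_r($srcOnly);
```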

🔹 Links

$scraper->links();        // just hrefs
$scraper->linkDetails();  // full detail

๐Ÿ” Custom DOM Filtering

▸ Example: Filter Single Element

$scraper->filter(
    element: 'div',
    attributes: ['id' => 'main'],
    multiple: false,
    extract: ['.title', '#desc', 'p'],
    returnHtml: false
);

▸ Example: Filter Multiple Elements

$scraper->filter(
    element: 'div',
    attributes: ['class' => 'card'],
    multiple: true,
    extract: ['h2', '.subtitle', '#info'],
    returnHtml: false
);

▸ Example: Return HTML Content

$scraper->filter(
    element: 'section',
    attributes: ['class' => 'hero'],
    returnHtml: true
);

The extract option accepts three selector forms:

  • Tag names: h1, p, span, etc.
  • Class: .className
  • ID: #idName

Output keys are automatically normalized to the original selector.
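
One possible reading of the selector rules above is sketched below: each of the three selector forms is translated to an XPath query scoped to a container element, and each result is stored under the selector string exactly as the caller wrote it. This uses PHP's bundled DOM extension, not ElephScraper. The names sketchSelectorToXPath and sketchExtract are hypothetical, and the real filter() method may resolve selectors and shape its output differently.

```php
<?php

// Hypothetical helper: translate a tag, .class, or #id selector into a
// relative XPath expression.
function sketchSelectorToXPath(string $selector): string
{
    if ($selector[0] === '.') {
        $class = substr($selector, 1);
        // Match the class as a whole word inside the class attribute.
        return ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    }
    if ($selector[0] === '#') {
        return ".//*[@id='" . substr($selector, 1) . "']";
    }
    return './/' . $selector;
}

// Hypothetical helper: find the container by id, then extract the text of
// the first match for each selector inside it.
function sketchExtract(string $html, string $containerId, array $selectors): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $container = $xpath->query("//*[@id='" . $containerId . "']")->item(0);
    if ($container === null) {
        return [];
    }

    $result = [];
    foreach ($selectors as $selector) {
        $node = $xpath->query(sketchSelectorToXPath($selector), $container)->item(0);
        // Key each value by the selector exactly as it was passed in.
        $result[$selector] = $node !== null ? trim($node->textContent) : null;
    }

    return $result;
}

$html = '<div id="main"><h2 class="title">Hello</h2>'
    . '<span id="desc">Short description</span><p>Body text</p></div>';

print_r(sketchExtract($html, 'main', ['.title', '#desc', 'p']));
```

Keying results by the original selector means a caller can read a value back with the same string used to request it, for example $result['.title'].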

๐Ÿค Contributing

Found a bug? Want to add features? Open an issue or create a pull request!

📄 License

MIT License © 2025 ElephScraper

🔗 Related Libraries

💡 Why ElephScraper?

ElephScraper is your dependable PHP elephant: strong, smart, and always ready to extract the right data.