riodevnet / elephscraper
ElephScraper is a lightweight, PHP-native web scraping toolkit built on Guzzle and Symfony DomCrawler. It provides a clean interface for extracting HTML content, metadata, and structured data from web pages.
Requires
- guzzlehttp/guzzle: ^7.9
- symfony/css-selector: ^7.3
- symfony/dom-crawler: ^6.0
README
Fast. Clean. Eleph-style scraping.
Features
- Extract metadata: title, description, keywords, author, charset, canonical, and more
- Supports Open Graph, Twitter Card, CSRF tokens, and HTTP-equiv headers
- Extract headings, paragraphs, images, lists, and links
- Powerful filter() method with support for class/ID/tag-based selectors
- Return raw HTML or clean plain text
- Clean return types: string, array, or associative array
- Built with Guzzle + Symfony DomCrawler + CssSelector
Installation
Install via Composer:
composer require riodevnet/elephscraper
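The author lists PHP 7.4 or newer as the minimum. Note, however, that the filter() examples below use named arguments, which need PHP 8.0 or newer, and that the symfony/css-selector ^7.3 dependency itself requires PHP 8.2 or newer.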
Basic Usage
<?php
require_once __DIR__ . '/vendor/autoload.php';

use Riodevnet\Elephscraper\ElephScraper;

$scraper = new ElephScraper("https://example.com");

echo $scraper->title();        // "Welcome to Example.com"
echo $scraper->description();  // "Example site for testing"
print_r($scraper->h1());       // ["Main Title", "News"]
print_r($scraper->openGraph());
Available Methods
Page Metadata
$scraper->title();
$scraper->description();
$scraper->keywords();
$scraper->keywordString();
$scraper->charset();
$scraper->canonical();
$scraper->contentType();
$scraper->author();
$scraper->csrfToken();
$scraper->image();
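To make the metadata accessors more concrete, here is a minimal sketch of how title, description, keywords, and canonical values can be read from an HTML string with PHP's built-in DOM extension. It is not ElephScraper's implementation: `sketchPageMetadata()` is a hypothetical helper, and the assumption that keywords come back as a trimmed array is ours, not the README's.

```php
<?php
// Hypothetical helper, not part of ElephScraper: reads a few metadata fields
// from an HTML string using PHP's built-in DOMDocument and DOMXPath.
function sketchPageMetadata(string $html): array
{
    // Real-world markup often triggers libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // XPath string() yields '' when nothing matches, so no null checks are needed.
    $title       = trim($xpath->evaluate('string(//title)'));
    $description = trim($xpath->evaluate('string(//meta[@name="description"]/@content)'));
    $keywords    = trim($xpath->evaluate('string(//meta[@name="keywords"]/@content)'));
    $canonical   = trim($xpath->evaluate('string(//link[@rel="canonical"]/@href)'));

    return [
        'title'       => $title,
        'description' => $description,
        // Assumption: split the comma-separated keyword string into a clean list.
        'keywords'    => array_values(array_filter(array_map('trim', explode(',', $keywords)), 'strlen')),
        'canonical'   => $canonical,
    ];
}

$html = '<html><head><title>Example</title>'
      . '<meta name="description" content="Example site for testing">'
      . '<meta name="keywords" content="php, scraping, dom">'
      . '<link rel="canonical" href="https://example.com/"></head><body></body></html>';

$meta = sketchPageMetadata($html);
print_r($meta);
```

Missing tags simply produce empty strings (or an empty keyword list) in this sketch; the library may handle absent metadata differently.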
Open Graph & Twitter Card
$scraper->openGraph();                  // All OG meta
$scraper->openGraph("og:title");        // Specific OG tag
$scraper->twitterCard();                // All Twitter tags
$scraper->twitterCard("twitter:title");
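The "all tags or one specific tag" calling pattern can be sketched as follows. This is an illustration using PHP's built-in DOM extension, not the library's code: `sketchOpenGraph()` is a hypothetical function, and returning `null` for a missing tag is an assumption.

```php
<?php
// Hypothetical helper, not part of ElephScraper: collects <meta property="og:*">
// tags into an associative array, or returns a single tag when one is requested.
function sketchOpenGraph(string $html, ?string $property = null)
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = [];
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $node) {
        // Key each entry by its full property name, e.g. "og:title".
        $tags[$node->getAttribute('property')] = $node->getAttribute('content');
    }

    if ($property === null) {
        return $tags; // no argument: every Open Graph tag found
    }

    // Assumption: a tag that is not present yields null.
    return $tags[$property] ?? null;
}

$html = '<html><head>'
      . '<meta property="og:title" content="Example">'
      . '<meta property="og:type" content="website">'
      . '<meta name="description" content="ignored here">'
      . '</head><body></body></html>';

$all     = sketchOpenGraph($html);
$title   = sketchOpenGraph($html, 'og:title');
$missing = sketchOpenGraph($html, 'og:image');
```

Twitter Card tags are conventionally written with `name="twitter:..."` rather than `property`, so an equivalent sketch would match on the `name` attribute instead.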
Headings & Text
$scraper->h1();
$scraper->h2();
$scraper->h3();
$scraper->h4();
$scraper->h5();
$scraper->h6();
$scraper->p();
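Each of these accessors appears to return the text of every matching element as an array (the Basic Usage example shows `h1()` printing a list). A rough sketch of that behavior with the built-in DOM extension might look like this; `sketchTagTexts()` is a hypothetical name, and collapsing whitespace and skipping empty elements are assumptions.

```php
<?php
// Hypothetical helper, not part of ElephScraper: returns the text content of
// every element with the given tag name.
function sketchTagTexts(string $html, string $tag): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // textContent flattens nested markup; collapse whitespace runs afterwards.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $texts[] = $text; // assumption: empty elements are skipped
        }
    }

    return $texts;
}

$html = '<body><h1>Main Title</h1><p>First <b>paragraph</b>.</p><h1> News </h1></body>';

$h1 = sketchTagTexts($html, 'h1');
$p  = sketchTagTexts($html, 'p');
print_r($h1);
```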
Lists
$scraper->ul(); // all <ul><li> text
$scraper->ol(); // all <ol><li> text
Images
$scraper->images();       // just src URLs
$scraper->imageDetails(); // src, alt, title
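The difference between the two image accessors can be illustrated with a short sketch: one list of records carrying `src`, `alt`, and `title`, and a plain list of `src` values derived from it. The helper name `sketchImageDetails()` and the decision to skip images without a `src` are assumptions for illustration only.

```php
<?php
// Hypothetical helper, not part of ElephScraper: one record per <img> element.
function sketchImageDetails(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $details = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue; // assumption: images without a src are not useful
        }
        $details[] = [
            'src'   => $src,
            'alt'   => $img->getAttribute('alt'),   // '' when the attribute is absent
            'title' => $img->getAttribute('title'),
        ];
    }

    return $details;
}

$html = '<img src="/a.png" alt="First" title="A">'
      . '<img src="/b.png" alt="Second">'
      . '<img alt="no source">';

$details = sketchImageDetails($html);
$sources = array_column($details, 'src'); // the "just src URLs" view
print_r($sources);
```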
Links
$scraper->links();       // just hrefs
$scraper->linkDetails(); // full detail
Custom DOM Filtering
Example: Filter Single Element
$scraper->filter(
    element: 'div',
    attributes: ['id' => 'main'],
    multiple: false,
    extract: ['.title', '#desc', 'p'],
    returnHtml: false
);
Example: Filter Multiple Elements
$scraper->filter(
    element: 'div',
    attributes: ['class' => 'card'],
    multiple: true,
    extract: ['h2', '.subtitle', '#info'],
    returnHtml: false
);
Example: Return HTML Content
$scraper->filter(
    element: 'section',
    attributes: ['class' => 'hero'],
    returnHtml: true
);
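As a rough mental model of the `element`, `attributes`, `multiple`, and `returnHtml` parameters, the sketch below re-creates that behavior on top of PHP's built-in DOM extension. It is not how ElephScraper is implemented (the package builds on Symfony DomCrawler), the function names are hypothetical, and the return shapes (string or `null` for a single match, array for multiple) are assumptions. The `extract` parameter is left out here. The named-argument calls need PHP 8.0 or newer.

```php
<?php
// Hypothetical helper: does the element carry all requested attribute values?
function sketchMatches(DOMElement $node, array $attributes): bool
{
    foreach ($attributes as $name => $value) {
        $actual = $node->getAttribute($name);
        if ($name === 'class') {
            // class is a space-separated token list, so match a single token.
            $tokens = preg_split('/\s+/', trim($actual));
            if (!in_array($value, $tokens, true)) {
                return false;
            }
        } elseif ($actual !== $value) {
            return false;
        }
    }
    return true;
}

// Hypothetical sketch of a filter()-like function, not ElephScraper's code.
function sketchFilter(
    string $html,
    string $element,
    array $attributes = [],
    bool $multiple = false,
    bool $returnHtml = false
) {
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName($element) as $node) {
        if (!sketchMatches($node, $attributes)) {
            continue;
        }
        // Either serialize the node as HTML or flatten it to plain text.
        $results[] = $returnHtml ? $doc->saveHTML($node) : trim($node->textContent);
        if (!$multiple) {
            break; // single mode: stop at the first match
        }
    }

    return $multiple ? $results : ($results[0] ?? null);
}

$html = '<div id="main"><p>Intro</p></div>'
      . '<div class="card featured"><h2>A</h2></div>'
      . '<div class="card"><h2>B</h2></div>'
      . '<section class="hero"><h1>Hi</h1></section>';

$main  = sketchFilter($html, element: 'div', attributes: ['id' => 'main']);
$cards = sketchFilter($html, element: 'div', attributes: ['class' => 'card'], multiple: true);
$hero  = sketchFilter($html, element: 'section', attributes: ['class' => 'hero'], returnHtml: true);
```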
Extract selectors support:
- Tag names: h1, p, span, etc.
- Class: .className
- ID: #idName
Output keys are automatically normalized to the original selector.
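One way to read the selector rules above is that each `extract` entry is resolved inside the matched element and the result is keyed by the selector string exactly as written. The sketch below follows that reading, which is an assumption about the output shape rather than documented behavior. `sketchSelectorToXPath()` and `sketchExtract()` are hypothetical helpers that translate the three selector forms into XPath by hand; the package itself depends on symfony/css-selector, which handles general CSS selectors.

```php
<?php
// Hypothetical helper: translate a tag, .class, or #id selector into an XPath
// expression relative to a context node.
function sketchSelectorToXPath(string $selector): string
{
    // Accept only the three simple forms listed above; this also keeps quotes
    // and other XPath syntax out of the generated expression.
    if (!preg_match('/^[#.]?[A-Za-z][A-Za-z0-9_-]*$/', $selector)) {
        throw new InvalidArgumentException("Unsupported selector: {$selector}");
    }
    if ($selector[0] === '#') {
        return sprintf('.//*[@id="%s"]', substr($selector, 1));
    }
    if ($selector[0] === '.') {
        // Match one token within the space-separated class attribute.
        return sprintf(
            './/*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
            substr($selector, 1)
        );
    }
    return './/' . $selector;
}

// Hypothetical helper: text of the first match per selector, keyed by the
// selector exactly as the caller wrote it.
function sketchExtract(DOMXPath $xpath, DOMNode $context, array $selectors): array
{
    $out = [];
    foreach ($selectors as $selector) {
        $nodes = $xpath->query(sketchSelectorToXPath($selector), $context);
        $out[$selector] = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="card"><h2>Title</h2><p class="subtitle">Sub</p><span id="info">Info</span></div>'
);
$xpath = new DOMXPath($doc);
$card  = $doc->getElementsByTagName('div')->item(0);

$fields = sketchExtract($xpath, $card, ['h2', '.subtitle', '#info', '.missing']);
print_r($fields);
```

Keying by the original selector means callers can look results up with the same strings they passed in, for example `$fields['.subtitle']`.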
Contributing
Found a bug? Want to add features? Open an issue or create a pull request!
License
MIT License © 2025 ElephScraper
Why ElephScraper?
ElephScraper is your dependable PHP elephant: strong, smart, and always ready to extract the right data.