arefshojaei / spider
PHP web crawler
Requires
- php: ^8.0
Requires (Dev)
- phpunit/phpunit: ^10
Spider - PHP Web Crawler & HTML Parser
A lightweight and powerful PHP web crawler inspired by jQuery-style DOM manipulation.
Fetch web pages, parse HTML documents, search elements with CSS selectors, manipulate the DOM, and export modified pages with an elegant and simple API.
Features
- Load and parse any HTML web page
- CSS selector-based element searching
- Extract text, HTML, and attributes
- Iterate over multiple DOM elements
- Remove and clean HTML elements
- Modify the DOM structure dynamically
- Manage CSS classes and IDs
- Export modified HTML documents
- Lightweight and dependency-free PHP implementation
Installation
Install with Composer
composer require arefshojaei/spider
Clone from GitHub
git clone https://github.com/ArefShojaei/Spider.git
cd Spider
Quick Start
Fetch a page and extract its content:
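<?php

require __DIR__ . "/vendor/autoload.php"; // Composer autoloader

use Spider\Spider;

$spider = new Spider();
$page = $spider->loadHTML("https://google.com");

// Print the page title
echo $page->find("title")->text() . PHP_EOL;

// Print the href of every link on the page
$page->findAll("a")->each(function ($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});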
Finding Elements
Search DOM elements using CSS selectors.
Find a single element
$page->find("a");
$page->find(".product");
$page->find("#header");
Find multiple elements
$page->findAll("a");
$page->findAll(".product");
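To give a feel for what a selector lookup involves, here is a minimal sketch built on PHP's bundled DOMDocument and DOMXPath. It translates only the three selector shapes shown above (tag, .class, #id) into XPath. The helper name `simpleSelectorToXPath` and the sample markup are assumptions made for illustration; this is not Spider's actual selector engine.

```php
<?php
// Hypothetical helper: map a tag, .class, or #id selector to an XPath query.
function simpleSelectorToXPath(string $selector): string
{
    if (str_starts_with($selector, '#')) {
        $id = substr($selector, 1);
        return "//*[@id='{$id}']";
    }
    if (str_starts_with($selector, '.')) {
        $class = substr($selector, 1);
        // Match the class as a whole word inside the space-separated list.
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
    }
    return "//{$selector}";
}

// Sample markup (made up for this example).
$html = '<div id="header"><a href="/a">A</a></div>'
      . '<p class="product featured">Book</p><a href="/b">B</a>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
$xpath = new DOMXPath($doc);

// Roughly what findAll("a"), find(".product"), and find("#header") need to do.
$links   = $xpath->query(simpleSelectorToXPath('a'));
$product = $xpath->query(simpleSelectorToXPath('.product'))->item(0);
$header  = $xpath->query(simpleSelectorToXPath('#header'))->item(0);

echo $links->length . PHP_EOL;
echo $product->textContent . PHP_EOL;
echo $header->nodeName . PHP_EOL;
```

Combined selectors such as `div.product > a` would need a real selector parser; the sketch deliberately stops at the simple cases.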
Iterating Elements
Perform operations on element collections.
each()
Loop through every element:
$page->findAll("a")->each(function ($key, $anchor) {
    echo $anchor->text();
});
map()
Transform elements:
$anchors = $page->findAll("a")->map(function ($key, $anchor) {
    $anchor->attr("data-id", rand());
    return $anchor;
});
filter()
Filter elements by a condition:
$links = $page->findAll("a")->filter(
    fn($key, $anchor) => $anchor->attr("href")
);
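The three methods above all pass the key first and the element second. A collection with that callback shape can be sketched in plain PHP as follows. The class name `ElementCollection`, the `toArray()` method, and the decision to return a new collection from `map()` and `filter()` are assumptions for illustration, not a description of Spider's source.

```php
<?php
// Hypothetical stand-in for the collection returned by findAll().
final class ElementCollection
{
    /** @param array<int|string, mixed> $items */
    public function __construct(private array $items) {}

    // Run the callback on every item for its side effects.
    public function each(callable $callback): self
    {
        foreach ($this->items as $key => $item) {
            $callback($key, $item);
        }
        return $this;
    }

    // Build a new collection from the callback's return values.
    public function map(callable $callback): self
    {
        $mapped = [];
        foreach ($this->items as $key => $item) {
            $mapped[$key] = $callback($key, $item);
        }
        return new self($mapped);
    }

    // Keep only items for which the callback returns a truthy value.
    public function filter(callable $callback): self
    {
        $kept = [];
        foreach ($this->items as $key => $item) {
            if ($callback($key, $item)) {
                $kept[] = $item;
            }
        }
        return new self($kept);
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

// Plain strings stand in for anchor elements here.
$hrefs = new ElementCollection(['/home', '', '/about']);

$result = $hrefs
    ->filter(fn($key, $href) => $href !== '')
    ->map(fn($key, $href) => 'https://example.com' . $href)
    ->toArray();

print_r($result);
```

Because `filter()` relies on truthiness, a callback that returns `attr("href")` would drop anchors whose href is missing or empty, which matches the intent of the example above.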
DOM Traversing
Navigate and modify element relationships.
Parent element
$parent = $page->find(".product")->parent();
Insert sibling elements
$page->find(".product")->before("<p>Before Element</p>");
$page->find(".product")->after("<p>After Element</p>");
Insert child elements
$page->find(".product")->append("<p>New Child</p>");
$page->find(".product")->prepend("<p>First Child</p>");
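For comparison, the same four insertions can be written against PHP's built-in DOM classes. This sketch assumes the markup strings are well-formed XML, because `DOMDocumentFragment::appendXML()` requires that. The `fragment()` helper and the sample markup are made up for the example, and Spider may implement these methods differently.

```php
<?php
// Hypothetical helper: turn a markup string into an insertable fragment.
function fragment(DOMDocument $doc, string $markup): DOMDocumentFragment
{
    $fragment = $doc->createDocumentFragment();
    $fragment->appendXML($markup);
    return $fragment;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<div id="wrap"><div class="product"><span>Item</span></div></div>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);
$xpath   = new DOMXPath($doc);
$product = $xpath->query("//*[@class='product']")->item(0);

// before(): insert ahead of the element, under the same parent.
$product->parentNode->insertBefore(fragment($doc, '<p>Before Element</p>'), $product);

// after(): insert ahead of the next sibling; a null sibling appends at the end.
$product->parentNode->insertBefore(fragment($doc, '<p>After Element</p>'), $product->nextSibling);

// append(): add as the last child.
$product->appendChild(fragment($doc, '<p>New Child</p>'));

// prepend(): insert ahead of the current first child.
$product->insertBefore(fragment($doc, '<p>First Child</p>'), $product->firstChild);

// Print the wrapper so the new structure is visible.
$wrap = $xpath->query("//*[@id='wrap']")->item(0);
echo $doc->saveHTML($wrap) . PHP_EOL;
```

The one-line Spider calls above hide this parent and sibling bookkeeping, which is the main convenience of a jQuery-style API.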
Cleaning Elements
Remove content or complete elements.
Empty content
$page->find("p")->empty();
Remove element
$page->find("p")->remove();
Working with Content
Get text or HTML
$text = $page->find("p")->text();
$html = $page->find("p")->html();
Update content
$page->find("p")->text("New text");
$page->find("p")->html("<strong>New HTML</strong>");
Working with Attributes
Read attributes
$attributes = $page->find("a")->attr();
$link = $page->find("a")->attr("href");
Set attributes
$page->find("a")->attr("data-id", 123);
CSS Classes & IDs
Classes
$page->find("p")->addClass("active");
$page->find("p")->removeClass("active");
$page->find("p")->hasClass("active");
IDs
$page->find("p")->addID("article");
$page->find("p")->removeID("article");
$page->find("p")->hasID("article");
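Class handling comes down to editing a space-separated attribute value. The sketch below shows one way to do that on a plain `DOMElement`. The free functions borrow the method names from above, but their bodies and the `classList()` helper are assumptions for illustration only.

```php
<?php
// Hypothetical helper: split the class attribute into a list of names.
function classList(DOMElement $el): array
{
    return preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
}

function hasClass(DOMElement $el, string $name): bool
{
    return in_array($name, classList($el), true);
}

function addClass(DOMElement $el, string $name): void
{
    // Skip names that are already present to avoid duplicates.
    if (!hasClass($el, $name)) {
        $el->setAttribute('class', implode(' ', [...classList($el), $name]));
    }
}

function removeClass(DOMElement $el, string $name): void
{
    $rest = array_values(array_diff(classList($el), [$name]));
    if ($rest === []) {
        // Drop the attribute entirely instead of leaving class="".
        $el->removeAttribute('class');
    } else {
        $el->setAttribute('class', implode(' ', $rest));
    }
}

$doc = new DOMDocument();
$p = $doc->createElement('p', 'Hello');
$p->setAttribute('class', 'intro');
$doc->appendChild($p);

addClass($p, 'active');
addClass($p, 'active'); // second call is a no-op
$afterAdd = $p->getAttribute('class');

removeClass($p, 'active');
$afterRemove = $p->getAttribute('class');

echo $afterAdd . PHP_EOL;
echo $afterRemove . PHP_EOL;
```

An ID can be handled the same way with `setAttribute('id', ...)` and `removeAttribute('id')`, since an element carries at most one ID.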
Export HTML
Save the current DOM document to a file.
$filename = "page";
$path = __DIR__ . "/html/" . $filename . rand() . ".html";
$page->export($path);
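The path above points into an html/ directory, and the README does not say whether export() creates it when it is missing. As a hedged illustration, here is how the same step could look with the built-in `DOMDocument::saveHTMLFile()`, creating the target directory first. The `exportHtml()` helper, the temp-directory location, and the sample markup are assumptions for this sketch.

```php
<?php
// Hypothetical helper: write a document to $path, creating the directory if needed.
function exportHtml(DOMDocument $doc, string $path): int
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: {$dir}");
    }

    $bytes = $doc->saveHTMLFile($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot write file: {$path}");
    }
    return $bytes;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>',
    LIBXML_NOERROR | LIBXML_NOWARNING
);

// uniqid() keeps repeated runs from overwriting each other.
$path  = sys_get_temp_dir() . '/spider-demo/' . uniqid('page', true) . '.html';
$bytes = exportHtml($doc, $path);

echo "Wrote {$bytes} bytes to {$path}" . PHP_EOL;
```

If export() turns out not to create directories, calling `mkdir()` on the html/ folder before the export line above is a cheap safeguard.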
Example Use Cases
Spider can be used for:
- Web scraping and data extraction
- SEO analysis
- Content migration
- HTML cleaning and transformation
- Static website processing
- Automated testing of HTML pages
- Learning how browser DOM engines work
Why Spider?
Spider brings the simplicity of jQuery-style DOM APIs into PHP.
Instead of dealing with complex DOMDocument operations, you can navigate and manipulate HTML documents using a clean and expressive syntax.
It is a great educational project for learning:
- Web crawling concepts
- HTML parsing
- DOM tree manipulation
- CSS selector engines
- Collection processing
- Parser design
Contributing
Contributions are welcome.
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Commit your changes:
git commit -m "Add amazing feature"
- Push your branch:
git push origin feature/amazing-feature
- Open a Pull Request.
Author
Aref Shojaei
- Email: arefshojaei82@gmail.com
- GitHub: @ArefShojaei
- Packagist: arefshojaei/spider
Show Your Support
If this project helps you understand web crawling, HTML parsing, and DOM manipulation, consider giving it a star on GitHub.
Your support motivates future improvements.