arefshojaei/spider

PHP web crawler

Package info

github.com/ArefShojaei/Spider

pkg:composer/arefshojaei/spider

2.3.2 2026-06-16 16:05 UTC

This package is auto-updated.

Last update: 2026-06-16 16:05:59 UTC


README

🕷️ Spider - PHP Web Crawler & HTML Parser

A lightweight and powerful PHP web crawler inspired by jQuery-style DOM manipulation.

Fetch web pages, parse HTML documents, search elements with CSS selectors, manipulate the DOM, and export modified pages with an elegant and simple API.

✨ Features

  • 🌐 Load and parse any HTML web page
  • 🔍 CSS selector-based element searching
  • 📄 Extract text, HTML, and attributes
  • 🔁 Iterate over multiple DOM elements
  • 🧹 Remove and clean HTML elements
  • 🏗️ Modify the DOM structure dynamically
  • 🎨 Manage CSS classes and IDs
  • 💾 Export modified HTML documents
  • ⚡ Lightweight and dependency-free PHP implementation

📥 Installation

Install with Composer

composer require arefshojaei/spider

Clone from GitHub

git clone https://github.com/ArefShojaei/Spider.git
cd Spider

🚀 Quick Start

Fetch a page and extract its content:

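<?php

// Load Composer's autoloader (assumes Spider was installed with Composer).
require __DIR__ . "/vendor/autoload.php";

use Spider\Spider;

$spider = new Spider();

// Fetch and parse the page.
$page = $spider->loadHTML("https://google.com");

// Print the page title.
echo $page->find("title")->text() . PHP_EOL;

// Print the href of every link on the page.
$page->findAll("a")->each(function ($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});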

🔎 Finding Elements

Search DOM elements using CSS selectors.

Find a single element

$page->find("a");
$page->find(".product");
$page->find("#header");

Find multiple elements

$page->findAll("a");
$page->findAll(".product");
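
This README does not describe how Spider resolves selectors internally. As a rough illustration of the idea, the three selector shapes used above (tag, `.class`, `#id`) can be translated to XPath and run with PHP's built-in `DOMXPath`. The helper `simpleSelectorToXPath()` is hypothetical and is not part of Spider's API.

```php
<?php

// Hypothetical helper, not part of Spider: translates the three simple
// selector shapes used above (tag, .class, #id) into an XPath expression.
function simpleSelectorToXPath(string $selector): string
{
    $prefix = substr($selector, 0, 1);
    $name = substr($selector, 1);

    if ($prefix === "#") {
        return "//*[@id='" . $name . "']";
    }

    if ($prefix === ".") {
        // Pad the class attribute with spaces so "product" does not match "products".
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " . $name . " ')]";
    }

    return "//" . $selector;
}

$html = '<div id="header"><a class="product" href="/a">A</a>'
      . '<a class="products" href="/b">B</a></div>';

$document = new DOMDocument();
$document->loadHTML($html);

$xpath = new DOMXPath($document);

$matches = $xpath->query(simpleSelectorToXPath(".product"));
$headerCount = $xpath->query(simpleSelectorToXPath("#header"))->length;
$anchorCount = $xpath->query(simpleSelectorToXPath("a"))->length;
```

The sketch covers only single simple selectors; combinators, attribute selectors, and pseudo-classes would need a real selector parser.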

🔁 Iterating Elements

Perform operations on element collections.

each()

Loop through every element:

$page->findAll("a")->each(function ($key, $anchor) {
    echo $anchor->text();
});

map()

Transform elements:

$anchors = $page->findAll("a")->map(function ($key, $anchor) {
    $anchor->attr("data-id", rand());

    return $anchor;
});

filter()

Filter elements by a condition:

$links = $page->findAll("a")->filter(
    fn($key, $anchor) => $anchor->attr("href")
);
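
All three callbacks above receive `($key, $element)`. The following is a minimal sketch of how a collection with that callback signature might behave, using plain strings in place of DOM elements. `SketchCollection` is a hypothetical class written for this illustration; Spider's own collection may differ, for example in whether `filter()` preserves keys.

```php
<?php

// Hypothetical stand-in for an element collection; Spider's real class may differ.
final class SketchCollection
{
    private array $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    // Calls $callback($key, $item) for every item and returns the same collection.
    public function each(callable $callback): self
    {
        foreach ($this->items as $key => $item) {
            $callback($key, $item);
        }
        return $this;
    }

    // Builds a new collection from the callback's return values.
    public function map(callable $callback): self
    {
        $mapped = [];
        foreach ($this->items as $key => $item) {
            $mapped[$key] = $callback($key, $item);
        }
        return new self($mapped);
    }

    // Keeps only the items for which the callback returns a truthy value.
    public function filter(callable $callback): self
    {
        $kept = [];
        foreach ($this->items as $key => $item) {
            if ($callback($key, $item)) {
                $kept[] = $item;
            }
        }
        return new self($kept);
    }

    public function toArray(): array
    {
        return $this->items;
    }
}

$hrefs = new SketchCollection(["/home", "", "/about"]);

$nonEmpty = $hrefs->filter(fn($key, $href) => $href)->toArray();
$upper = $hrefs->map(fn($key, $href) => strtoupper($href))->toArray();
```

In this sketch a callback that returns the raw value relies on PHP truthiness, so an empty string is dropped and so would the string "0"; returning an explicit boolean such as `$href !== ""` avoids that edge case.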

🌳 DOM Traversing

Navigate and modify element relationships.

Parent element

$parent = $page->find(".product")->parent();

Insert sibling elements

$page->find(".product")
     ->before("<p>Before Element</p>");

$page->find(".product")
     ->after("<p>After Element</p>");

Insert child elements

$page->find(".product")
     ->append("<p>New Child</p>");

$page->find(".product")
     ->prepend("<p>First Child</p>");
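
For readers curious what these four insertions correspond to in PHP's built-in DOM extension, here is a small sketch. It is not Spider's implementation, and it is simplified: it creates elements directly instead of parsing HTML strings, and the `$makeParagraph` closure exists only for this example.

```php
<?php

$document = new DOMDocument();
$document->loadHTML('<div id="list"><div class="product">Item</div></div>');

$xpath = new DOMXPath($document);
$product = $xpath->query("//div[@class='product']")->item(0);

// Helper for this sketch only: creates a <p> element with the given text.
$makeParagraph = fn(string $text) => $document->createElement("p", $text);

// before(): insert a sibling ahead of the target.
$product->parentNode->insertBefore($makeParagraph("Before Element"), $product);

// after(): insert ahead of the target's next sibling; a null sibling appends.
$product->parentNode->insertBefore($makeParagraph("After Element"), $product->nextSibling);

// append(): add as the last child of the target.
$product->appendChild($makeParagraph("New Child"));

// prepend(): add ahead of the target's current first child.
$product->insertBefore($makeParagraph("First Child"), $product->firstChild);
```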

🧹 Cleaning Elements

Remove an element's content, or remove the element entirely.

Empty content

$page->find("p")->empty();

Remove element

$page->find("p")->remove();
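
The difference between the two operations can be sketched with PHP's built-in DOM extension: emptying keeps the element but drops its children, while removing detaches the element itself. This is an illustration under that assumption, not Spider's internal code.

```php
<?php

$document = new DOMDocument();
$document->loadHTML('<div><p id="a">Keep <b>me</b> empty</p><p id="b">Remove me</p></div>');

$xpath = new DOMXPath($document);
$first = $xpath->query("//p[@id='a']")->item(0);
$second = $xpath->query("//p[@id='b']")->item(0);

// Emptying: drop every child node but keep the element itself.
while ($first->firstChild !== null) {
    $first->removeChild($first->firstChild);
}

// Removing: detach the element (and its subtree) from its parent.
$second->parentNode->removeChild($second);

$remainingParagraphs = $xpath->query("//p")->length;
```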

📄 Working with Content

Get text or HTML

$text = $page->find("p")->text();

$html = $page->find("p")->html();

Update content

$page->find("p")->text("New text");

$page->find("p")->html("<strong>New HTML</strong>");
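
To make the distinction between text and HTML concrete, the sketch below reads both from the same element using PHP's built-in DOM extension. It assumes `text()` corresponds to the element's text content and `html()` to its serialized children; the README does not confirm those details.

```php
<?php

$document = new DOMDocument();
$document->loadHTML('<div><p>Hello <strong>world</strong></p></div>');

$paragraph = $document->getElementsByTagName("p")->item(0);

// Text: the concatenated text of the element and its descendants.
$text = $paragraph->textContent;

// Inner HTML: serialize each child node and join the results.
$html = "";
foreach ($paragraph->childNodes as $child) {
    $html .= $document->saveHTML($child);
}

// Replacing the text: assigning textContent discards the existing children.
$paragraph->textContent = "New text";
$updatedText = $paragraph->textContent;
```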

🏷️ Working with Attributes

Read attributes

$attributes = $page->find("a")->attr();

$link = $page->find("a")->attr("href");

Set attributes

$page->find("a")->attr("data-id", 123);
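
The three call shapes above (no argument, one argument, two arguments) map naturally onto reading all attributes, reading one, and writing one. A sketch of those operations with PHP's built-in DOM extension follows; the shape of the array that Spider's `attr()` returns is an assumption here.

```php
<?php

$document = new DOMDocument();
$document->loadHTML('<a href="/docs" title="Docs">Docs</a>');

$anchor = $document->getElementsByTagName("a")->item(0);

// Read every attribute into a name => value array.
$attributes = [];
foreach ($anchor->attributes as $attribute) {
    $attributes[$attribute->nodeName] = $attribute->nodeValue;
}

// Read a single attribute; getAttribute() returns "" when it is absent.
$href = $anchor->getAttribute("href");

// Set an attribute; DOM attribute values are strings, so cast the integer.
$anchor->setAttribute("data-id", (string) 123);
$dataId = $anchor->getAttribute("data-id");
```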

🎨 CSS Classes & IDs

Classes

$page->find("p")->addClass("active");

$page->find("p")->removeClass("active");

$page->find("p")->hasClass("active");

IDs

$page->find("p")->addID("article");

$page->find("p")->removeID("article");

$page->find("p")->hasID("article");
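
Class handling comes down to editing a space-separated attribute value. The sketch below shows one way to do that on top of PHP's built-in DOM extension; `classList()`, `addClassTo()`, and `removeClassFrom()` are hypothetical helpers for this example and are not Spider functions.

```php
<?php

// Hypothetical helper: splits the class attribute into a list of class names.
function classList(DOMElement $element): array
{
    $value = trim($element->getAttribute("class"));
    return preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Hypothetical helper: adds a class unless it is already present.
function addClassTo(DOMElement $element, string $class): void
{
    $classes = classList($element);
    if (!in_array($class, $classes, true)) {
        $classes[] = $class;
    }
    $element->setAttribute("class", implode(" ", $classes));
}

// Hypothetical helper: removes a class and drops the attribute when it ends up empty.
function removeClassFrom(DOMElement $element, string $class): void
{
    $classes = array_values(array_diff(classList($element), [$class]));
    if ($classes === []) {
        $element->removeAttribute("class");
        return;
    }
    $element->setAttribute("class", implode(" ", $classes));
}

$document = new DOMDocument();
$document->loadHTML('<p class="intro">Text</p>');
$paragraph = $document->getElementsByTagName("p")->item(0);

addClassTo($paragraph, "active");
$afterAdd = $paragraph->getAttribute("class");
$hasActive = in_array("active", classList($paragraph), true);

removeClassFrom($paragraph, "active");
$afterRemove = $paragraph->getAttribute("class");
```

IDs are simpler because an element carries at most one: setting, clearing, and comparing the `id` attribute is enough.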

💾 Export HTML

Save the current DOM document to a file.

$filename = "page";

$path = __DIR__ . "/html/" . $filename . rand() . ".html";

$page->export($path);
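
The README does not say whether `export()` creates missing directories, and `rand()` can repeat across runs. As a cautious sketch, the hypothetical helper `buildExportPath()` below makes sure the target directory exists and builds a file name from `uniqid()` before the path is handed to `export()`.

```php
<?php

// Hypothetical helper: returns a path such as <dir>/page-<unique>.html,
// creating the directory first if it does not exist yet.
function buildExportPath(string $directory, string $filename): string
{
    if (!is_dir($directory) && !mkdir($directory, 0775, true) && !is_dir($directory)) {
        throw new RuntimeException("Could not create directory: " . $directory);
    }

    return rtrim($directory, "/") . "/" . $filename . "-" . uniqid() . ".html";
}

// The temporary directory is used here only to keep the sketch self-contained.
$directory = sys_get_temp_dir() . "/spider-export-sketch";
$path = buildExportPath($directory, "page");

// With Spider, the path would then be passed to export() as shown above:
// $page->export($path);
```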

💡 Example Use Cases

Spider can be used for:

  • Web scraping and data extraction
  • SEO analysis
  • Content migration
  • HTML cleaning and transformation
  • Static website processing
  • Automated testing of HTML pages
  • Learning how browser DOM engines work

🔥 Why Spider?

Spider brings the simplicity of jQuery-style DOM APIs into PHP.

Instead of dealing with complex DOMDocument operations, you can navigate and manipulate HTML documents using a clean and expressive syntax.
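
For comparison, here is roughly what the Quick Start task (print the title, list every link) looks like with PHP's built-in `DOMDocument` alone. The HTML string is made up for the example so that the snippet runs without network access.

```php
<?php

$html = '<html><head><title>Example</title></head>'
      . '<body><a href="/one">One</a><a href="/two">Two</a></body></html>';

$document = new DOMDocument();
$document->loadHTML($html);

// Read the <title> text.
$title = $document->getElementsByTagName("title")->item(0)->textContent;

// Collect the href of every <a> element.
$links = [];
foreach ($document->getElementsByTagName("a") as $anchor) {
    $links[] = $anchor->getAttribute("href");
}

echo $title . PHP_EOL;
foreach ($links as $link) {
    echo "[LINK] " . $link . PHP_EOL;
}
```

On real-world pages `loadHTML()` often emits warnings for malformed markup; calling `libxml_use_internal_errors(true)` beforehand is the usual way to collect them quietly.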

It is a great educational project for learning:

  • Web crawling concepts
  • HTML parsing
  • DOM tree manipulation
  • CSS selector engines
  • Collection processing
  • Parser design

🤝 Contributing

Contributions are welcome.

  1. Fork the repository

  2. Create a feature branch:

git checkout -b feature/amazing-feature

  3. Commit your changes:

git commit -m "Add amazing feature"

  4. Push your branch:

git push origin feature/amazing-feature

  5. Open a Pull Request.

👨‍💻 Author

Aref Shojaei

⭐ Show Your Support

If this project helps you understand web crawling, HTML parsing, and DOM manipulation, consider giving it a star ⭐ on GitHub.

Your support motivates future improvements.