marioungui / php-component-spider
A PHP package for scraping brand websites
Requires
- php: ^8.1.2
- league/csv: ^9.8
- spekulatius/phpscraper: ^2.0
- symfony/browser-kit: ^6.4
- symfony/http-kernel: ^5.4
This package is auto-updated.
Last update: 2024-11-13 16:50:12 UTC
README
This PHP Component Spider is designed to scrape websites for specific components or search criteria defined by XPath filters. It uses the PHPScraper library to fetch and process web pages, and the League\Csv library to log the results in CSV files. This tool is easy to extend with custom XPath filters to meet various scraping needs.
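The overall flow described above (apply an XPath filter to a page, then log matches to CSV) can be sketched as follows. This is a minimal illustration, not the package's own code: it swaps in PHP's built-in DOM extension and `fputcsv` in place of PHPScraper and League\Csv so that it runs without Composer dependencies, and the function names `findComponents` and `logMatches` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Return the trimmed text of every node in $html matched by $xpathFilter.
 * (Hypothetical helper; the package itself uses PHPScraper for this step.)
 */
function findComponents(string $html, string $xpathFilter): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well formed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathFilter);
    $texts = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $texts[] = trim($node->textContent);
        }
    }
    return $texts;
}

/**
 * Append one CSV row per match: page URL, component name, matched text.
 * (Hypothetical helper; the package itself uses League\Csv for this step.)
 */
function logMatches(string $csvPath, string $url, string $component, array $matches): void
{
    $handle = fopen($csvPath, 'a');
    foreach ($matches as $text) {
        // Separator, enclosure, and escape are passed explicitly.
        fputcsv($handle, [$url, $component, $text], ',', '"', '\\');
    }
    fclose($handle);
}
```

A caller would fetch a page's HTML, pass it to `findComponents` with one of the XPath filters, and hand the result to `logMatches` together with the page URL and the component name.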
Features
- Scrape websites for specific components or text based on XPath filters.
- Log results into CSV files for further analysis.
- Configurable timeout and maximum redirects.
- Easy to extend with additional filters.
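To illustrate what a configurable timeout and redirect limit mean in practice, here is a sketch using PHP's built-in HTTP stream context options. It is an assumption for illustration only: the package fetches pages through PHPScraper, and how it exposes these two settings is not shown in this README. The function name `makeHttpContext` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build an HTTP stream context with a read timeout and a redirect cap.
 * (Hypothetical helper, not part of the package.)
 *
 * @return resource
 */
function makeHttpContext(float $timeoutSeconds, int $maxRedirects)
{
    return stream_context_create([
        'http' => [
            'timeout'         => $timeoutSeconds, // read timeout in seconds
            'follow_location' => 1,               // follow Location redirects
            'max_redirects'   => $maxRedirects,   // stop after this many
            'ignore_errors'   => true,            // still return 4xx/5xx bodies
        ],
    ]);
}

// Usage (performs a network request, so it is left commented out):
// $html = file_get_contents('https://example.com/', false, makeHttpContext(10.0, 5));
```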
Requirements
- PHP 8.1.2 or higher
- Composer
Build & Run from Source Code
- Clone the repository:
git clone https://github.com/marioungui/PHP-Component-Spider.git
- Navigate to the project directory:
cd PHP-Component-Spider
- Install the dependencies using Composer:
composer install
- Build the Phar package:
php -d phar.readonly=0 phar-creator.php
- Run the batch file spider.bat
- Follow the on-screen instructions to select the component to search for and the domain to scrape.
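The build step passes `-d phar.readonly=0` because PHP refuses to create or modify Phar archives while the `phar.readonly` setting is enabled, which is its default. The sketch below shows roughly what a Phar build script can look like using PHP's `Phar` class. It is not the contents of phar-creator.php, which are not reproduced in this README; the function name `buildPhar` and its parameters are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Pack every file under $sourceDir into a Phar archive at $pharPath and make
 * $entryScript (a path relative to $sourceDir) the script that runs on launch.
 * Returns false without writing anything when phar.readonly is enabled.
 * (Hypothetical helper, not the package's build script.)
 */
function buildPhar(string $sourceDir, string $pharPath, string $entryScript): bool
{
    // Writing archives is blocked unless PHP was started with phar.readonly=0.
    if (filter_var(ini_get('phar.readonly'), FILTER_VALIDATE_BOOLEAN)) {
        return false;
    }
    if (file_exists($pharPath)) {
        unlink($pharPath); // start from a clean archive
    }

    $phar = new Phar($pharPath);
    $phar->buildFromDirectory($sourceDir);
    $phar->setStub(Phar::createDefaultStub($entryScript));
    return true;
}

// Usage, with placeholder paths:
// php -d phar.readonly=0 build.php
// buildPhar(__DIR__ . '/src', __DIR__ . '/spider.phar', 'index.php');
```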
Filters
The filters are defined in filters.php and use XPath to identify specific components on web pages. See filters.php for the filters currently available.
Extending with Custom Filters
Extending the tool with new filters is simple:
- Open the filters.php file.
- Add a new case in the switch statement with your component name or index.
- Define the $component and $filter variables with your custom XPath.
Example:
    case 'new-component':
    case 11:
        $component = "New Component";
        $filter = "//*[@class='new-component-class']";
        break;
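For context, the example case could sit inside a lookup like the sketch below. How filters.php actually wraps its switch statement is not shown here, so the function name `resolveFilter` is hypothetical. One thing to keep in mind when writing a filter: `@class='new-component-class'` only matches elements whose class attribute is exactly that string. The hypothetical helper `classTokenFilter` builds the usual XPath 1.0 expression that also matches elements carrying additional classes.

```php
<?php
declare(strict_types=1);

/**
 * Map a component name or numeric index to [label, XPath filter].
 * Returns null for an unknown selection.
 * (Hypothetical wrapper around the switch statement.)
 */
function resolveFilter(string|int $selection): ?array
{
    switch ($selection) {
        case 'new-component':
        case 11:
            $component = "New Component";
            $filter = "//*[@class='new-component-class']";
            break;
        default:
            return null;
    }
    return [$component, $filter];
}

/**
 * Build an XPath filter matching any element whose class list contains
 * $className, even when other classes are present on the same element.
 */
function classTokenFilter(string $className): string
{
    return "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$className} ')]";
}
```

Because `switch` uses loose comparison, the numeric string `'11'` typed at a prompt selects the same case as the integer `11`.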
Contributing
Feel free to submit issues or pull requests if you have any improvements or new features you'd like to add.
License
This project is licensed under the MIT License.