mvrc / php-email-crawler
PHP Website Email Crawler, for crawling websites for emails.
This package is auto-updated.
Last update: 2021-03-06 03:35:47 UTC
README
Simple Email Crawler
A PHP email crawler. Crawl a single website or multiple websites for email addresses using simple_html_dom.
Features
- Crawl emails from target website(s)
- Crawls emails even if the @ sign is written as (at) or something else (see classes/config.class.php to control the regexes)
- Deep crawl: the crawler navigates through the target site (see classes/config.class.php to control the path)
- Easily output a comma-separated list or plain text
- Bulk crawl websites (work in progress)
- Filter out duplicate email addresses
- Tests the site connection and validates the link before crawling
- Validates emails before returning them
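The extraction-related features above (obfuscated @ signs, duplicate filtering, validation) can be sketched in plain PHP. This is only an illustration of the idea, not the package's implementation: the function name `extract_emails`, the regexes, and the sample text are assumptions made for this example, and the real patterns live in classes/config.class.php.

```php
<?php
// Hypothetical helper, not part of the package: pulls email addresses out of
// plain text, including simple "(at)" / "[at]" obfuscation.
function extract_emails(string $text, bool $unique = true): array
{
    // Rewrite obfuscations such as "info (at) example.com" to "info@example.com".
    $normalized = preg_replace('/\s*[\(\[]\s*at\s*[\)\]]\s*/i', '@', $text);

    // Loose candidate pattern; strict validation is left to filter_var() below.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $normalized, $matches);

    $emails = [];
    foreach ($matches[0] as $candidate) {
        $candidate = strtolower($candidate);
        // Keep only candidates that PHP's built-in validator accepts.
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $emails[] = $candidate;
        }
    }

    // Optionally drop duplicates while keeping a zero-based index.
    return $unique ? array_values(array_unique($emails)) : $emails;
}

// Made-up sample text for illustration only.
$sample = 'Contact info(at)example.com or Info@Example.com, sales [at] example.org.';
print_r(extract_emails($sample));
```

Lowercasing before deduplication is a design choice in this sketch, so that `Info@Example.com` and `info@example.com` count as the same address.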
Installation
git clone https://github.com/marcosraudkett/simple-email-crawler.git
Usage
Including with autoloader:
<?php
/* use autoloader */
require_once "/path/to/includes/init.php";
?>
Including without autoloader:
<?php
/* include email_crawler */
require_once "/path/to/classes/email_crawler.class.php";
?>
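For readers unfamiliar with the autoloader option, here is a rough sketch of how a class autoloader for files named `<class>.class.php` could work. It is not the content of includes/init.php: the function name `register_class_autoloader` and the assumption that all classes sit in a `classes/` directory are made up for this example, based only on the paths shown in this README.

```php
<?php
// Hypothetical helper: registers an autoloader that maps a class name such as
// "email_crawler" to "<baseDir>/classes/email_crawler.class.php".
function register_class_autoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        // Ignore anything that is not a plain identifier (no path separators).
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
            return;
        }
        $file = rtrim($baseDir, '/') . '/classes/' . $class . '.class.php';
        // Load the file only if it exists; otherwise let other autoloaders try.
        if (is_file($file)) {
            require_once $file;
        }
    });
}

// Usage: the path is a placeholder, as in the snippets above.
register_class_autoloader('/path/to');
```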
Crawling a site
<?php
/* The URL you wish to crawl */
$url = 'http://example-site.com';

$crawler = new email_crawler($url, false);
$crawl = $crawler->crawl_site();

/* Print each email together with the element it was found in */
if (!empty($crawl['results'])) {
    foreach ($crawl['results'] as $result) {
        echo $result['email'].' (Element: '.$result['element'].') <br>';
    }
}

/*
Example output:
info@examplemail.com (Element: a)
info@example.com (Element: p)
info@divexample.com (Element: div)
info@spanexample.com (Element: span)
*/
?>
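Since each result carries both an `email` and an `element` key, the results can be post-processed with ordinary array code. The sketch below groups emails by the element they were found in. The helper name `group_emails_by_element` is hypothetical, and the sample array only mirrors the example output above; it is not real crawl data.

```php
<?php
// Hypothetical helper: groups crawl results by the HTML element they came from.
function group_emails_by_element(array $results): array
{
    $grouped = [];
    foreach ($results as $result) {
        // Skip entries that lack either of the two expected keys.
        if (!isset($result['email'], $result['element'])) {
            continue;
        }
        $grouped[$result['element']][] = $result['email'];
    }
    return $grouped;
}

// Sample data shaped like the example output above, for illustration only.
$results = [
    ['email' => 'info@examplemail.com', 'element' => 'a'],
    ['email' => 'info@example.com', 'element' => 'p'],
    ['email' => 'info@divexample.com', 'element' => 'div'],
    ['email' => 'info@spanexample.com', 'element' => 'span'],
];
print_r(group_emails_by_element($results));
```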
Crawling a site (into a comma separated list)
<?php
/* The URL you wish to crawl */
$url = 'http://example-site.com';

/* settings: unique: true, depth: null, print_type: list (comma separated) */
$crawler = new email_crawler($url, true, null, 'list');
$crawl = $crawler->crawl_site();

if ($crawl != '') {
    print_r($crawl);
}

/*
Example output:
info@examplemail.com, info@example.com, info@divexample.com, info@spanexample.com
*/
?>
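If the `list` print type returns a single comma-separated string like the example output above (an assumption; the README does not state the return type), it can be turned back into an array with standard string functions. The helper name `parse_email_list` is hypothetical.

```php
<?php
// Hypothetical helper: splits a comma-separated email list into an array,
// trimming whitespace and dropping empty entries.
function parse_email_list(string $list): array
{
    $parts = array_map('trim', explode(',', $list));
    $nonEmpty = array_filter($parts, function (string $part): bool {
        return $part !== '';
    });
    // Reindex so the result is a plain zero-based list.
    return array_values($nonEmpty);
}

// Sample string copied from the example output above.
$list = 'info@examplemail.com, info@example.com, info@divexample.com, info@spanexample.com';
print_r(parse_email_list($list));
```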
Crawling a site (plain list)
<?php
/* The URL you wish to crawl */
$url = 'http://example-site.com';

/* settings: unique: false, depth: null, print_type: emails_only_plain */
$crawler = new email_crawler($url, false, null, 'emails_only_plain');
$crawl = $crawler->crawl_site();

if ($crawl != '') {
    print_r($crawl);
}

/*
Example output:
info@examplemail.com
info@example.com
info@divexample.com
info@spanexample.com
*/
?>
Contributing
Contributions are welcome. If you find a bug, please report it on the issues page.