mvrc/php-email-crawler

This package is abandoned and no longer maintained. No replacement package was suggested.

PHP Website Email Crawler, for crawling websites for emails.

dev-master 2020-08-06 02:14 UTC

This package is auto-updated.

Last update: 2021-03-06 03:35:47 UTC


README

Simple Email Crawler

A PHP email crawler. It crawls a single website or multiple websites for email addresses using simple_html_dom.

Features

  • Crawl emails from target website(s)
  • Crawls emails even when the @ sign is written as (at) or something else (see classes/config.class.php to control the regexes)
  • Deep crawl: the crawler navigates through the target site (see classes/config.class.php to control the path)
  • Output as a comma-separated list or as plain text
  • Bulk crawl websites (work in progress)
  • Filters out duplicate email addresses
  • Tests the site connection and validates the link before crawling
  • Validates emails before returning them
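
To make the (at) handling, validation, and duplicate filtering listed above more concrete, here is a minimal sketch of the general technique. It is not the crawler's own code: the function name extract_emails and both patterns are assumptions made for illustration, and the real regexes live in classes/config.class.php.

```php
<?php
/**
 * Illustrative sketch only; not taken from this package.
 * extract_emails() and its patterns are hypothetical.
 */
function extract_emails(string $text): array
{
    // Rewrite common obfuscations: "(at)" / "[at]" become "@", "(dot)" / "[dot]" become ".".
    $text = preg_replace('/\s*[\(\[]\s*at\s*[\)\]]\s*/i', '@', $text);
    $text = preg_replace('/\s*[\(\[]\s*dot\s*[\)\]]\s*/i', '.', $text);

    // Collect candidate addresses with a deliberately simple pattern.
    preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $text, $matches);

    $unique = [];
    foreach ($matches[0] as $candidate) {
        // Keep only candidates that PHP's built-in validator accepts.
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) === false) {
            continue;
        }
        // Drop duplicates case-insensitively, keeping the first spelling seen.
        $key = strtolower($candidate);
        if (!isset($unique[$key])) {
            $unique[$key] = $candidate;
        }
    }

    return array_values($unique);
}

/* Example usage */
$text = 'Contact info (at) example (dot) com or INFO@example.com, sales[at]example.org.';
echo implode(', ', extract_emails($text));
/* Prints: info@example.com, sales@example.org */
```

Normalizing before matching keeps the matching pattern itself simple, at the cost of occasionally rewriting an unrelated "(at)" in ordinary prose; such rewrites rarely survive the validation step.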

Installation

git clone https://github.com/marcosraudkett/simple-email-crawler.git

Usage

Including with autoloader:

<?php
  /* use autoloader */
  require_once "/path/to/includes/init.php";
?>

Including without autoloader:

<?php
  /* include email_crawler */
  require_once "/path/to/classes/email_crawler.class.php";
?>

Crawling a site

<?php
  /* The URL you wish to crawl */
  $url = 'http://example-site.com';
  /* settings: unique: false */
  $crawler = new email_crawler($url, false);
  $crawl = $crawler->crawl_site();
  
  /* Print each email together with the element it was found in */
  if (!empty($crawl['results']))
  {
    foreach ($crawl['results'] as $result)
    {
      echo $result['email'].' (Element: '.$result['element'].') <br>';
    }
  }
  
  /* 
  Example output:
    info@examplemail.com (Element: a) 
    info@example.com (Element: p) 
    info@divexample.com (Element: div) 
    info@spanexample.com (Element: span) 
  */
?>

Crawling a site (into a comma-separated list)

<?php
  /* The URL you wish to crawl */
  $url = 'http://example-site.com';
  /* settings: unique: true, depth: null, print_type: list (comma separated) */
  $crawler = new email_crawler($url, true, null, 'list');
  $crawl = $crawler->crawl_site();
  if($crawl != '') { print_r($crawl); }
  
  /* 
  Example output:
    info@examplemail.com, info@example.com, info@divexample.com, info@spanexample.com
  */
?>

Crawling a site (plain list)

<?php
  /* The URL you wish to crawl */
  $url = 'http://example-site.com';
  /* settings: unique: false, depth: null, print_type: emails_only_plain */
  $crawler = new email_crawler($url, false, null, 'emails_only_plain');
  $crawl = $crawler->crawl_site();
  if($crawl != '') { print_r($crawl); }
  
  /* 
  Example output:
    info@examplemail.com info@example.com info@divexample.com info@spanexample.com
  */
  
?>
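
Bulk crawling is listed as work in progress, so until it lands, several sites can be crawled by looping over the single-site call shown above. The following is a rough sketch under stated assumptions: crawl_many is a hypothetical helper that is not part of this package, and it assumes each crawl returns the results array of email/element entries used in the first crawling example.

```php
<?php
/**
 * Hypothetical helper, not part of this package.
 * $crawlOne is any callable that takes a URL and returns an array shaped like
 * ['results' => [['email' => ..., 'element' => ...], ...]].
 */
function crawl_many(array $urls, callable $crawlOne): array
{
    $emails = [];
    foreach ($urls as $url) {
        // Skip anything that is not a syntactically valid http(s) URL.
        if (filter_var($url, FILTER_VALIDATE_URL) === false
            || !preg_match('#^https?://#i', $url)) {
            continue;
        }

        $crawl = $crawlOne($url);
        if (empty($crawl['results']) || !is_array($crawl['results'])) {
            continue;
        }

        foreach ($crawl['results'] as $result) {
            if (!isset($result['email'])) {
                continue;
            }
            // Deduplicate across sites, remembering where each address was first seen.
            $key = strtolower($result['email']);
            if (!isset($emails[$key])) {
                $emails[$key] = ['email' => $result['email'], 'source' => $url];
            }
        }
    }

    return array_values($emails);
}

/*
Example usage with the crawler class from this README:

  $found = crawl_many(
    ['http://example-site.com', 'http://another-example-site.com'],
    function ($url) {
      $crawler = new email_crawler($url, false);
      return $crawler->crawl_site();
    }
  );
*/
```

Passing the crawl step in as a callable keeps the helper independent of the crawler class, which also makes it easy to try out with a stub.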

Contributing

Contributions are welcome. If you have found a bug, please report it on the issues page.