franzip / serp-fetcher
Wrapper around SimpleHtmlDom to easily fetch data from Search Engine Result Pages with built-in caching support.
Requires
- php: >=5.4.0
- thauex/simple-html-dom: dev-master
Requires (Dev)
- phpunit/phpunit: 4.0.*
- satooshi/php-coveralls: dev-master
Last update: 2024-11-13 19:21:28 UTC
README
SerpFetcher
Installing via Composer (recommended)
Install Composer in your project:
curl -s https://getcomposer.org/installer | php
Create a composer.json file in your project root:
{
    "require": {
        "franzip/serp-fetcher": "0.2.*@dev"
    }
}
Install via Composer:
php composer.phar install
Supported Search Engines
- Bing
- Ask
- Yahoo
Legal Disclaimer
Under no circumstances shall I be considered liable to any user for direct, indirect, incidental, consequential, special, or exemplary damages arising from or relating to the user's use or misuse of this software. Consult the Terms of Service of each search engine before using SerpFetcher.
Description
You can create a SerpFetcher either through the provided Factory or by importing the fetcher you need directly into your namespace.
All implementations share a common abstract ancestor class, SerpFetcher, and therefore expose five main attributes, configurable through setters:
SerpFetcher($cacheDir = 'cache', $cacheTTL = 24, $caching = true, $cachingForever = false, $charset = 'UTF-8')
$cacheDir
- Path to the folder to use as temporary cache.
- You can specify an absolute or relative path.
- If it doesn't exist, the folder will be automatically created on instantiation.
$cacheTTL
- The expiration time of the cache, expressed in hours.
$caching
- Whether the object should use caching.
$cachingForever
- Whether the object should use permanent caching (cached pages will never expire).
$charset
- Charset to use.
- Note: Only UTF-8 (used as default) has been tested so far.
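To make the interaction between $cacheTTL and $cachingForever concrete, here is a minimal sketch of how a freshness check could work. isCacheFresh() is a hypothetical helper written for illustration, not part of SerpFetcher, and comparing against the file modification time is an assumption about how expiry might be implemented.

```php
<?php
// Hypothetical helper, NOT part of SerpFetcher: decides whether a cached
// file is still usable, given a TTL expressed in hours.
function isCacheFresh($file, $ttlHours, $cachingForever = false, $now = null)
{
    if (!is_file($file)) {
        return false;                 // nothing has been cached yet
    }
    if ($cachingForever) {
        return true;                  // permanent caching: age is ignored
    }
    $now = ($now === null) ? time() : $now;
    $ageInSeconds = $now - filemtime($file);

    return $ageInSeconds < $ttlHours * 3600;
}

// A file written just now is fresh under a 24-hour TTL...
$file = tempnam(sys_get_temp_dir(), 'serp');
var_dump(isCacheFresh($file, 24));                              // bool(true)
// ...but stale if we pretend that 25 hours have passed.
var_dump(isCacheFresh($file, 24, false, time() + 25 * 3600));   // bool(false)
// With permanent caching the same file never expires.
var_dump(isCacheFresh($file, 24, true, time() + 25 * 3600));    // bool(true)
unlink($file);
```

The optional $now argument exists only so the sketch can simulate the passage of time without waiting.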
The main method, fetch(), is implemented by each class and returns an associative array with URLs, snippets and titles for a given SERP URL.
If the array of fetched results has fewer than 10 entries, it is padded up to 10.
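The padding behaviour described above can be illustrated with PHP's array_pad(). This is only a sketch: padResults() is a hypothetical helper, and the padding value (an empty string here) is an assumption, since the value SerpFetcher actually uses is not stated in this document.

```php
<?php
// Hypothetical helper, NOT part of SerpFetcher: pads a list of entries
// up to a fixed size. The padding value '' is an assumption.
function padResults(array $entries, $size = 10, $padding = '')
{
    // array_pad() returns the input unchanged if it already has
    // $size or more elements.
    return array_pad($entries, $size, $padding);
}

$urls   = array('http://example.com/a', 'http://example.com/b');
$padded = padResults($urls);
echo count($padded), "\n";   // 10

// Consumers that only want real results can strip the padding again.
$real = array_filter($padded, function ($entry) {
    return $entry !== '';
});
echo count($real), "\n";     // 2
```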
Constructor (using Factory)
Supply the name of the search engine and you are ready to go. It is possible to pass an optional array with custom arguments.
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
// custom $cacheDir
$askFetcher = SerpFetcherBuilder::create('Ask', array('foo/bar'));
// custom $cacheDir and $cacheTTL
$bingFetcher = SerpFetcherBuilder::create('Bing', array('baz', 1));
// ...
Constructor (using Fetchers directly)
use Franzip\SerpFetcher\Fetchers\AskFetcher;
use Franzip\SerpFetcher\Fetchers\BingFetcher;
use Franzip\SerpFetcher\Fetchers\GoogleFetcher;

$googleFetcher = new GoogleFetcher();
$askFetcher = new AskFetcher('foo/bar');
$bingFetcher = new BingFetcher('baz', 1);
// ...
Basic Usage
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';
$fetchedResults = $googleFetcher->fetch($urlToFetch);
// doing your things with the results...
cacheHit()
Use cacheHit() to check whether a URL is already cached, so that your code can handle cache hits and cache misses differently.
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';

var_dump($googleFetcher->cacheHit($urlToFetch)); // bool(false)
$fetchedResults = $googleFetcher->fetch('http://www.google.com/search?q=foo');
var_dump($googleFetcher->cacheHit($urlToFetch)); // bool(true)

if ($googleFetcher->cacheHit($urlToFetch)) {
    // handle cache hit
} else {
    // handle cache miss
}
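One possible use of cacheHit() is to pause only after requests that actually went out to the network. The sketch below assumes nothing beyond the cacheHit() and fetch() methods shown above. fetchAll() and FakeFetcher are hypothetical names introduced for illustration; FakeFetcher stands in for a real fetcher so the example runs without network access, and the result keys it returns are an assumption.

```php
<?php
// Hypothetical helper, NOT part of SerpFetcher. $fetcher is any object
// exposing cacheHit($url) and fetch($url).
function fetchAll($fetcher, array $urls, $delaySeconds = 5)
{
    $results = array();
    foreach ($urls as $url) {
        $wasCached = $fetcher->cacheHit($url);
        $results[$url] = $fetcher->fetch($url);
        if (!$wasCached && $delaySeconds > 0) {
            sleep($delaySeconds);   // throttle real requests only
        }
    }
    return $results;
}

// Hypothetical in-memory stand-in for a real fetcher, used only for the demo.
class FakeFetcher
{
    public $requests = 0;
    private $cache = array();

    public function cacheHit($url)
    {
        return isset($this->cache[$url]);
    }

    public function fetch($url)
    {
        if (!isset($this->cache[$url])) {
            $this->requests++;      // simulate a network request
            // Assumed result shape: the real keys may differ.
            $this->cache[$url] = array(
                'urls' => array(), 'titles' => array(), 'snippets' => array()
            );
        }
        return $this->cache[$url];
    }
}

$fetcher = new FakeFetcher();
$results = fetchAll($fetcher, array('q=foo', 'q=foo', 'q=bar'), 0);
echo $fetcher->requests, "\n";   // 2: the repeated URL was served from cache
```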
flushCache() and removeCache()
Each fetched URL is cached as a single file.
You can remove all those files by calling flushCache().
removeCache() will also remove the folder used as cache.
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';

var_dump($googleFetcher->cacheHit($urlToFetch)); // bool(false)
$fetchedResults = $googleFetcher->fetch('http://www.google.com/search?q=foo');
var_dump($googleFetcher->cacheHit($urlToFetch)); // bool(true)

$googleFetcher->flushCache();
var_dump($googleFetcher->cacheHit($urlToFetch)); // bool(false)
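The difference between the two methods can be pictured at the filesystem level. The following is a rough sketch, not the library's implementation: flushDir() and removeDir() are hypothetical stand-ins for flushCache() and removeCache(), and the file names are made up.

```php
<?php
// Hypothetical stand-in for flushCache(): delete every file in the
// folder but keep the folder itself.
function flushDir($dir)
{
    // glob() may return false on error; treat that as "no files".
    // Note that '*' does not match dotfiles.
    $files = glob($dir . '/*') ?: array();
    foreach ($files as $file) {
        if (is_file($file)) {
            unlink($file);
        }
    }
}

// Hypothetical stand-in for removeCache(): flush, then delete the folder.
// rmdir() only succeeds on an empty directory.
function removeDir($dir)
{
    flushDir($dir);
    if (is_dir($dir)) {
        rmdir($dir);
    }
}

// Demo on a throwaway directory with two fake cache files.
$dir = sys_get_temp_dir() . '/serp_cache_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/page1', 'cached SERP');
file_put_contents($dir . '/page2', 'cached SERP');

flushDir($dir);
clearstatcache();
var_dump(is_dir($dir));                           // bool(true)
var_dump(count(glob($dir . '/*') ?: array()));    // int(0)

removeDir($dir);
clearstatcache();
var_dump(is_dir($dir));                           // bool(false)
```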
Fine Tuning (Setters)
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
// change cache folder to foo/
$googleFetcher->setCacheDir('foo');
// change cache expiration to 2 days
$googleFetcher->setCacheTTL(48);
// enable permanent caching
$googleFetcher->enableCachingForever();
Using multiple cache directories
Switch between folders with the setCacheDir() method:
use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google', array('foo'));
// fetch some stuff... foo/ will be used as cache folder now
// ...
// fetched results will now be cached in foobar/
$googleFetcher->setCacheDir('foobar');
// switch back to the initial cache folder foo/
$googleFetcher->setCacheDir('foo');
TODOs
- A decent exceptions system.
- Support for HHVM.
- Implement and test different charset support.
- Refactoring messy tests.
License
MIT Public License.