nzo / grabber-bundle
NzoGrabberBundle is a Symfony bundle that crawls a website and grabs all types of links, as well as the tags for image, JavaScript and CSS files (img, js, css).
Type: symfony-bundle
Requires
- php: ^5.5.9|>=7.0.8
- fabpot/goutte: ^3.0|^4.0
- symfony/framework-bundle: ^3.0|^4.0
Requires (Dev)
- phpunit/phpunit: ^4.8|^5.0
README
The NzoGrabberBundle is a Symfony bundle used to crawl a website and grab all types of links, URLs and tags (img, js, css).
Features include:
- Compatible with Symfony versions 3 and 4
- URL grabber/crawler for HTTP/HTTPS
- URL grabber/crawler for HREF / SRC / IMG types
- Exclude any type of file by extension
- Prevent specified URLs from being grabbed
- Compatible with PHP versions 5 and 7
Installation
Through Composer:
Install the bundle:
$ composer require nzo/grabber-bundle
Register the bundle in app/AppKernel.php (Symfony V3):
    // app/AppKernel.php
    public function registerBundles()
    {
        return array(
            // ...
            new Nzo\GrabberBundle\NzoGrabberBundle(),
        );
    }
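The feature list mentions Symfony 4, but only the Symfony 3 registration is shown above. Symfony 4 projects that use the default Flex directory layout normally register bundles in config/bundles.php instead of AppKernel.php. A minimal sketch, assuming that default layout (Symfony Flex may already have added the line during installation):

```php
<?php
// config/bundles.php (assumed default Symfony 4 location)

return [
    // ... other bundles already registered in this file
    Nzo\GrabberBundle\NzoGrabberBundle::class => ['all' => true],
];
```

The `['all' => true]` value enables the bundle in every environment.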
Usage
In the controller use the Grabber service and specify the options needed:
Get all URLs:
    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url);
        // ...
    }
Or get all URLs without recursion:
    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrlsNoRecursive($url);
        // ...
    }
Or get all URLs that do not appear in the exclude array:
    public function indexAction($url)
    {
        // URLs listed in this array are not scanned
        $notScannedUrlsTab = ['http://www.exemple.com/about'];
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, $notScannedUrlsTab);
        // ...
    }
Or exclude URLs that contain a specified text, and also select by file extension:
    public function indexAction($url)
    {
        $exclude = 'someText_to_exclude';
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, null, $exclude, array('png', 'pdf'));
        // ...
    }
Or get all URLs selected by file extension:
    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, null, null, array('png', 'pdf'));
        // ...
    }
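If the grabbed URLs need further filtering after the call, a small post-processing step can be applied to the result. The sketch below is not part of the bundle: `filterUrlsByExtension()` is a hypothetical helper, and it assumes `grabUrls()` returns a flat array of URL strings, which this README does not state explicitly.

```php
<?php

/**
 * Hypothetical helper (not part of NzoGrabberBundle): keeps only the URLs
 * whose path ends with one of the given extensions.
 *
 * @param string[] $urls       URLs, assumed to be what grabUrls() returns
 * @param string[] $extensions Extensions without the leading dot
 *
 * @return string[]
 */
function filterUrlsByExtension(array $urls, array $extensions)
{
    $extensions = array_map('strtolower', $extensions);

    $kept = array_filter($urls, function ($url) use ($extensions) {
        // Inspect the path only, so query strings and fragments are ignored.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            return false;
        }
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        return in_array($extension, $extensions, true);
    });

    // Re-index so the result is a plain list.
    return array_values($kept);
}

// Sample input for illustration only.
$urls = array(
    'http://www.example.com/logo.PNG?v=2',
    'http://www.example.com/about',
    'http://www.example.com/docs/guide.pdf',
);

print_r(filterUrlsByExtension($urls, array('png', 'pdf')));
```

The comparison is case-insensitive and looks only at the URL path, so `logo.PNG?v=2` is kept while `/about` is dropped.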
Or get all image files from the specified URL:
    public function indexAction($url)
    {
        $img = $this->get('nzo_grabber.grabber')->grabImg($url);
        // ...
    }
Or get all JS files from the specified URL:
    public function indexAction($url)
    {
        $js = $this->get('nzo_grabber.grabber')->grabJs($url);
        // ...
    }
Or get all CSS files from the specified URL:
    public function indexAction($url)
    {
        $css = $this->get('nzo_grabber.grabber')->grabCss($url);
        // ...
    }
Or get all CSS, image and JS files from the specified URL:
    public function indexAction($url)
    {
        $extrat = $this->get('nzo_grabber.grabber')->grabExtrat($url);
        // ...
    }
License
This bundle is under the MIT license. See the complete license in the bundle: