piedweb / url-harvester
Harvest statistics and metadata from a URL or its source code (SEO oriented).
Installs: 20 889
Dependents: 2
Suggesters: 0
Security: 0
Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Requires
- php: ^7.3|^8.0
- jeremykendall/php-domain-parser: ^6.1
- league/uri: ^6.5
- neitanod/forceutf8: ^2.0.4
- piedweb/curl: ^0.0.18
- piedweb/text-analyzer: ^0.0.4
- spatie/robots-txt: ^1.0.10|^2
- symfony/css-selector: ^5.2
- symfony/dom-crawler: ^5.2
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.2
- phpunit/phpunit: ^9.5
- symfony/var-dumper: ^5.3
- vimeo/psalm: ^4.4
README
Url Meta Data Harvester
Harvest statistics and metadata from a URL or its source code (SEO oriented).
Used in Seo Pocket Crawler (source on GitHub).
Install
Via Packagist
$ composer require piedweb/url-harvester
Usage
Harvest methods (each line starting with -> is a separate call on the object returned by Harvest::fromUrl(); they are listed together for reference and are not meant to be chained):

use \PiedWeb\UrlHarvester\Harvest;
use \PiedWeb\UrlHarvester\Link;

$url = 'https://piedweb.com';

Harvest::fromUrl($url)
    ->getResponse()->getInfo('total_time') // load time
    ->getResponse()->getInfo('size_download')
    ->getResponse()->getStatusCode()
    ->getResponse()->getContentType()
    ->getRes...
    ->getTag('h1') // @return first tag content (could be html)
    ->getUniqueTag('h1') // @return first tag content in UTF-8 (could contain html)
    ->getMeta('description') // @return string from content attribute or NULL
    ->getCanonical() // @return string|NULL
    ->isCanonicalCorrect() // @return bool
    ->getRatioTxtCode() // @return int
    ->getTextAnalysis() // @return \PiedWeb\TextAnalyzer\Analysis
    ->getKws() // @return the 10 most used words
    ->getBreadCrumb()
    ->indexable($userAgent = 'googlebot') // @return int corresponding to a const from Indexable
    ->getLinks()
    ->getLinks(Link::LINK_SELF)
    ->getLinks(Link::LINK_INTERNAL)
    ->getLinks(Link::LINK_SUB)
    ->getLinks(Link::LINK_EXTERNAL)
    ->getLinkedRessources() // @return array of every attribute containing an href or a src property
    ->mayFollow() // checks headers and meta tags, @return bool
    ->getDomain()
    ->getBaseUrl()
    ->getRobotsTxt() // @return \Spatie\Robots\RobotsTxt or empty string
    ->setRobotsTxt($content) // @param string or RobotsTxt
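The listing above does not say how getRatioTxtCode() computes its ratio. As a rough, hypothetical illustration of what a text-to-code ratio usually measures, the sketch below assumes ratio = visible text length / HTML length × 100, truncated to an integer. textToCodeRatio() is an illustrative name, not part of url-harvester, and the library's own formula may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of url-harvester).
 * Assumption: ratio = strlen(visible text) / strlen(raw HTML) * 100, truncated to int.
 */
function textToCodeRatio(string $html): int
{
    if ($html === '') {
        return 0; // avoid a division by zero on empty documents
    }

    // Drop <script>/<style> bodies first: strip_tags() removes the tags but keeps their contents.
    $withoutCode = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;

    // Remove remaining tags and HTML comments, then collapse whitespace runs into single spaces.
    $text = trim((string) preg_replace('/\s+/', ' ', strip_tags($withoutCode)));

    return (int) (strlen($text) / strlen($html) * 100);
}
```

For example, textToCodeRatio('<p>ab</p>') returns 22: 2 visible characters out of 9 characters of HTML.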
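getLinks() can be filtered with Link::LINK_SELF, LINK_INTERNAL, LINK_SUB and LINK_EXTERNAL, but the README does not define these categories. The hypothetical classifier below is one plausible reading, assuming "sub" means a subdomain of the page's host. classifyLink() and withoutFragment() are illustrative names, not part of url-harvester, and the string labels stand in for the real constants.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of url-harvester): drops the "#fragment" part of a URL.
function withoutFragment(string $url): string
{
    return explode('#', $url, 2)[0];
}

/**
 * Hypothetical classifier mirroring the README's Link::LINK_* filters.
 * Assumption: "sub" = link host is a subdomain of the page host; the library's rules may differ.
 */
function classifyLink(string $pageUrl, string $href): string
{
    $target = withoutFragment($href);

    // "#top", or the page URL itself with any fragment ignored, points back to the same page.
    if ($target === '' || $target === withoutFragment($pageUrl)) {
        return 'self';
    }

    // parse_url() returns null (no host) or false (malformed URL); both become ''.
    $pageHost = strtolower((string) parse_url($pageUrl, PHP_URL_HOST));
    $linkHost = strtolower((string) parse_url($href, PHP_URL_HOST));

    // Relative links such as "/about" carry no host, so they stay on the same site.
    if ($linkHost === '' || $linkHost === $pageHost) {
        return 'internal';
    }

    // Naive subdomain test: "blog.example.com" ends with ".example.com".
    $suffix = '.' . $pageHost;
    if (substr($linkHost, -strlen($suffix)) === $suffix) {
        return 'sub';
    }

    return 'external';
}
```

The subdomain test is deliberately naive: from a page on www.example.com, blog.example.com would be classified as external. The package depends on jeremykendall/php-domain-parser, which suggests it compares registrable domains (via the Public Suffix List) instead.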
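mayFollow() is described only as checking headers and meta tags. A minimal sketch of such a check, assuming links may be followed unless an X-Robots-Tag header or a <meta name="robots"> tag contains nofollow or none. mayFollowSketch() is a hypothetical name, it needs PHP's DOM extension, and it ignores user-agent-scoped values such as "googlebot: nofollow", which the library may handle differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a mayFollow()-style check (not url-harvester's implementation).
 *
 * @param array<string, string> $headers response headers keyed by header name
 */
function mayFollowSketch(array $headers, string $html): bool
{
    $directives = [];

    // Header names are case-insensitive, so compare them lowercased.
    foreach ($headers as $name => $value) {
        if (strtolower((string) $name) === 'x-robots-tag') {
            $directives[] = $value;
        }
    }

    // DOMDocument::loadHTML() rejects an empty string, so only parse non-empty markup.
    if ($html !== '') {
        $dom = new DOMDocument();
        // Real-world HTML is rarely valid: collect libxml warnings instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ($dom->getElementsByTagName('meta') as $meta) {
            if (strtolower($meta->getAttribute('name')) === 'robots') {
                $directives[] = $meta->getAttribute('content');
            }
        }
    }

    // Directives are comma-separated and case-insensitive, e.g. "noindex, NoFollow".
    foreach ($directives as $value) {
        $tokens = array_map('trim', explode(',', strtolower($value)));
        if (in_array('nofollow', $tokens, true) || in_array('none', $tokens, true)) {
            return false;
        }
    }

    return true;
}
```

For example, mayFollowSketch(['X-Robots-Tag' => 'none'], '') returns false.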
Testing
$ composer test
Contributing
Please see the contributing guide for details.
Credits
License
The MIT License (MIT). Please see the License File for more information.