creode / craft-page-crawler
This will allow a page to be crawled for useful content during an indexing process.
Type: craft-plugin
Requires
- craftcms/cms: ^4.0
- phpquery/phpquery: ^0.0.4
Versions
For details about which version of this package to use with your version of Craft CMS please see the table below:
Required config file
Please include and populate the config file "config/page-crawler.php". Use the following as a starting point.
<?php

use craft\helpers\App;

return [
    /**
     * CSS selectors for elements which should be removed from the rendered page markup during a page crawl.
     */
    'elementsToRemove' => [
    ],

    /**
     * If the site is behind an .htaccess password, add the variables below to your .env file.
     * The crawler uses them to determine whether it needs to bypass the password.
     */
    'http-auth-credentials' => [
        'username' => App::env('PAGE_CRAWLER_AUTH_USER'),
        'password' => App::env('PAGE_CRAWLER_AUTH_PASSWORD'),
    ],
];
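As an illustration, a populated config might look like the sketch below. The selectors are hypothetical placeholders picked to strip common non-content regions, and listing them as plain strings is an assumption based on the comment in the starting config. The two environment variables are the ones named above; they would be set in .env as PAGE_CRAWLER_AUTH_USER and PAGE_CRAWLER_AUTH_PASSWORD.

```php
<?php

use craft\helpers\App;

return [
    // Hypothetical selectors (assumed to be plain strings);
    // replace them with ones that match your own templates.
    'elementsToRemove' => [
        'header',
        'footer',
        'nav',
        '.cookie-banner',
    ],

    // Read from .env; only relevant when the site is password protected.
    'http-auth-credentials' => [
        'username' => App::env('PAGE_CRAWLER_AUTH_USER'),
        'password' => App::env('PAGE_CRAWLER_AUTH_PASSWORD'),
    ],
];
```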
Performing a crawl
You can perform a crawl from PHP by calling the crawl method on the plugin's crawler service, as shown below. It accepts a relative page path and returns all relevant page content as text.
$content = \creode\pagecrawler\Plugin::$plugin->crawlerService->crawl($pagePath);
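When indexing several pages, the call above can be wrapped in a small helper. The following is a minimal sketch, not part of the plugin: crawlPaths is a hypothetical function name, the whitespace clean-up is an illustrative choice, and the example paths are placeholders, since the exact path format the plugin expects is not stated here.

```php
<?php

/**
 * Crawl a list of relative page paths and collect the returned text, keyed by path.
 *
 * $crawl is any callable that takes a path and returns text. Inside Craft it
 * would wrap the plugin's crawl() call; passing it in keeps this helper
 * independent of the plugin.
 */
function crawlPaths(array $pagePaths, callable $crawl): array
{
    $results = [];

    foreach ($pagePaths as $pagePath) {
        $content = (string) $crawl($pagePath);

        // Collapse runs of whitespace so the text is compact for indexing.
        $results[$pagePath] = trim(preg_replace('/\s+/', ' ', $content));
    }

    return $results;
}

// Inside a Craft request or console command (assumes the plugin is installed
// and enabled; the paths are hypothetical):
//
// $index = crawlPaths(
//     ['about-us', 'contact'],
//     fn (string $path) => \creode\pagecrawler\Plugin::$plugin->crawlerService->crawl($path)
// );
```

Passing the crawler in as a callable also makes the helper easy to exercise with a stub outside of Craft.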