lobotomised / laravel-autocrawler
A tool to crawl your own Laravel installation and check its HTTP status codes
Installs: 27 147
Dependents: 0
Suggesters: 0
Security: 0
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 1
Requires
- php: ^8.1
- illuminate/console: ^10.0|^11.0
- illuminate/filesystem: ^10.0|^11.0
- spatie/crawler: ^8.2
Requires (Dev)
- nunomaduro/collision: ^6.2|^8.0
- nunomaduro/larastan: ^2.1|^2.4
- orchestra/testbench: ^8.0|^9.0
- pestphp/pest: ^1.21|^2.34
- pestphp/pest-plugin-laravel: ^1.2|^2.3
- phpunit/phpunit: ^9.5|^10.5
This package is auto-updated.
Last update: 2025-02-05 23:18:25 UTC
README
Using this package you can check if your application has broken links.
php artisan crawl

200 OK - http://myapp.test/
200 OK - http://myapp.test/login found on http://myapp.test/
200 OK - http://myapp.test/register found on http://myapp.test/
301 Moved Permanently - http://myapp.test/homepage found on http://myapp.test/register
404 Not Found - http://myapp.test/brokenlink found on http://myapp.test/register
200 OK - http://myapp.test/features found on http://myapp.test/

Crawl finished

Results:
Status 200: 4 found
Status 301: 1 found
Status 404: 1 found
Installation
This package can be installed via Composer:
composer require --dev lobotomised/laravel-autocrawler
When crawling your site, the command automatically detects the URL your application is using. If it scans http://localhost instead, check that the APP_URL variable is properly configured in your .env file:
APP_URL="http://myapp.test"
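If you use Laravel's configuration cache, remember to refresh it after changing .env, otherwise the crawler may keep seeing the old URL. This is general Laravel behaviour rather than anything specific to this package:
php artisan config:clear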
Usage
Crawl a specific url
By default, the crawler crawls the URL of your current Laravel installation. You can force a different URL with the --url option:
php artisan crawl --url=http://myapp.test/my-page
Concurrent connections
The crawler runs with 10 concurrent connections to speed up the crawling process. You can change this by passing the --concurrency option:
php artisan crawl --concurrency=5
Timeout
The request timeout is 30 seconds by default. Use the --timeout option to change this value:
php artisan crawl --timeout=10
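Options can be combined in a single invocation. For example, to run a crawl with fewer connections and a longer timeout (values chosen purely for illustration):
php artisan crawl --concurrency=5 --timeout=60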
Ignore robots.txt
By default, the crawler respects the rules in robots.txt. These rules can be ignored with the --ignore-robots option:
php artisan crawl --ignore-robots
External links
When the crawler finds an external link, it will check that link too. This can be deactivated with the --ignore-external-links option:
php artisan crawl --ignore-external-links
Log non-2xx or non-3xx status codes
By default, the crawler only reports results in your console. You can log all non-2xx or non-3xx status codes to a file with the --output option. Results will be stored in storage/autocrawler/output.txt:
php artisan crawl --output
The output.txt file will look like this:
403 Forbidden - http://myapp.test/dashboard found on http://myapp.test/home
404 Not Found - http://myapp.test/brokenlink found on http://myapp.test/register
Fail when non-2xx or non-3xx responses are found
By default, the command's exit code is 0. You can change it to 1, indicating that the command has failed, with the --fail-on-error option:
php artisan crawl --fail-on-error
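Because failure is reported through the exit code, the command fits naturally into shell scripts and CI pipelines. A minimal sketch, where the echoed message is just an illustration:
php artisan crawl --fail-on-error || echo "Broken links detected"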
Launch the robot interactively
Alternatively, you may configure the crawler interactively by using the --interactive option:
php artisan crawl --interactive
Working with GitHub Actions
To execute the crawler in a workflow, you first need to start a web server. You can install Apache or nginx, but here is an example using the PHP built-in web server, started in the background so the workflow can continue to the crawl step.
If the crawl finds any non-2xx or non-3xx responses, the action will fail and the results will be stored as an artifact of the run.
steps:
  - uses: actions/checkout@v3

  - name: Prepare The Environment
    run: cp .env.example .env

  - name: Install Composer Dependencies
    run: composer install

  - name: Generate Application Key
    run: php artisan key:generate

  - name: Install npm Dependencies
    run: npm ci

  - name: Compile assets
    run: npm run build

  - name: Start PHP built-in webserver
    run: (php artisan serve &) || /bin/true

  - name: Crawl website
    run: php artisan crawl --url=http://localhost:8000/ --fail-on-error --output

  - name: Upload artifacts
    if: failure()
    uses: actions/upload-artifact@master
    with:
      name: Autocrawler
      path: ./storage/autocrawler
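On a slow runner, the built-in server may not be ready to accept connections when the crawl starts. A possible safeguard, not part of the package's documented workflow, is to poll the server with curl between the "Start PHP built-in webserver" and "Crawl website" steps:

  - name: Wait for webserver
    run: |
      # Retry for up to 10 seconds until the server responds
      for i in $(seq 1 10); do
        curl --silent --fail http://localhost:8000/ > /dev/null && break
        sleep 1
      done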
Documentation
All commands and options are available with the command:
php artisan crawl --help
Alternatives
This package is heavily inspired by spatie/http-status-check, but instead of being a global installation, it is a project dependency.
Testing
First, start the included Node HTTP server in a separate terminal:
make start
Then, run the tests:
make test
Changelog
Please see CHANGELOG for more information on what has changed recently.
License
The MIT License (MIT). Please see License File for more information.