sjaakp/yii2-linkchecker

Link Checker Extension for Yii2

Installs: 2

Dependents: 0

Suggesters: 0

Security: 0

Stars: 3

Watchers: 2

Forks: 0

Open Issues: 0

Type:yii2-extension

1.0.0 2021-03-04 11:23 UTC

This package is auto-updated.

Last update: 2024-04-04 18:10:58 UTC


README

Scan links for errors with Yii2 PHP Framework

Latest Stable Version Total Downloads License

Linkchecker is a module for the Yii 2.0 PHP Framework. It can check all the links and image sources stored in the web site's database.

Installation

Install yii2-linkchecker in the usual way with Composer. Add the following to the require section of your composer.json file:

"sjaakp/yii2-linkchecker": "*"

or run:

composer require sjaakp/yii2-linkchecker

You can manually install yii2-linkchecker by downloading the source in ZIP-format.

Module

Linkchecker is a module in the Yii2 framework. It has to be configured in the main configuration file, usually called web.php or main.php in the config directory. Add the following to the configuration array:

<?php
// ...
'modules' => [
    'linkchecker' => [
        'class' => 'sjaakp\linkchecker\Module',
        // several options
    ],
],
// ...

The module has to be bootstrapped. Do this by adding the following to the application configuration array:

<php
// ...
'bootstrap' => [
    'linkchecker',
]
// ...

There probably already is a bootstrap property in your configuration file; just add 'linkchecker' to it.

Important: the module should also be set up in the same way in the console configuration (usually called console.php).

Console command

To complete the installation, a console command has to be run. It will create a database table for the checked URLs:

yii migrate

The migration applied is called sjaakp\linkchecker\migrations\m000000_000000_init.

Usage

After installation, Linkchecker reports its findings on www.example.com/linkchecker. Initially, the list shown there will be empty. The page sports a 'Scan' button. After clicking it, Linkchecker will collect all the URL links and image URIs (href, src, and srcset values), store them in its database table, and try to access them. This process may take some time, heavily dependent on the quality of the URLs. Checking over 400 URL's typically costed less than 20 seconds on my test system with a high speed internet connection.

For this to work, Linkchecker has to be instructed where to look for URLs. This involves the setting of the source-option of the module.

Suppose we have a web site publishing very interesting articles. It is built with two models:

/**
* @property string $mainHtml    // contains HTML with href's and src's
* @property string $asideHtml   // likewise
* ... other attributes, like $title, $publicationDate ...
*/
class Article extends ActiveRecord
{
    // ... Article methods and stuff ...
}

/**
* @property string $homepageUrl   // contains pure URL
* ... other attributes, like $name, $birthday ...
*/
class Author extends ActiveRecord
{
    // ... Author methods and stuff ...
}

We can instruct Linkchecker to check the URLs in both classes by initializing the module like so:

<?php
// ...
'modules' => [
    'linkchecker' => [
        'class' => 'sjaakp\linkchecker\Module',
        'source' => [
            [
                'model' => 'app\models\Article',
                'htmlAttributes' => [ 'mainHtml', 'asideHtml']
            ],
            [
                'model' => 'app\models\Author',
                'urlAttributes' => 'homepageUrl'
            ],
        ]
        // ... other options ...
    ],
],
// ...
  • source is an array of arrays, describing the places where Linkchecker should look for URLs. They have the following fields:

    • model: the fully classified class name of the ActiveRecord. It can also be a table name.
    • htmlAttributes: one (string) ore more (array of strings) names of attributes containing HTML. Linkchecker will distill URLs of all the hrefs, srcs, and srcsets in the HTML. Default: [ ] (empty array).
    • urlAttributes: one (string) ore more (array of strings) names of attributes containing an URL. Default: [ ] (empty array).
    • mode: instructs Linkchecker to look for absolute ('abs', default), relative ('rel') URLs, or both ('both').

Other options

The Linkchecker module takes a few more options:

  • greenlist: array of URLs Linkchecker will neglect. These may be literal strings, or regular expressions (PCRE) without delimiters. Default: [ ] (empty array).
  • maxRequests: int, maximum number of requests (access attempts) handled simultaneously. This may be lowered to save CPU-cycles. Default: 40.
  • timeout: int, timeout of requests in milliseconds. Default: 3000.
  • curlOptions: array of extra options to be used in Curl requests There are lots of them. Please, use this option with care (or rather: not at all). Default: [ ] (empty array).
  • tableName: string, name of the Linkchecker database table. By default, the table is named 'linkchecker'.

HTTP Errors

Notice that not all errors reported by Linkchecker may be severe. For several reasons, web sites may report 400 Bad Request or 403 Forbidden while working perfectly normal in practice. Spotify, for instance, always seems to respond with 400 Bad Request.

This holds even more for redirection links. Often, 302 Found is perfectly acceptable. 301 Moved Permanently or 301 TLS Redirect may be a good reason to change the stored URL (often involving nothing more than modifying the protocol from http to https), but there are exceptions.

The opposite is also true. Receiving 200 OK doesn't always really mean everything is OK. YouTube, for instance, always reports 200 OK, even if the requested video does not exist.