Checks Links

0.8.0 2019-04-07 14:31 UTC

README


Fink (pronounced "Phpink") is a command-line tool, written in PHP, for checking HTTP links.

  • Check websites for broken links or error pages.
  • Asynchronous HTTP requests.


Installation

Install as a stand-alone tool or as a project dependency:

Installing as a project dependency

$ composer require dantleech/fink --dev

Installing from a PHAR

Download the PHAR from the Releases page.

Building your own PHAR with Box

You can build your own PHAR by cloning this repository, installing its dependencies with Composer (composer install) and running:

$ ./vendor/bin/box compile

Usage

Run the command with a single URL to start crawling:

$ ./vendor/bin/fink https://www.example.com

Use --output=somefile to write detailed information about each URL to somefile in JSON format, including:

  • url: The tested URL.
  • status: The HTTP status code.
  • referrer: The page which linked to the URL.
  • referrer_title: The value (e.g. link title) of the referring element.
  • referrer_xpath: The path to the node in the referring document.
  • distance: The number of links away from the start document.
  • request_time: The number of microseconds taken to make the request.
  • timestamp: The time that the request was made.
  • exception: Any runtime exception encountered (e.g. a malformed URL).
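As an illustration of these fields, here is a small Python sketch that lists the slowest requests in a report. It assumes the report contains one JSON object per line using the field names above, and that entries for failed requests may lack a request_time; the function `slowest_urls` is hypothetical and not part of Fink.

```python
import json
from typing import Iterable, List, Tuple


def slowest_urls(lines: Iterable[str], limit: int = 3) -> List[Tuple[str, float]]:
    """Return (url, milliseconds) pairs for the slowest requests, slowest first.

    Assumes one JSON object per line. request_time is documented in
    microseconds, so it is divided by 1,000 to give milliseconds.
    Entries with a missing or null request_time are skipped.
    """
    timings: List[Tuple[str, float]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("request_time") is None:
            continue
        timings.append((entry["url"], entry["request_time"] / 1000))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:limit]


if __name__ == "__main__":
    # Hypothetical sample entries using a subset of the documented fields.
    sample = [
        '{"url": "https://www.example.com/", "status": 200, "request_time": 250000}',
        '{"url": "https://www.example.com/slow", "status": 200, "request_time": 1200000}',
        '{"url": "https://www.example.com/bad", "status": 404, "request_time": 40000}',
        '{"url": "not a url", "exception": "malformed URL"}',
    ]
    print(slowest_urls(sample, limit=2))
```

To run it against a real report, pass an open file, for example `with open("out.json") as f: print(slowest_urls(f))`.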

Arguments

  • url (multiple) Specify one or more base URLs to crawl (mandatory).
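The distance field and the --max-distance option count links from these base URLs. The Python sketch below models that idea as a breadth-first search over a hypothetical in-memory link map (standing in for fetching and parsing pages): every base URL starts at distance 0, each followed link adds 1, and URLs already seen are not queued again (the de-duplication that --no-dedupe turns off). It assumes that URLs at exactly the maximum distance are still recorded but their links are not followed; Fink's actual rules, including those for external distance, may differ. `link_distances` is an illustrative name.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Mapping, Optional


def link_distances(
    links: Mapping[str, List[str]],
    base_urls: Iterable[str],
    max_distance: Optional[int] = None,
) -> Dict[str, int]:
    """Breadth-first search giving each reachable URL its link distance.

    `links` maps a page to the URLs it links to; it is a hypothetical
    in-memory stand-in for fetching and parsing real pages.
    """
    distances: Dict[str, int] = {}
    queue: Deque[str] = deque()
    for url in base_urls:
        if url not in distances:
            distances[url] = 0  # every base URL is a start document
            queue.append(url)

    while queue:
        url = queue.popleft()
        distance = distances[url]
        if max_distance is not None and distance >= max_distance:
            continue  # recorded, but its links are not followed
        for target in links.get(url, []):
            if target not in distances:  # de-duplication: visit each URL once
                distances[target] = distance + 1
                queue.append(target)
    return distances


if __name__ == "__main__":
    # Paths stand in for full URLs to keep the example short.
    site = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/c": ["/d"]}
    print(link_distances(site, ["/"], max_distance=2))
```

Breadth-first order matters here: it guarantees each URL is first reached by a shortest path, so the recorded distance is the minimum number of links from any base URL.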

Options

  • --client-max-body-size: Maximum body size for the HTTP client (in bytes).
  • --client-max-header-size: Maximum header size for the HTTP client (in bytes).
  • --client-redirects=5: Maximum number of redirects the client should follow (0 to never redirect).
  • --client-security-level=1: Set the default SSL security level.
  • --client-timeout=15000: Maximum time (in milliseconds) the client waits for a response; defaults to 15,000 (15 seconds).
  • --concurrency: Number of simultaneous HTTP requests to use.
  • --display-bufsize=10: Number of URLs to consider when showing the display.
  • --display=+memory: Set, add or remove elements of the runtime display (prefix with - or + to modify the default set).
  • --exclude-url=logout (multiple): Exclude URLs matching the given PCRE pattern.
  • --header="Foo: Bar" (multiple): Specify custom header(s).
  • --include-link=foobar.html: Include the given link as if it were linked from the base URL.
  • --insecure: Do not verify SSL certificates.
  • --load-cookies: Load cookies from a cookies.txt file.
  • --max-distance: Maximum allowed distance from the base URL (unlimited if not specified).
  • --max-external-distance: Limit the external (disjoint) distance from the base URL.
  • --no-dedupe: Do not filter duplicate URLs (can result in a non-terminating process).
  • --output=out.json: Write a report entry for each URL to the given file (JSON by default; truncates existing content).
  • --publisher=csv: Set the report publisher, either json (the default) or csv.
  • --rate: Maximum number of requests to make per second.
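To illustrate --exclude-url, the following Python sketch filters a list of URLs against several patterns. It assumes each pattern may match anywhere in the URL (an unanchored search), and it uses Python's re module as a stand-in for PCRE, which behaves the same for simple patterns like these. `filter_excluded` is a hypothetical helper, not Fink code.

```python
import re
from typing import Iterable, List


def filter_excluded(urls: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the URLs that match none of the exclusion patterns.

    Assumption: a pattern may match anywhere in the URL (re.search is
    unanchored), so "logout" also excludes ".../user/logout?next=/".
    Python's re stands in for PCRE; the two agree on simple patterns.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [url for url in urls if not any(rx.search(url) for rx in compiled)]


if __name__ == "__main__":
    urls = [
        "https://www.example.com/",
        "https://www.example.com/user/logout?next=/",
        "https://www.example.com/files/report.pdf",
    ]
    # Roughly mirrors: --exclude-url=logout --exclude-url='\.pdf$'
    print(filter_excluded(urls, ["logout", r"\.pdf$"]))
```

Because matching is unanchored under this assumption, a short pattern such as logout can exclude more URLs than intended; anchors like ^ and $ narrow it down.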

Examples

Crawl a single website

$ fink http://www.example.com --max-external-distance=0

Crawl a single website and check the status of external links

$ fink http://www.example.com --max-external-distance=1

Use jq to analyse results

jq is a command-line tool for querying and manipulating JSON data.

$ fink http://www.example.com -x0 -oreport.json
$ jq 'select(.status == 404) | {url: .url, referrer: .referrer}' report.json
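If you would rather script this step, the Python sketch below applies the same filter as the jq command, assuming the report holds one JSON object per line. `broken_links` is a hypothetical name; its `status` parameter lets you look for codes other than 404.

```python
import json
from typing import Any, Dict, Iterable, List


def broken_links(lines: Iterable[str], status: int = 404) -> List[Dict[str, Any]]:
    """Python counterpart of: select(.status == 404) | {url: .url, referrer: .referrer}

    Assumes one JSON object per line in the report.
    """
    results: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("status") == status:
            # Keep only the two fields the jq filter projects.
            results.append({"url": entry.get("url"), "referrer": entry.get("referrer")})
    return results
```

For example, `with open("report.json") as f: print(broken_links(f))` lists every 404 together with the page that linked to it.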

Exit Codes

  • 0: All URLs were successful.
  • 1: Unexpected runtime error.
  • 2: At least one URL failed to resolve successfully.
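Because the exit code distinguishes failing URLs (2) from unexpected errors (1), a CI job can react to each case differently. Here is a minimal Python wrapper, assuming a fink executable is on the PATH (for example the PHAR); `run_fink` and `describe_exit_code` are hypothetical helpers.

```python
import subprocess
from typing import List

# Meanings copied from the exit codes documented above.
EXIT_CODES = {
    0: "All URLs were successful.",
    1: "Unexpected runtime error.",
    2: "At least one URL failed to resolve successfully.",
}


def describe_exit_code(code: int) -> str:
    """Translate a Fink exit code into its documented meaning."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}.")


def run_fink(args: List[str], executable: str = "fink") -> int:
    """Run Fink with the given arguments and print what its exit code means.

    Assumes `executable` can be found on the PATH; pass
    "./vendor/bin/fink" when Fink is installed as a project dependency.
    """
    completed = subprocess.run([executable, *args], check=False)
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```

With `import sys`, a CI step could call `sys.exit(run_fink(["https://www.example.com", "--max-external-distance=0"]))` to run a crawl and pass Fink's exit code through to the job.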