scherhak/is-gibberish

Zero-dependency PHP package for detecting obvious gibberish input using fast heuristics.

Package info

github.com/scherhak/is-gibberish

pkg:composer/scherhak/is-gibberish

v0.1.0 2026-05-19 06:14 UTC

This package is auto-updated.

Last update: 2026-05-20 04:03:15 UTC


README

License: MIT

A tiny, zero-dependency PHP package that detects whether a string is random keyboard smashing ("gibberish"). It is well suited to pre-filtering contact form submissions before they are sent as emails.

🚀 Why use this?

Bots and frustrated users often fill form fields with nonsense like asdfghjkl, sdfa sdfas df sadf asdfas asdf, or kfkJHzgHjb6)?)7). This package helps you identify these strings using fast heuristic analysis instead of heavy machine learning.

✨ Features

  • Ultra Lightweight: No dependencies, just pure PHP.
  • Fast: Uses lightweight heuristics based on character and pattern distribution for near-instant results.
  • Multilingual: Treats German umlauts (ä, ö, ü) and other accented letters conservatively.
  • Customizable: Adjust thresholds, heuristic weights and keyboard rows globally or per call.
  • Explainable: Optional analysis result with score and triggered reasons.

📦 Installation

composer require scherhak/is-gibberish

💡 Usage

use IsGibberish\Detector;

$detector = new Detector();

$input = "asdf 24qf waefasdf arg aerg ergeara asd";

if ($detector->isGibberish($input)) {
    // Handle as invalid input
    echo "Please enter a real message.";
}

The default configuration is tuned to detect the most common gibberish patterns in typical form submissions, including contact forms, support requests, and checkout-related text fields. For most applications, it should work well out of the box without additional tuning.

Per-call overrides are also supported when you need to tune the detector for a specific field, workflow, or validation policy:

if ($detector->isGibberish($input, [
    'threshold' => 35.0,
])) {
    echo "Please enter a real message.";
}

You can also override heuristic weights for a single boolean check:

if ($detector->isGibberish($input, [
    'weights' => [
        'token_pattern' => 30.0,
        'keyboard_pattern' => 70.0,
    ],
])) {
    echo "Please enter a real message.";
}

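🔎 Detailed Analysis

Use analyze() instead of isGibberish() when you need the score and the triggered reasons rather than just a boolean: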

$result = $detector->analyze("asdf 24qf waefasdf arg aerg ergeara asd");

$result->isGibberish(); // true
$result->score();       // e.g. 58.0
$result->threshold();   // e.g. 45.0
$result->reasons();     // triggered heuristic reasons
$result->breakdown();   // heuristic scores

You can override configuration for a single analysis call:

$result = $detector->analyze($input, [
    'threshold' => 35.0,
    'weights' => [
        'keyboard_pattern' => 70.0,
        'token_pattern' => 30.0,
    ],
    'keyboard_rows' => ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'],
]);

If you prefer to configure the detector up front, you can still pass a DetectorConfig to the constructor:

use IsGibberish\Config\DetectorConfig;
use IsGibberish\Detector;

$config = DetectorConfig::default()
    ->withThreshold(40.0)
    ->withMergedWeights([
        'keyboard_pattern' => 60.0,
    ]);

$detector = new Detector($config);

⚙️ Configuration Options

Both isGibberish() and analyze() accept an optional second argument:

$result = $detector->analyze($input, [
    'threshold' => 35.0,
    'weights' => [
        'keyboard_pattern' => 70.0,
    ],
    'keyboard_rows' => ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'],
]);

Supported options:

  • threshold (float): Overrides the score threshold for this single call.
  • weights (array<string, float>): Overrides selected heuristic weights for this single call. Missing keys keep their default values.
  • keyboard_rows (list<string>): Replaces the keyboard row list for this single call.
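
The "missing keys keep their default values" behaviour of the weights option can be pictured with plain PHP arrays. The sketch below is only an illustration of that merge semantics and is not taken from the package source; the helper name mergeWeights() is hypothetical, and the default values are the ones listed in this README.

```php
<?php
declare(strict_types=1);

// Default weights as listed in this README.
$defaultWeights = [
    'special_character_density' => 30.0,
    'repetition'                => 30.0,
    'keyboard_pattern'          => 50.0,
    'vowel_consonant_balance'   => 20.0,
    'character_distribution'    => 20.0,
    'token_pattern'             => 35.0,
];

/**
 * Hypothetical helper (not part of the package): overlay per-call weights
 * on top of the defaults.
 *
 * @param array<string, float> $defaults
 * @param array<string, float> $overrides
 * @return array<string, float>
 */
function mergeWeights(array $defaults, array $overrides): array
{
    // array_replace() keeps every default key and only overwrites
    // the keys that are present in $overrides.
    return array_replace($defaults, $overrides);
}

$effective = mergeWeights($defaultWeights, ['keyboard_pattern' => 70.0]);
```

Here only keyboard_pattern changes to 70.0; the other five weights keep their defaults. Whether the package rejects unknown keys is not stated in this README, so the sketch does not validate them.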

Available heuristic weight keys:

  • special_character_density
  • repetition
  • keyboard_pattern
  • vowel_consonant_balance
  • character_distribution
  • token_pattern

Default configuration:

  • threshold: 45.0
  • weights.special_character_density: 30.0
  • weights.repetition: 30.0
  • weights.keyboard_pattern: 50.0
  • weights.vowel_consonant_balance: 20.0
  • weights.character_distribution: 20.0
  • weights.token_pattern: 35.0
  • keyboard_rows: ['qwertyuiop', 'asdfghjkl', 'zxcvbnm', 'qwertzuiop', 'yxcvbnm']
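
To make the interplay of weights and threshold more concrete, here is a rough sketch of one way a weighted score could be compared against a threshold. It rests on assumptions that this README does not confirm: that each heuristic yields a strength between 0.0 and 1.0, that the score is the sum of weight times strength, and that a score at or above the threshold counts as gibberish. The function weightedScore() and the strength values are hypothetical; the package's actual formula may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical scoring sketch (not the package's implementation):
 * sums weight * strength over all heuristics.
 *
 * @param array<string, float> $weights   weight per heuristic key
 * @param array<string, float> $strengths assumed strength per heuristic, 0.0 to 1.0
 */
function weightedScore(array $weights, array $strengths): float
{
    $score = 0.0;
    foreach ($weights as $key => $weight) {
        // Heuristics without a reported strength contribute nothing.
        $strength = $strengths[$key] ?? 0.0;
        // Clamp to the assumed 0.0 to 1.0 range before weighting.
        $score += $weight * max(0.0, min(1.0, $strength));
    }

    return $score;
}

// Default weights and threshold as listed in this README.
$weights = [
    'special_character_density' => 30.0,
    'repetition'                => 30.0,
    'keyboard_pattern'          => 50.0,
    'vowel_consonant_balance'   => 20.0,
    'character_distribution'    => 20.0,
    'token_pattern'             => 35.0,
];
$threshold = 45.0;

// Made-up strengths, purely for illustration.
$strengths = ['keyboard_pattern' => 0.5, 'token_pattern' => 1.0];

$score   = weightedScore($weights, $strengths); // 50 * 0.5 + 35 * 1.0
$flagged = $score >= $threshold;                // ">=" is an assumption
```

Under these assumptions, raising a weight makes its heuristic count for more, while lowering the threshold makes the detector stricter overall.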

🧠 Current Heuristics

  • Special character density
  • Repeated characters and repeated short blocks
  • Keyboard row patterns such as asdfghjkl and qwertzuiop
  • Vowel to consonant imbalance
  • Unnatural character-class distribution
  • Suspicious multi-token fragment patterns
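
As an illustration of what a keyboard row heuristic might look at, the sketch below finds the longest stretch of the input that appears verbatim inside one of the configured keyboard rows. This is a simplified stand-in written for this README, not the package's algorithm: the function longestKeyboardRun() is hypothetical, it only matches left-to-right runs, and it works on bytes, so it is meant for ASCII rows like the defaults.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only (not the package's implementation): returns the length
 * of the longest substring of $input that also occurs in a keyboard row.
 *
 * @param list<string> $rows lowercase keyboard rows, e.g. 'asdfghjkl'
 */
function longestKeyboardRun(string $input, array $rows): int
{
    $text   = strtolower($input);
    $length = strlen($text);
    $best   = 0;

    for ($start = 0; $start < $length; $start++) {
        // Only candidates longer than the current best are interesting.
        for ($size = $best + 1; $start + $size <= $length; $size++) {
            $candidate = substr($text, $start, $size);
            $found = false;
            foreach ($rows as $row) {
                if (strpos($row, $candidate) !== false) {
                    $found = true;
                    break;
                }
            }
            if (!$found) {
                // A longer candidate from this start cannot match either.
                break;
            }
            $best = $size;
        }
    }

    return $best;
}

// Default keyboard rows as listed in this README.
$rows = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm', 'qwertzuiop', 'yxcvbnm'];

$smash  = longestKeyboardRun('asdfghjkl', $rows);      // 9
$normal = longestKeyboardRun('hello world', $rows);    // 1
$mixed  = longestKeyboardRun('my qwertz text', $rows); // 6
```

A real heuristic would likely relate the run length to the overall input length and might also consider reversed rows; those refinements are left out here.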

🛠️ Development

Install development dependencies:

composer install

Run the test suite:

vendor/bin/phpunit

For contribution guidelines, development workflow and testing details, see CONTRIBUTING.md.