README

🚀 Introduction

Sentinel is an advanced, flexible, and powerful content analysis package for Laravel applications. It provides comprehensive protection against offensive content through multiple detection strategies, context-aware analysis, and multilingual support.

This documentation has been generated almost in its entirety using 🦠 Claude 3.5 Haiku based on source code analysis. Some sections may be incomplete, outdated or may contain documentation for planned or not-released features. For the most accurate information, please refer to the source code or open an issue on the package repository.

❤️ Features

Support for multiple service providers:
- Local dictionary-based analysis
- Tisane AI
- Prism LLM (with support for multiple LLMs)
- Azure AI
- Perspective AI
Advanced detection strategies:
- Exact match with Trie indexing
- Pattern matching with character substitutions
- N-gram analysis for phrase detection
- Variation detection for obfuscated content
- Repeated character detection
- Levenshtein distance matching
- Alphanumeric variation detection
- Phonetic matching
- Zero-width character detection
- Reversed word detection
Rich analysis results:
- Sentiment analysis
- Content categorization
- Match position tracking
- Context and confidence scoring
Multi-language support
Whitelist functionality
Configurable dictionaries
Laravel Facade and helper functions
Laravel validation rule
Caching system
Full Octane compatibility

Planned Features

Unicode support enhancement
More service providers
Machine learning enhancements

📦 Installation

You can install the package via composer:

composer require diego-ninja/sentinel

After installing, publish the configuration file and dictionaries:

php artisan vendor:publish --tag="sentinel-config"
php artisan vendor:publish --tag="sentinel-dictionaries"

🎛️ Configuration

The package configuration file will be published at config/sentinel.php. Here you can configure:

Default language and available languages
Default profanity service
Mask character for moderated words
Character replacements for evasion detection
Whitelisted words
Dictionary path
Service-specific configurations
Cache settings
Analysis thresholds

API Keys Configuration

Some services require API keys. Add them to your .env file, together with the general threshold and cache settings:

SENTINEL_THRESHOLD_SCORE=0.5
SENTINEL_CACHE_ENABLED=true
SENTINEL_CACHE_TTL=3600
SENTINEL_CACHE_STORE=redis

PERSPECTIVE_AI_API_KEY=your-perspective-api-key
TISANE_AI_API_KEY=your-tisane-api-key
AZURE_AI_API_KEY=your-azure-api-key
AZURE_AI_ENDPOINT=your-azure-endpoint

# Prism Configuration
PRISM_PROVIDER=anthropic
PRISM_MODEL=claude-3-sonnet-latest

⚙️ Basic Usage

You can use Sentinel in three ways:

1. Facade

use Ninja\Sentinel\Enums\Provider;
use Ninja\Sentinel\Facades\Sentinel;

// Check if text contains offensive content
$isOffensive = Sentinel::offensive('some text');

// Get cleaned version of text
$cleanText = Sentinel::clean('some text');

// Get detailed analysis with sentiment and matches
$result = Sentinel::check('some text');

// Use a specific provider
$result = Sentinel::with(Provider::Prism, 'some text');

2. Helper Functions

// Check if text is offensive
$isOffensive = is_offensive('some text');

// Clean offensive content
$cleanText = clean('some text');

3. Validation Rule

$rules = [
    'comment' => ['required', 'string', 'offensive']
];

Available Moderation Providers

Local Provider

Uses local dictionaries with multiple detection strategies for offline profanity checking.

use Ninja\Sentinel\Enums\Provider;

$result = Sentinel::with(Provider::Local, 'text to check');

Features:

Multiple detection strategies
Fast performance
No API dependencies
Configurable pattern matching

PurgoMalum

Free web service for basic profanity filtering.

$result = Sentinel::with(Provider::PurgoMalum, 'text to check');

Azure AI Content Safety

Uses Azure's AI content moderation service with advanced content analysis.

$result = Sentinel::with(Provider::Azure, 'text to check');

Perspective AI

Uses Google's Perspective API for toxicity and content analysis.

$result = Sentinel::with(Provider::Perspective, 'text to check');

Tisane AI

Natural language processing service for content moderation.

$result = Sentinel::with(Provider::Tisane, 'text to check');

Prism LLM Support

Access various Large Language Models through Prism:

use Ninja\Sentinel\Enums\Provider;

$result = Sentinel::with(Provider::Prism, 'text to check');

Supported providers through Prism:

Anthropic (Claude models)
OpenAI (GPT models)
Gemini
Mistral
Ollama
DeepSeek
Groq
xAI

Working with Results

All services return a Result object with consistent methods:

$result = Sentinel::check('some text');

// Basic information
$result->offensive();    // bool: whether the text contains offensive content
$result->words();        // array: list of matched offensive words
$result->replaced();     // string: text with offensive words replaced
$result->original();     // string: original text
$result->score();        // Score: offensive content score
$result->confidence();   // Confidence: confidence level

// Detailed analysis
$result->sentiment();    // Sentiment: text sentiment analysis
$result->categories();   // array: detected content categories

// Match information
$result->matches();      // MatchCollection: detailed matches with positions

Working with Matches

The MatchCollection provides detailed information about each match:

$matches = $result->matches();

foreach ($matches as $match) {
    echo "Word: " . $match->word();
    echo "Type: " . $match->type();          // exact, pattern, variation, etc.
    echo "Score: " . $match->score();
    echo "Confidence: " . $match->confidence();
    
    // Get all occurrences of the match
    foreach ($match->occurrences() as $occurrence) {
        echo "Position: " . $occurrence->start();
        echo "Length: " . $occurrence->length();
    }
    
    // Context information if available
    if ($context = $match->context()) {
        echo "Original form: " . $context['original'];
        echo "Surrounding text: " . $context['surrounding'];
    }
}
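The start position and length of each occurrence are enough to build a masked copy of the text by hand. The following is a minimal sketch of that idea, not the package's own masking code: `maskOccurrences` is a hypothetical helper, occurrences are passed as plain `[start, length]` arrays rather than Sentinel objects, offsets are assumed to count characters rather than bytes, and `*` is an assumed mask character.

```php
<?php

// Hypothetical helper (not part of Sentinel): masks each [start, length]
// span in $text. Offsets are assumed to be character positions.
function maskOccurrences(string $text, array $occurrences, string $mask = '*'): string
{
    // Split into UTF-8 characters so multibyte text is handled per character.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($occurrences as [$start, $length]) {
        // Clamp the span to the bounds of the text.
        $end = min($start + $length, count($chars));
        for ($i = max(0, $start); $i < $end; $i++) {
            $chars[$i] = $mask;
        }
    }

    return implode('', $chars);
}

echo maskOccurrences('what the heck', [[9, 4]]), PHP_EOL; // prints "what the ****"
```

In normal use `$result->replaced()` already returns the masked text, so a helper like this is only useful when you need custom masking.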

Sentiment Analysis

Results include sentiment analysis when available:

$sentiment = $result->sentiment();

echo $sentiment->type();    // positive, negative, neutral, mixed
echo $sentiment->score();   // -1.0 to 1.0

Response Caching

External service responses are automatically cached to improve performance and reduce API calls. By default, all external services will cache their responses for 1 hour.

The local provider is not cached because it is already fast enough.

Configuring Cache

You can configure the cache in your .env file:

SENTINEL_CACHE_ENABLED=true # Enable caching (default: true)
SENTINEL_CACHE_TTL=3600 # Cache duration in seconds (default: 1 hour)
SENTINEL_CACHE_STORE=redis # Cache store (default: file)

Or in your config/sentinel.php:

    'cache' => [
        'enabled' => env('SENTINEL_CACHE_ENABLED', true),
        'store' => env('SENTINEL_CACHE_STORE', 'file'),
        'ttl' => env('SENTINEL_CACHE_TTL', 3600), // seconds; 3600 matches the documented 1 hour default
    ],

Cache Keys

Cache keys are generated using the following format:

sentinel:{ServiceName}:{md5(text)}
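As a rough illustration of this format, a key could be built as shown below. The function name `sentinelCacheKey` is hypothetical and not part of the package, and the real implementation may normalize the service name or the text differently before hashing.

```php
<?php

// Hypothetical helper (not part of Sentinel): builds a key in the
// documented "sentinel:{ServiceName}:{md5(text)}" format.
function sentinelCacheKey(string $serviceName, string $text): string
{
    return sprintf('sentinel:%s:%s', $serviceName, md5($text));
}

// The same service and text always produce the same key, so repeated
// checks of identical text can be answered from the cache.
echo sentinelCacheKey('Perspective', 'some text'), PHP_EOL;
```

Because the text is hashed, keys have a fixed length regardless of how long the analyzed text is.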

Detection Strategies

The local checker uses a multi-strategy approach to detect offensive content accurately. Each piece of text is processed through different detection strategies in sequence:

Trie Index Strategy: Fast exact matching using a Trie data structure
Pattern Strategy: Handles exact matches and character substitutions
NGram Strategy: Detects offensive phrases by analyzing word combinations
Variation Strategy: Catches attempts to evade detection through character separation
Repeated Chars Strategy: Identifies words with intentionally repeated characters
Levenshtein Strategy: Uses string distance comparison for similar words

Each strategy can operate in either full word or partial matching mode. Results from all strategies are combined, deduplicated, and scored based on the type and quantity of matches found.
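The flow described above can be sketched in a few lines of plain PHP. This is a simplified illustration under stated assumptions, not the package's implementation: `detect` is a hypothetical function, the three closures are stand-ins for the exact, repeated-character and Levenshtein strategies, the dictionary holds mild placeholder words, the edit-distance threshold of 1 is arbitrary, and only ASCII input is handled.

```php
<?php

// Hypothetical sketch (not Sentinel code): runs several simplified
// strategies over each distinct word and merges the hits per word.
function detect(string $text, array $dictionary): array
{
    // Collapse runs of the same character: "heeeck" -> "heck".
    $collapse = fn (string $s): string => preg_replace('/(.)\1+/u', '$1', $s);

    $strategies = [
        'exact' => fn (string $w): ?string => in_array($w, $dictionary, true) ? $w : null,
        'repeated_chars' => function (string $w) use ($dictionary, $collapse): ?string {
            foreach ($dictionary as $entry) {
                // Compare collapsed forms so entries with double letters still match.
                if ($w !== $entry && $collapse($w) === $collapse($entry)) {
                    return $entry;
                }
            }
            return null;
        },
        'levenshtein' => function (string $w) use ($dictionary): ?string {
            foreach ($dictionary as $entry) {
                // Assumed threshold: exactly one edit away from an entry.
                if (levenshtein($w, $entry) === 1) {
                    return $entry;
                }
            }
            return null;
        },
    ];

    // Lowercase, split on non-alphanumerics, and keep each word once.
    $words = array_unique(
        preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY)
    );

    $matches = [];
    foreach ($words as $word) {
        foreach ($strategies as $type => $strategy) {
            $entry = $strategy($word);
            if ($entry !== null) {
                // Keyed by the word as written, so hits are deduplicated per word.
                $matches[$word]['entry'] = $entry;
                $matches[$word]['types'][] = $type;
            }
        }
    }

    return $matches;
}

print_r(detect('What the heeeck, darn it, dorn', ['darn', 'heck']));
```

A distance-based strategy like the one above also flags innocent near neighbours (for example "dark" against "darn"), which is one reason a whitelist and score thresholds are useful alongside it.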

Custom Dictionaries

You can add your own dictionaries or modify existing ones:

Create a new PHP file in your resources/dict directory
Return an array of words to be moderated
Update your config to include the new language

// resources/dict/custom.php
return [
    'word1',
    'word2',
    // ...
];

// config/sentinel.php
'languages' => ['en', 'custom'],

Whitelist

You can whitelist words to prevent them from being moderated:

// config/sentinel.php
'whitelist' => [
    'word1',
    'word2',
],
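Conceptually, a whitelist acts as a filter over the words a checker has flagged. The sketch below shows that idea only; `applyWhitelist` is a hypothetical helper, and case-insensitive comparison is an assumption rather than documented package behavior.

```php
<?php

// Hypothetical helper (not part of Sentinel): drops flagged words that
// appear in the whitelist, comparing case-insensitively (an assumption).
function applyWhitelist(array $flaggedWords, array $whitelist): array
{
    $allowed = array_map('strtolower', $whitelist);

    return array_values(array_filter(
        $flaggedWords,
        fn (string $word): bool => !in_array(strtolower($word), $allowed, true)
    ));
}

print_r(applyWhitelist(['Word1', 'other'], ['word1', 'word2']));
```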

Character Substitution

The package detects common character substitutions. Each entry maps a character to a regular expression group listing the variants that should be treated as equivalent:

// config/sentinel.php
'replacements' => [
    'a' => '(a|@|4)',
    'i' => '(i|1|!)',
    // ...
],
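To show how a replacement map of this shape can be turned into a matcher, here is a minimal sketch. `substitutionPattern` is a hypothetical helper and "silly" is a placeholder dictionary word; how the package actually compiles and applies these patterns may differ.

```php
<?php

// Same shape as the 'replacements' config entry.
$replacements = [
    'a' => '(a|@|4)',
    'i' => '(i|1|!)',
];

// Hypothetical helper (not part of Sentinel): turns a dictionary word into
// a regex that also matches its configured character substitutions.
function substitutionPattern(string $word, array $replacements): string
{
    $pattern = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        // Use the configured group if there is one, else the literal character.
        $pattern .= $replacements[$char] ?? preg_quote($char, '/');
    }

    // Lookarounds instead of \b, so a substituted symbol such as "@" at the
    // edge of a word still counts as part of the word.
    return '/(?<![\p{L}\p{N}])' . $pattern . '(?![\p{L}\p{N}])/iu';
}

$pattern = substitutionPattern('silly', $replacements);

var_dump(preg_match($pattern, 'that was s1lly') === 1); // bool(true)
var_dump(preg_match($pattern, 'hello sally') === 1);    // bool(false)
```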

🙏 Credits

This project is developed and maintained by 🥷 Diego Rin in his free time.

Special thanks to:

Laravel Framework for providing the most exciting and well-crafted PHP framework.
Snipe for developing the initial code that serves as Sentinel's starting point.
All the contributors and testers who have helped to improve this project through their contributions.

If you find this project useful, please consider giving it a ⭐ on GitHub!
