guanhui07/dfa-sensitive

A copy of FireLustre/php-dfa-sensitive. It filters sensitive words using a deterministic finite automaton (DFA) algorithm.

2.1.1 2024-10-23 09:16 UTC

This package is auto-updated.

Last update: 2024-11-23 09:32:30 UTC


README

Installation

    composer require guanhui07/dfa-sensitive

If you need to load it manually

    require './vendor/autoload.php';
    
    use DfaFilter\SensitiveHelper;

Building the sensitive-word tree

Scenario 1: a separate word list array is available (for example, one per user)

// Get the sensitive-word list as an indexed array
$wordData = array(
    '察象蚂',
    '拆迁灭',
    '车牌隐',
    '成人电',
    '成人卡通',
    // ......
);

// get one helper
$handle = SensitiveHelper::init()->setTree($wordData);

Scenario 2: the whole site shares one sensitive-word list

// Path to the sensitive-word list file
$wordFilePath = 'tests/data/words.txt';

// get one helper
$handle = SensitiveHelper::init()->setTreeByFile($wordFilePath);
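A word tree of this kind can be pictured as a nested array keyed by character. The block below is a minimal sketch of that idea in plain PHP, not the package's actual source: `sketchBuildTree` and the `#end` marker key are hypothetical names chosen for illustration, and the code assumes PHP 7.4+ with the mbstring extension.

```php
<?php
// Hypothetical helper, not part of the package: builds a nested-array trie
// in which each level is keyed by one character of a word.
function sketchBuildTree(array $words): array
{
    $tree = [];
    foreach ($words as $word) {
        if ($word === '') {
            continue; // ignore empty entries
        }
        $node = &$tree;
        foreach (mb_str_split($word) as $char) {
            if (!isset($node[$char])) {
                $node[$char] = [];
            }
            $node = &$node[$char]; // descend one level
        }
        // '#end' is longer than one character, so it cannot clash with a character key.
        $node['#end'] = true;
        unset($node); // break the reference before handling the next word
    }
    return $tree;
}

$tree = sketchBuildTree(['成人电', '成人卡通']);
```

Words that share a prefix (here `成人`) share the same branch, which is why lookups do not need to scan the whole word list for every position in the text.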

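Setting the interference (stop) characters

Note: each interference character must be a single character or a single Chinese character; whole words are not supported yet.

Several interference characters in a row are still handled, and the sensitive word is detected correctly.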

$handle = SensitiveHelper::init()->setStopWordList(['&', '*', '.'])->setTreeByFile($wordFilePath);
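To illustrate how skipping interference characters can work, here is a minimal sketch in plain PHP. It is not the package's implementation: `sketchHasWord`, the hand-written `$tree` literal, and the `#end` marker are hypothetical, and the code assumes PHP 7.4+ with mbstring.

```php
<?php
// Hypothetical helper: returns true if $text contains a word stored in $tree,
// ignoring any character from $stopChars that appears inside a candidate match.
function sketchHasWord(string $text, array $tree, array $stopChars): bool
{
    $chars = mb_str_split($text);
    $total = count($chars);
    for ($start = 0; $start < $total; $start++) {
        $node = $tree;
        for ($i = $start; $i < $total; $i++) {
            $char = $chars[$i];
            if ($i > $start && in_array($char, $stopChars, true)) {
                continue; // skip interference characters mid-match
            }
            if (!isset($node[$char])) {
                break; // no word continues with this character
            }
            $node = $node[$char];
            if (isset($node['#end'])) {
                return true;
            }
        }
    }
    return false;
}

// A tiny tree holding the single word '赌球网'.
$tree = ['赌' => ['球' => ['网' => ['#end' => true]]]];
$stops = ['&', '*', '.'];

$hitWithStops = sketchHasWord('赌&&..球*网', $tree, $stops);
$hitWithoutStops = sketchHasWord('赌&&..球*网', $tree, []);
```

Because each stop character is compared one character at a time, a run of several stop characters is skipped naturally, while a multi-character stop word would need a different approach.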

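Ignoring case

Note: this setting only takes effect when it is called before the sensitive-word tree is built.

If it is called after the tree is built, the results may not be what you expect.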

$handle = SensitiveHelper::init()->setIgnoreCase()->setTree(['Av', '赌球网']);
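A common way to get case-insensitive matching is to lower-case both the word list and the text before comparing. If the package works along those lines (an assumption, not something this README states), that would explain why the option has to be set before the tree is built. The sketch below uses a hypothetical `sketchFindWords` helper and plain substring search rather than a tree, purely to show the effect.

```php
<?php
// Hypothetical helper: returns the words from $words that occur in $text,
// optionally lower-casing both sides first.
function sketchFindWords(string $text, array $words, bool $ignoreCase): array
{
    if ($ignoreCase) {
        $text = mb_strtolower($text);
        $words = array_map('mb_strtolower', $words);
    }
    $hits = [];
    foreach ($words as $word) {
        if ($word !== '' && mb_strpos($text, $word) !== false) {
            $hits[] = $word;
        }
    }
    return $hits;
}

$caseSensitive = sketchFindWords('Watch AV now', ['Av', '赌球网'], false);
$caseInsensitive = sketchFindWords('Watch AV now', ['Av', '赌球网'], true);
```

With case respected, `Av` does not match `AV`; with case ignored, both sides become `av` and the word is found.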

Checking whether content contains sensitive words

$islegal = $handle->islegal($content);

Filtering sensitive words

// Example: replace each sensitive word with * characters (as many * as the word has characters)
$filterContent = $handle->replace($content, '*', true);

// Or replace each sensitive word with the fixed string ***
$filterContent = $handle->replace($content, '***');
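The difference between the two calls can be sketched in plain PHP as follows. `sketchMask` is a hypothetical helper that only mirrors the argument order used above; it takes the list of words to mask directly and is not the package's `replace` method. It assumes the mbstring extension.

```php
<?php
// Hypothetical helper: masks each word in $badWords inside $text.
// With $repeat = true the replacement is repeated once per character of the word.
function sketchMask(string $text, array $badWords, string $replaceChar, bool $repeat = false): string
{
    foreach ($badWords as $word) {
        if ($word === '') {
            continue;
        }
        $mask = $repeat
            ? str_repeat($replaceChar, mb_strlen($word)) // same length as the word
            : $replaceChar;                              // fixed replacement
        $text = str_replace($word, $mask, $text);
    }
    return $text;
}

$sameLength = sketchMask('这里有成人卡通内容', ['成人卡通'], '*', true);
$fixed = sketchMask('这里有成人卡通内容', ['成人卡通'], '***');
```

Counting with `mb_strlen` rather than `strlen` matters here: a Chinese character is one character but several bytes in UTF-8, so a byte count would produce too many `*`.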

Marking sensitive words

$markedContent = $handle->mark($content, '<mark>', '</mark>');

Getting the sensitive words in a text

// Get all sensitive words in the content
$sensitiveWordGroup = $handle->getBadWord($content);
// Get only one sensitive word
$sensitiveWordGroup = $handle->getBadWord($content, 1);
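For readers curious how collecting matches from a character tree can work, here is a minimal sketch in plain PHP. It is not the package's code: `sketchCollect`, its `$limit` parameter, the hand-written `$tree`, and the `#end` marker are all hypothetical and are not claimed to match the signature of `getBadWord`. It assumes PHP 7.4+ with mbstring.

```php
<?php
// Hypothetical helper: walks $text against a nested-array trie and returns
// the matched words in order. $limit = 0 means "no limit".
function sketchCollect(string $text, array $tree, int $limit = 0): array
{
    $chars = mb_str_split($text);
    $total = count($chars);
    $found = [];
    for ($start = 0; $start < $total; $start++) {
        $node = $tree;
        $matchLen = 0;
        for ($i = $start; $i < $total && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                $matchLen = $i - $start + 1; // keep walking to prefer the longest match
            }
        }
        if ($matchLen > 0) {
            $found[] = implode('', array_slice($chars, $start, $matchLen));
            if ($limit > 0 && count($found) >= $limit) {
                break;
            }
            $start += $matchLen - 1; // resume right after the matched word
        }
    }
    return $found;
}

$tree = [
    '成' => ['人' => [
        '电' => ['#end' => true],
        '卡' => ['通' => ['#end' => true]],
    ]],
    '车' => ['牌' => ['隐' => ['#end' => true]]],
];

$all = sketchCollect('成人卡通和车牌隐', $tree);
$one = sketchCollect('成人卡通和车牌隐', $tree, 1);
```

Stopping as soon as the limit is reached avoids scanning the rest of the text when only the first hit is needed.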