guanhui07 / dfa-sensitive
Copied from FireLustre/php-dfa-sensitive. Filters sensitive words using a deterministic finite automaton (DFA) algorithm.
2.1.1
2024-10-23 09:16 UTC
Requires
- php: >=7.2
- ext-mbstring: *
Requires (Dev)
- phpstan/phpstan: ^0.11
- phpunit/phpunit: ^6.4
- squizlabs/php_codesniffer: ^3.5
Last update: 2024-11-23 09:32:30 UTC
README
Install the package
composer require guanhui07/dfa-sensitive
If you need to load it manually:
require './vendor/autoload.php';

use DfaFilter\SensitiveHelper;
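Build the sensitive-word tree
Scenario 1: the word list is available as an array (for example, a different list per user)
// Sensitive-word list as an indexed array
$wordData = array(
    '察象蚂',
    '拆迁灭',
    '车牌隐',
    '成人电',
    '成人卡通',
    // ......
);
// Get a helper instance
$handle = SensitiveHelper::init()->setTree($wordData);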
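In DFA terms, a sensitive-word tree like the one `setTree()` builds is a trie: each node is a state and each character is a transition, so words sharing a prefix share states. A minimal standalone sketch of that structure, not the library's own code; `buildTrie()` and the `'#end'` accepting-state marker are hypothetical names:

```php
<?php
// Hypothetical helper (not part of DfaFilter): builds a character trie,
// i.e. the state table of a DFA, from a list of words.
function buildTrie(array $words): array
{
    $trie = [];
    foreach ($words as $word) {
        if ($word === '') {
            continue; // an empty word would make every position match
        }
        $node = &$trie;
        // Split into UTF-8 characters so each Chinese character is one transition.
        foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            if (!isset($node[$char])) {
                $node[$char] = [];
            }
            $node = &$node[$char];
        }
        $node['#end'] = true; // accepting state: a complete word ends here
        unset($node);         // break the reference before the next word
    }
    return $trie;
}

$trie = buildTrie(['成人电', '成人卡通']);
// The shared prefix 成人 is stored once; 电 and 卡 are sibling transitions.
print_r(array_keys($trie['成']['人']));
```

Nested PHP arrays keep the sketch short; a real implementation may prefer objects or a flat state table for memory reasons.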
Scenario 2: the whole site shares one sensitive-word list
// Path to the sensitive-word file
$wordFilePath = 'tests/data/words.txt';
// Get a helper instance
$handle = SensitiveHelper::init()->setTreeByFile($wordFilePath);
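The README does not describe the layout of the word file. Assuming one UTF-8 word per line (an assumption; compare with `tests/data/words.txt`), reading it into an array reduces to something like the hypothetical `loadWords()` below, whose result could then feed a trie builder:

```php
<?php
// Hypothetical loader, ASSUMING a UTF-8 text file with one word per line.
function loadWords(string $path): array
{
    // FILE_IGNORE_NEW_LINES drops line endings; FILE_SKIP_EMPTY_LINES drops blank lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read word file: $path");
    }
    // trim() removes stray spaces and any leftover "\r" from CRLF files.
    $words = array_filter(array_map('trim', $lines), function (string $w): bool {
        return $w !== '';
    });
    // Drop duplicates and renumber the keys 0..n-1.
    return array_values(array_unique($words));
}
```

`array_unique()` is used instead of keying words into an array because PHP would silently turn numeric-looking words into integer keys.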
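Set the stop-word list (interference characters)
Note: each stop word must be a single character (for example, one Chinese character); multi-character words are not supported yet.
However, sensitive words are still recognized correctly when several stop characters appear in a row.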
$handle = SensitiveHelper::init()->setStopWordList(['&', '*', '.'])->setTreeByFile($wordFilePath);
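One way a DFA scan can tolerate interference characters is to skip them while walking the tree, staying in the current state, so any number of them in a row is ignored. The sketch below illustrates that technique under this assumption and is not the library's implementation; `matchAt()` and the hand-written trie are hypothetical:

```php
<?php
// Hypothetical sketch (not DfaFilter's code): longest match starting at
// $start, skipping single-character stop words inside the word.
// Returns the matched length in characters (stop characters included), or 0.
function matchAt(array $trie, array $chars, int $start, array $stop): int
{
    if (isset($stop[$chars[$start]])) {
        return 0; // a word never starts on a stop character
    }
    $node = $trie;
    $found = 0;
    for ($i = $start, $n = count($chars); $i < $n; $i++) {
        $c = $chars[$i];
        if (isset($stop[$c])) {
            continue; // ignore the stop character and stay in the same state
        }
        if (!isset($node[$c])) {
            break; // no transition: the scan from $start ends here
        }
        $node = $node[$c];
        if (isset($node['#end'])) {
            $found = $i - $start + 1; // remember the longest complete word so far
        }
    }
    return $found;
}

// Trie for the single word 赌球网; '#end' marks the accepting state.
$trie = ['赌' => ['球' => ['网' => ['#end' => true]]]];
$stop = array_flip(['&', '*', '.']); // character => index, for O(1) lookup
$chars = preg_split('//u', '去赌&*球.网玩', -1, PREG_SPLIT_NO_EMPTY);
echo matchAt($trie, $chars, 1, $stop), "\n"; // 6  (赌&*球.网)
```

Because stop characters are checked one at a time, this approach naturally handles only single-character stop words, which is consistent with the limitation noted above.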
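Ignore case
Note: call this setting before building the sensitive-word tree.
If it is called after the tree has been built, the results may not be what you expect.
$handle = SensitiveHelper::init()->setIgnoreCase()->setTree(['Av', '赌球网']);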
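A plausible reason for the ordering rule: if case folding is applied to the words when they are inserted into the tree and to the text when it is scanned, a tree built before the flag was set still holds the original casing. A standalone sketch of that effect, offered as an assumption about the mechanism rather than the library's code; `buildTrie()` and `contains()` are hypothetical:

```php
<?php
// Hypothetical sketch (not DfaFilter's code). Case folding is applied to the
// words at insert time and to the text at scan time.
// strtolower() folds ASCII letters only; mb_strtolower() would also cover
// other scripts.
function buildTrie(array $words, bool $ignoreCase): array
{
    $trie = [];
    foreach ($words as $word) {
        $word = $ignoreCase ? strtolower($word) : $word;
        $node = &$trie;
        foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            if (!isset($node[$char])) {
                $node[$char] = [];
            }
            $node = &$node[$char];
        }
        $node['#end'] = true; // accepting state
        unset($node);
    }
    return $trie;
}

function contains(array $trie, string $text, bool $ignoreCase): bool
{
    if ($ignoreCase) {
        $text = strtolower($text);
    }
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    for ($start = 0; $start < $n; $start++) {
        $node = $trie;
        for ($i = $start; $i < $n && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                return true;
            }
        }
    }
    return false;
}

$words = ['Av', '赌球网'];
// Folding at build time: the stored word is "av", so "AV" in the text matches.
var_dump(contains(buildTrie($words, true), 'watch AV', true));  // bool(true)
// Tree built before folding was enabled: the stored word keeps its capital
// "A" and no longer matches the folded text "watch av".
var_dump(contains(buildTrie($words, false), 'watch AV', true)); // bool(false)
```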
Check whether the content contains sensitive words
$islegal = $handle->islegal($content);
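A yes/no check only needs to know whether any match exists, so a DFA scan can stop at the first accepting state it reaches. A minimal sketch of that early-exit scan, using a hypothetical `containsBadWord()` and a hand-written trie (not the library's code):

```php
<?php
// Hypothetical sketch: returns true as soon as any word in $trie occurs in $text.
function containsBadWord(array $trie, string $text): bool
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    // Try every start position; each inner walk follows at most one path in the tree.
    for ($start = 0; $start < $n; $start++) {
        $node = $trie;
        for ($i = $start; $i < $n && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                return true; // first accepting state found: stop immediately
            }
        }
    }
    return false;
}

$trie = ['赌' => ['球' => ['网' => ['#end' => true]]]];
var_dump(containsBadWord($trie, '这里有赌球网')); // bool(true)
var_dump(containsBadWord($trie, '这里有赌球'));   // bool(false)
```

The cost is bounded by the text length times the length of the longest word, independent of how many words the list contains.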
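Filter sensitive words
// Replace sensitive words with * (each word becomes as many * as it has characters)
$filterContent = $handle->replace($content, '*', true);
// Or replace each sensitive word with ***
$filterContent = $handle->replace($content, '***');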
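The two replacement modes differ only in how much text stands in for a match. A standalone sketch, assuming longest, non-overlapping matches scanned left to right (the library's exact matching policy is not documented here); `replaceWords()` and its `$repeat` flag are hypothetical and only loosely mirror the third argument of `replace()`:

```php
<?php
// Hypothetical sketch (not DfaFilter's code): replaces each longest,
// non-overlapping match, scanning left to right.
function replaceWords(array $trie, string $text, string $with, bool $repeat = false): string
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $out = '';
    $start = 0;
    while ($start < $n) {
        // Length of the longest word starting at $start (0 if none).
        $node = $trie;
        $len = 0;
        for ($i = $start; $i < $n && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                $len = $i - $start + 1;
            }
        }
        if ($len === 0) {
            $out .= $chars[$start++]; // copy an unmatched character through
        } else {
            // $repeat: one copy of $with per matched character; otherwise one per word.
            $out .= $repeat ? str_repeat($with, $len) : $with;
            $start += $len;
        }
    }
    return $out;
}

$trie = [
    '赌' => ['球' => ['网' => ['#end' => true]]],
    '成' => ['人' => ['卡' => ['通' => ['#end' => true]]]],
];
echo replaceWords($trie, '别去赌球网看成人卡通', '*', true), "\n"; // 别去***看****
echo replaceWords($trie, '别去赌球网看成人卡通', '***'), "\n";     // 别去***看***
```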
Mark sensitive words
$markedContent = $handle->mark($content, '<mark>', '</mark>');
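Marking uses the same scan as replacement but copies the matched characters through between an opening and a closing tag. A minimal sketch under the same assumptions (longest, non-overlapping matches); `markWords()` is a hypothetical name, not the library's method:

```php
<?php
// Hypothetical sketch (not DfaFilter's code): wraps each longest,
// non-overlapping match in $open ... $close, scanning left to right.
function markWords(array $trie, string $text, string $open, string $close): string
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $out = '';
    $start = 0;
    while ($start < $n) {
        // Length of the longest word starting at $start (0 if none).
        $node = $trie;
        $len = 0;
        for ($i = $start; $i < $n && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                $len = $i - $start + 1;
            }
        }
        if ($len === 0) {
            $out .= $chars[$start++]; // copy an unmatched character through
        } else {
            $out .= $open . implode('', array_slice($chars, $start, $len)) . $close;
            $start += $len;
        }
    }
    return $out;
}

$trie = ['赌' => ['球' => ['网' => ['#end' => true]]]];
echo markWords($trie, '别去赌球网', '<mark>', '</mark>'), "\n";
// 别去<mark>赌球网</mark>
```

If the marked text is later rendered as HTML, the surrounding content should still be escaped separately; the sketch inserts the tags verbatim.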
Get the sensitive words in the text
// Get all sensitive words in the content
$sensitiveWordGroup = $handle->getBadWord($content);
// Get only one sensitive word
$sensitiveWordGroup = $handle->getBadWord($content, 1);
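Collecting matches instead of replacing them is a small change to the same scan: record each matched word and optionally stop early. A standalone sketch, assuming distinct words in order of first appearance; `findWords()` and its `$limit` parameter are hypothetical and are not the library's signature:

```php
<?php
// Hypothetical sketch (not DfaFilter's code): collects the distinct words found
// in $text, in order of first appearance. $limit > 0 stops after that many
// words; $limit is illustrative only.
function findWords(array $trie, string $text, int $limit = 0): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);
    $found = [];
    $start = 0;
    while ($start < $n) {
        // Length of the longest word starting at $start (0 if none).
        $node = $trie;
        $len = 0;
        for ($i = $start; $i < $n && isset($node[$chars[$i]]); $i++) {
            $node = $node[$chars[$i]];
            if (isset($node['#end'])) {
                $len = $i - $start + 1;
            }
        }
        if ($len === 0) {
            $start++;
            continue;
        }
        $word = implode('', array_slice($chars, $start, $len));
        if (!in_array($word, $found, true)) {
            $found[] = $word;
            if ($limit > 0 && count($found) >= $limit) {
                break; // enough words collected
            }
        }
        $start += $len; // continue after the match (non-overlapping)
    }
    return $found;
}

$trie = [
    '赌' => ['球' => ['网' => ['#end' => true]]],
    '成' => ['人' => ['卡' => ['通' => ['#end' => true]]]],
];
echo implode(', ', findWords($trie, '赌球网,成人卡通,赌球网')), "\n";    // 赌球网, 成人卡通
echo implode(', ', findWords($trie, '赌球网,成人卡通,赌球网', 1)), "\n"; // 赌球网
```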