infocyph / phpprobe
Standalone PHP syntax and duplicate code checker.
Requires
- php: >=8.2
- nikic/php-parser: ^5.7
Requires (Dev)
- pestphp/pest: >=3.0 <5.0
Standalone PHP checker for syntax validation, duplicate-code detection, and public API snapshot checks.
PHPProbe is the checker runtime. It can be used directly as phpprobe, required by tool-combiner packages such as PHPForge, or called from PHP code through the public gateway classes.
Requirements
- PHP >=8.2
- nikic/php-parser >=5.0 <6.0
Install it as a Composer tool dependency:
composer require --dev infocyph/phpprobe
The package ships a Composer binary:
php vendor/bin/phpprobe
Commands
php vendor/bin/phpprobe syntax [options] [paths...]
php vendor/bin/phpprobe duplicates [options] [paths...]
php vendor/bin/phpprobe api [options] [paths...]
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset <name>
Unknown commands print the top-level usage and exit 0. There is no separate --version command.
Quick Start
php vendor/bin/phpprobe syntax
php vendor/bin/phpprobe duplicates
php vendor/bin/phpprobe duplicates --json
php vendor/bin/phpprobe duplicates --preset=strict --json src
php vendor/bin/phpprobe api --write-baseline=.phpprobe-api-baseline.json src
php vendor/bin/phpprobe api --baseline=.phpprobe-api-baseline.json src
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset phpstorm
Public API
The package-facing checker gateways live directly under src/:
- Infocyph\PHPProbe\SyntaxChecker
- Infocyph\PHPProbe\DuplicateChecker
- Infocyph\PHPProbe\ApiSnapshotChecker
All expose:
public function run(array $args): int
$args is the same argument list that follows the CLI subcommand. For example:
use Infocyph\PHPProbe\ApiSnapshotChecker;
use Infocyph\PHPProbe\DuplicateChecker;
use Infocyph\PHPProbe\SyntaxChecker;

$syntaxCode = (new SyntaxChecker())->run(['--config=phpprobe.json', 'src']);
$duplicateCode = (new DuplicateChecker())->run(['--preset=strict', '--json', 'src']);
$apiCode = (new ApiSnapshotChecker())->run(['--baseline=.phpprobe-api-baseline.json', 'src']);
Everything else is internal implementation detail, grouped by role:
| Namespace | Purpose |
|---|---|
| Api | Public API snapshot extraction from parser ASTs. |
| Console | CLI dispatch for bin/phpprobe. |
| Config | Config lookup, preset lookup, JSON parsing, config merging, and shared CLI option handling. |
| Detection | Duplicate-code token indexing, AST block indexing, scoring, grouping, and pruning. |
| Filesystem | Git-aware PHP file discovery and path exclusion. |
| Process | Small proc_open runner wrappers. |
| Util | Narrow shared helpers. |
Config Lookup
The default config filename is phpprobe.json.
When a checker needs a config file and --config was not passed, PHPProbe resolves it in this order:
1. phpprobe.json in the current project root, meaning the current working directory.
2. vendor/infocyph/phpprobe/resources/phpprobe.json under the current project root.
3. resources/phpprobe.json only when the current project itself is infocyph/phpprobe.
If no config can be found, PHPProbe throws a runtime config error.
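The lookup order above can be sketched as a first-match search over candidate paths. This is only an illustration: `resolveConfigPath()` and its `$isPhpProbeItself` flag are hypothetical names, not part of the package, and how PHPProbe actually detects that it is running inside its own repository is not described here.

```php
<?php

// Hypothetical helper: return the first config file that exists, in the documented order.
function resolveConfigPath(string $projectRoot, bool $isPhpProbeItself = false): string
{
    $candidates = [
        $projectRoot . '/phpprobe.json',
        $projectRoot . '/vendor/infocyph/phpprobe/resources/phpprobe.json',
    ];

    // The bundled resource is only a candidate while developing the package itself.
    if ($isPhpProbeItself) {
        $candidates[] = $projectRoot . '/resources/phpprobe.json';
    }

    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }

    // Mirrors the documented "runtime config error" when nothing is found.
    throw new RuntimeException('No phpprobe.json config could be found.');
}
```

Calling `resolveConfigPath(getcwd())` from a project that has no root-level phpprobe.json would fall through to the copy under vendor/.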
Preset files are bundled resources. They are resolved from:
1. vendor/infocyph/phpprobe/resources/presets/<name>.json.
2. resources/presets/<name>.json only while developing infocyph/phpprobe itself.
Project-root preset files are not looked up automatically.
When --config=FILE is passed explicitly and that file is missing, unreadable, empty, or invalid JSON, PHPProbe treats it as an empty config and continues with internal defaults plus any CLI options.
Config Format
The bundled resources/phpprobe.json is intentionally small:
{
  "preset": "phpstorm"
}
A full project config may override any part of the selected preset:
{
  "preset": "phpstorm",
  "syntax": {
    "paths": ["src"],
    "exclude": ["src/generated"]
  },
  "duplicates": {
    "paths": ["src"],
    "exclude": ["src/generated"],
    "mode": "audit",
    "normalize": true,
    "fuzzy": true,
    "near_miss": true,
    "min_lines": 5,
    "min_tokens": 90,
    "min_statements": 4,
    "min_similarity": 0.85,
    "baseline": "",
    "write_baseline": "",
    "json": false
  },
  "api": {
    "paths": ["src"],
    "exclude": ["src/generated"],
    "include_protected": true,
    "baseline": "",
    "write_baseline": "",
    "json": false
  }
}
Config keys accept snake case, kebab case, and camel case. For example, min_tokens, min-tokens, and minTokens are equivalent. Excludes can be configured as either exclude or exclude_paths.
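One way to make the three spellings equivalent is to fold every key into a single canonical form before reading the config. The sketch below assumes snake case as that canonical form; `canonicalKey()` and `canonicalizeKeys()` are illustrative names, not PHPProbe's own functions.

```php
<?php

// Hypothetical helper: fold camelCase and kebab-case into snake_case.
function canonicalKey(string $key): string
{
    // Put an underscore before each interior capital: minTokens -> min_Tokens.
    $snake = preg_replace('/(?<=[a-z0-9])([A-Z])/', '_$1', $key);

    // Treat hyphens as underscores, then lowercase everything.
    return strtolower(str_replace('-', '_', $snake));
}

// Hypothetical helper: apply canonicalKey() to every string key, recursively.
function canonicalizeKeys(array $config): array
{
    $out = [];
    foreach ($config as $key => $value) {
        $name = is_string($key) ? canonicalKey($key) : $key;
        // Recurse into nested sections such as "duplicates" or "api";
        // list entries (integer keys) keep their position.
        $out[$name] = is_array($value) ? canonicalizeKeys($value) : $value;
    }

    return $out;
}
```

With this in place, `canonicalizeKeys(['duplicates' => ['minTokens' => 90]])` and the same config written with `min-tokens` produce identical arrays.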
Internal duplicate defaults, before the bundled phpstorm config is applied, are mode=gate, normalize=true, fuzzy=false, near_miss=false, min_lines=5, min_tokens=70, min_statements=4, min_similarity=0.85, no baseline, no JSON output, and no configured paths or excludes.
Internal API defaults are include_protected=true, no baseline, no JSON output, and no configured paths or excludes.
Config merge order is:
- Internal checker defaults.
- Config-file preset, when present.
- Explicit values in the config file.
- CLI --preset=NAME, when present.
- Explicit CLI flags and CLI paths.
Local config values override the config-file preset. CLI --preset is a run-level override and can override config-file values. Explicit CLI flags still win after that.
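The merge order can be pictured as a stack of layers where each later layer replaces the keys it defines. The sketch below uses a hypothetical `mergeLayers()` helper and a reduced set of duplicate settings; the preset values are the phpstorm and standard values listed in this README, and the config-file and CLI values are made up for the example.

```php
<?php

// Hypothetical helper: later layers win for every key they define.
function mergeLayers(array ...$layers): array
{
    return array_replace([], ...$layers);
}

$defaults     = ['mode' => 'gate',  'fuzzy' => false, 'min_tokens' => 70];  // internal defaults
$configPreset = ['mode' => 'audit', 'fuzzy' => true,  'min_tokens' => 90];  // "preset": "phpstorm"
$configValues = ['min_tokens' => 80];                                       // explicit config value
$cliPreset    = ['mode' => 'gate',  'fuzzy' => true,  'min_tokens' => 100]; // --preset=standard
$cliFlags     = ['fuzzy' => false];                                         // --no-fuzzy

$effective = mergeLayers($defaults, $configPreset, $configValues, $cliPreset, $cliFlags);

// $effective is ['mode' => 'gate', 'fuzzy' => false, 'min_tokens' => 100]:
// the CLI preset overrode the config file's min_tokens, and --no-fuzzy won last.
var_export($effective);
```

This covers scalar settings only. Excludes are documented as merged rather than replaced, so they would need a union step instead of `array_replace`.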
Presets
Preset templates live in resources/presets/ and are loaded by Infocyph\PHPProbe\Config\PresetRepository.
Available presets:
| Preset | Duplicate policy | API policy |
|---|---|---|
| phpstorm | PhpStorm-aligned default. audit mode, normalized tokens, fuzzy identifiers, near-miss matching, min_lines=5, min_tokens=90, min_statements=4, min_similarity=0.85. | Includes protected members. |
| standard | Quieter CI gate. gate mode, normalized tokens, fuzzy identifiers, no near-miss matching, min_lines=6, min_tokens=100, min_statements=5, min_similarity=0.9. | Includes protected members. |
| strict | Sensitive audit. audit mode, normalized tokens, fuzzy identifiers, near-miss matching, min_lines=4, min_tokens=70, min_statements=3, min_similarity=0.8. | Includes protected members. |
All presets include the same default syntax, duplicate, and API excludes:
tests, vendor, node_modules, .git, .idea, .vscode, coverage,
.phpunit.cache, .psalm-cache, build, dist, tmp, .tmp, storage,
bootstrap/cache, var/cache
Duplicate presets also exclude storage/framework/views.
Preset commands:
php vendor/bin/phpprobe presets
php vendor/bin/phpprobe preset phpstorm
presets prints one preset name per line. preset <name> prints the bundled JSON template. Unknown preset names print an error and exit 2.
Syntax Checker
The syntax checker discovers PHP files, then runs PHP's native lint command against each file:
php -d display_errors=1 -l <file>
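As a rough illustration of that lint step, the function below runs the same command line for one file through `proc_open` and reports whether it passed. `lintFile()` is a hypothetical name, using `PHP_BINARY` as the interpreter is an assumption, and the real runner in the Process namespace may differ.

```php
<?php

// Hypothetical helper: lint one file with `php -d display_errors=1 -l <file>`.
function lintFile(string $file): array
{
    // Passing the command as an array avoids shell quoting issues (PHP 7.4+).
    $command = [PHP_BINARY, '-d', 'display_errors=1', '-l', $file];
    $descriptors = [1 => ['pipe', 'w'], 2 => ['pipe', 'w']];

    $pipes = [];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the PHP lint process.');
    }

    // Lint output is small, so reading the two pipes one after the other is fine here.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    // php -l exits non-zero when the file has a syntax error.
    $exitCode = proc_close($process);

    return ['ok' => $exitCode === 0, 'output' => trim($stdout . $stderr)];
}
```

A caller would collect the files whose `ok` is false and print their `output`, which matches the documented "failing file list plus lint output" behavior.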
Command:
php vendor/bin/phpprobe syntax [options] [paths...]
Options:
| Option | Form | Meaning |
|---|---|---|
| --config | --config=FILE or --config FILE | Read checker settings from a specific config file. |
| --preset | --preset=NAME or --preset NAME | Apply phpstorm, standard, or strict as a run-level preset. |
| --exclude | --exclude=PATH or --exclude PATH | Exclude a path. Repeatable. |
| --help, -h | flag | Print syntax checker help and exit 0. |
Path behavior:
- CLI paths override syntax.paths from config.
- If CLI paths are empty, syntax.paths is used.
- If both are empty, discovery starts from the current directory (.).
- Config excludes and CLI excludes are merged.
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No PHP files found | stdout: No PHP files found. | 0 |
| All files pass | stdout: Syntax OK: N PHP files checked. | 0 |
| One or more files fail | stderr: failing file list plus lint output | 1 |
| Unknown preset | stderr: preset error | 2 |
Public API Snapshot Checker
The API checker parses PHP files with nikic/php-parser, extracts the package-visible surface, and can compare it with a saved snapshot. It is intended for library BC drift checks, not type analysis.
Command:
php vendor/bin/phpprobe api [options] [paths...]
Options:
| Option | Form | Meaning |
|---|---|---|
| --config | --config=FILE or --config FILE | Read checker settings from a specific config file. |
| --preset | --preset=NAME or --preset NAME | Apply phpstorm, standard, or strict as a run-level preset. |
| --exclude | --exclude=PATH or --exclude PATH | Exclude a path. Repeatable. |
| --public-only | flag | Ignore protected class members. |
| --include-protected | flag | Include protected members. This is the default. |
| --baseline | --baseline=FILE | Compare the current API against a snapshot file. |
| --write-baseline | --write-baseline, --write-baseline=FILE | Write the current API snapshot and exit 0. Bare flag writes .phpprobe-api-baseline.json. |
| --json | flag | Emit machine-readable JSON to stdout. |
| --help, -h | flag | Print API checker help and exit 0. |
Path behavior:
- CLI paths override api.paths from config.
- If CLI paths are empty, api.paths is used.
- If both are empty, discovery starts from the current directory (.).
- Config excludes and CLI excludes are merged.
Snapshot contents:
- named classes, interfaces, traits, and enums
- top-level namespaced functions
- top-level namespaced constants
- public members always
- protected members unless --public-only is used
- class modifiers, inheritance, implemented interfaces, method signatures, property signatures, constants, enum cases, function signatures, and stable fingerprints
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No baseline passed | stdout: Public API snapshot OK: N symbol(s) scanned. | 0 |
| Baseline matches | stdout: Public API unchanged: N symbol(s) scanned. | 0 |
| Baseline differs | stderr: added/removed/changed symbol list | 1 |
| --json | stdout: JSON result | 0 or 1, depending on drift |
| --write-baseline | stdout: baseline message or JSON result | 0 |
| Unknown preset | stderr: preset error | 2 |
Duplicate Checker
The duplicate checker combines token fingerprints, AST block structure, statement windows, near-miss similarity, grouping, pruning, ranking, and optional baseline suppression.
Command:
php vendor/bin/phpprobe duplicates [options] [paths...]
Options:
| Option | Form | Meaning |
|---|---|---|
| --config | --config=FILE or --config FILE | Read checker settings from a specific config file. |
| --preset | --preset=NAME or --preset NAME | Apply phpstorm, standard, or strict as a run-level preset. |
| --exclude | --exclude=PATH or --exclude PATH | Exclude a path. Repeatable. |
| --mode | --mode=gate or --mode=audit | gate runs token matching; audit also enables statement matching and near-miss matching. |
| --min-lines | --min-lines=N | Minimum duplicated line span. Values below 1 become 1. |
| --min-tokens | --min-tokens=N | Token fingerprint window size. Values below 1 become 1. |
| --min-statements | --min-statements=N | Statement window size for audit matching. Values below 1 become 1. |
| --min-similarity | --min-similarity=N | Near-miss threshold. Accepts 0.0..1.0 or 0..100; values above 1 are treated as percentages. |
| --near-miss | flag | Enable bounded statement/shape similarity matching. |
| --exact | flag | Disable variable/literal normalization and disable fuzzy matching. |
| --fuzzy | flag | Normalize identifiers/calls as ID for renamed-code scans. |
| --no-fuzzy | flag | Disable fuzzy identifier/call normalization. |
| --baseline | --baseline=FILE | Suppress clone groups whose fingerprints are already in a baseline file. |
| --write-baseline | --write-baseline, --write-baseline=FILE | Write current clone fingerprints to a baseline and exit 0. Bare flag writes .phpprobe-duplicates-baseline.json. |
| --json | flag | Emit machine-readable JSON to stdout. |
| --help, -h | flag | Print duplicate checker help and exit 0. |
Exact accepted forms matter: numeric options, --mode, --baseline, and valued --write-baseline=FILE are parsed in equals form. --config, --preset, and --exclude also accept split form. --write-baseline may also be passed as a bare flag.
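The numeric rules in the options table (values below 1 become 1, similarity above 1 is read as a percentage) could be applied along these lines. Both function names are hypothetical, and clamping the similarity into the 0..1 range is an assumption that the table does not state.

```php
<?php

// Hypothetical helper for --min-lines, --min-tokens, and --min-statements.
function clampMinimum(int $value): int
{
    // Documented rule: values below 1 become 1.
    return max(1, $value);
}

// Hypothetical helper for --min-similarity.
function normalizeSimilarity(float $value): float
{
    // Documented rule: values above 1 are treated as percentages.
    if ($value > 1.0) {
        $value /= 100.0;
    }

    // Assumption: keep the result inside 0.0..1.0.
    return max(0.0, min(1.0, $value));
}
```

Under these rules `--min-similarity=85` and `--min-similarity=0.85` select the same threshold, and `--min-tokens=0` behaves like `--min-tokens=1`.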
Path behavior:
- CLI paths override duplicates.paths from config.
- If CLI paths are empty, duplicates.paths is used.
- If both are empty, discovery starts from the current directory (.).
- Config excludes and CLI excludes are merged.
Mode behavior:
- gate: token-window duplicate detection only, unless --near-miss is explicitly passed.
- audit: token-window matching plus statement-window matching, and near-miss matching is enabled automatically.
Output and exits:
| Condition | Stream | Exit |
|---|---|---|
| No clone groups after baseline suppression | stdout: No new duplicated code found (...) | 0 |
| Clone groups found | stderr: text report | 1 |
| --json with no clones | stdout: JSON result | 0 |
| --json with clones | stdout: JSON result | 1 |
| --write-baseline | stdout: baseline message or JSON result | 0 |
| Unknown preset | stderr: preset error | 2 |
Duplicate Detection Details
File discovery:
- PHPProbe first tries git ls-files -z --cached --others --exclude-standard.
- It filters discovered PHP files with git check-ignore -z --stdin --no-index.
- If Git discovery is unavailable, it recursively scans the selected paths.
- Recursive fallback skips common infrastructure directories such as .git, .idea, .phpunit.cache, .psalm-cache, .vscode, coverage, node_modules, and vendor.
Token normalization:
- Whitespace, comments, doc comments, PHP open tags, and close tags are ignored.
- With normalize=true, variables become VAR, numbers become NUM, strings become STR.
- With fuzzy=true, identifiers and names become ID.
- With --exact, token values include token names and original text.
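The normalization rules above can be approximated with PHP's built-in tokenizer. The sketch below is not PHPProbe's implementation: `normalizeTokens()` is a hypothetical function, and the exact set of token kinds mapped to STR and ID is an assumption.

```php
<?php

// Hypothetical helper: turn PHP source into a list of comparable token strings.
function normalizeTokens(string $code, bool $fuzzy = false): array
{
    $ignored = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT, T_OPEN_TAG, T_CLOSE_TAG];
    $names = [T_STRING, T_NAME_QUALIFIED, T_NAME_FULLY_QUALIFIED, T_NAME_RELATIVE];
    $out = [];

    foreach (PhpToken::tokenize($code) as $token) {
        if ($token->is($ignored)) {
            continue; // layout and comments never affect matching
        }

        if ($token->is(T_VARIABLE)) {
            $out[] = 'VAR';
        } elseif ($token->is([T_LNUMBER, T_DNUMBER])) {
            $out[] = 'NUM';
        } elseif ($token->is([T_CONSTANT_ENCAPSED_STRING, T_ENCAPSED_AND_WHITESPACE])) {
            $out[] = 'STR';
        } elseif ($fuzzy && $token->is($names)) {
            $out[] = 'ID'; // renamed functions and classes compare equal
        } else {
            $out[] = $token->text; // keywords, operators, punctuation
        }
    }

    return $out;
}
```

With normalization on, `$a = 1 + foo("x");` and `$total = 42 + foo('y');` yield the same token list; with `$fuzzy` enabled, a call to a differently named function matches as well.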
Token clones:
- PHPProbe hashes every normalized token window of min_tokens tokens.
- Matching windows are candidate clones.
- Candidates are extended token-by-token to find the full matching region.
- Overlapping windows in the same file are ignored.
- Clone regions below min_lines are ignored.
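A minimal sketch of the window hashing and extension steps, assuming each file has already been reduced to a list of normalized token strings. All three function names are hypothetical, SHA-256 is an arbitrary choice of hash, and the overlap and min_lines filters are left out.

```php
<?php

// Hypothetical helper: map each window hash to the start offsets where it occurs.
function indexWindows(array $tokens, int $size): array
{
    $index = [];
    $last = count($tokens) - $size;
    for ($start = 0; $start <= $last; $start++) {
        $window = array_slice($tokens, $start, $size);
        $index[hash('sha256', implode("\0", $window))][] = $start;
    }

    return $index;
}

// Hypothetical helper: pairs of start offsets whose windows hash identically.
function candidatePairs(array $a, array $b, int $size): array
{
    $indexB = indexWindows($b, $size);
    $pairs = [];
    foreach (indexWindows($a, $size) as $hash => $startsA) {
        foreach ($indexB[$hash] ?? [] as $startB) {
            foreach ($startsA as $startA) {
                $pairs[] = [$startA, $startB];
            }
        }
    }

    return $pairs;
}

// Hypothetical helper: grow a matched window forward while the tokens keep agreeing.
function extendMatch(array $a, array $b, int $startA, int $startB, int $size): int
{
    $length = $size;
    while (
        isset($a[$startA + $length], $b[$startB + $length])
        && $a[$startA + $length] === $b[$startB + $length]
    ) {
        $length++;
    }

    return $length; // number of matching tokens in the extended region
}
```

Hashing whole windows keeps candidate lookup cheap, and the token-by-token extension then recovers the full clone region from any window that landed inside it.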
AST and statement matching:
- PHPProbe uses nikic/php-parser to index structural blocks.
- Indexed blocks include functions, methods, closures, arrow functions, loops, branches, match arms, and try/catch/finally blocks.
- Statement hashes are built from AST shape.
- In audit mode, matching statement windows of min_statements statements are reported as statement clones.
Near-miss matching:
- Near-miss matching compares blocks with the same block type.
- Similarity is weighted as 72% statement-hash similarity and 28% AST-shape similarity.
- Similarity is based on longest-common-subsequence ratio.
- Matches below min_similarity are ignored.
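The weighting above can be illustrated with a small longest-common-subsequence routine. The 72/28 weights come from this README; the ratio formula `2 * LCS / (len(a) + len(b))` is an assumption, since the exact denominator is not stated, and the function names are hypothetical.

```php
<?php

// Length of the longest common subsequence of two lists (row-by-row dynamic programming).
function lcsLength(array $a, array $b): int
{
    $b = array_values($b);
    $prev = array_fill(0, count($b) + 1, 0);

    foreach ($a as $x) {
        $curr = [0];
        foreach ($b as $j => $y) {
            $curr[$j + 1] = $x === $y
                ? $prev[$j] + 1
                : max($prev[$j + 1], $curr[$j]);
        }
        $prev = $curr;
    }

    return $prev[count($b)];
}

// Assumed ratio: 2 * LCS / (len(a) + len(b)), which is 1.0 for identical lists.
function lcsRatio(array $a, array $b): float
{
    $total = count($a) + count($b);

    return $total === 0 ? 1.0 : 2 * lcsLength($a, $b) / $total;
}

// Documented weighting: 72% statement-hash similarity, 28% AST-shape similarity.
function nearMissSimilarity(array $stmtA, array $stmtB, array $shapeA, array $shapeB): float
{
    return 0.72 * lcsRatio($stmtA, $stmtB) + 0.28 * lcsRatio($shapeA, $shapeB);
}
```

Under these assumptions, two blocks that share three of four statement hashes and have identical shapes score 0.72 * 0.75 + 0.28 * 1.0 = 0.82, which would be reported by the strict preset (0.8) but not by phpstorm (0.85).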
Grouping, pruning, and scoring:
- Duplicate pairs are grouped into clone families.
- Contained/weaker clones are pruned.
- Results are ranked by score, line span, and similarity.
- Scoring rewards larger clones, more occurrences, higher similarity, structural completeness, and near-miss signal; small trivial clones are penalized.
Duplicate JSON Result Shape
phpprobe duplicates --json emits:
{
  "files": 2,
  "total_lines": 100,
  "duplicated_lines": 20,
  "duplicate_percentage": 20.0,
  "known_clones": 0,
  "new_clones": 1,
  "clones": [
    {
      "fingerprint": "...",
      "source": "tokens",
      "score": 120.5,
      "similarity": 1.0,
      "tokens": 90,
      "lines": 10,
      "statements": 0,
      "block_type": "function",
      "occurrences": [
        {
          "file": "src/Example.php",
          "start_line": 10,
          "end_line": 20,
          "lines": 11,
          "context": "function"
        }
      ]
    }
  ]
}
Clone source is one of:
- tokens
- statements
- near_miss
known_clones is populated when a duplicate baseline is read. new_clones is the number of clone groups remaining after baseline suppression.
API JSON Result Shape
phpprobe api --json emits:
{
  "snapshot": {
    "version": 1,
    "generated_at": "2026-05-02T00:00:00+00:00",
    "symbols": [
      {
        "id": "class App\\Service",
        "kind": "class",
        "name": "App\\Service",
        "file": "src/Service.php",
        "line": 5,
        "modifiers": ["final"],
        "extends": "",
        "implements": [],
        "members": [],
        "fingerprint": "..."
      }
    ]
  },
  "baseline": {
    "version": 1,
    "generated_at": "",
    "symbols": []
  },
  "changed": false,
  "changes": {
    "added": [],
    "removed": [],
    "changed": []
  }
}
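When a baseline has been loaded, the changed flag and the added/removed/changed lists can be derived by comparing symbol fingerprints keyed by id. The sketch below is one plausible way to do that, not the package's code: `diffSnapshots()` is a hypothetical function, and reporting bare symbol ids in each list is an assumption about the entry format.

```php
<?php

// Hypothetical helper: compare two symbol lists shaped like the "symbols" arrays above.
function diffSnapshots(array $baselineSymbols, array $currentSymbols): array
{
    // Build id => fingerprint maps for both sides.
    $old = array_column($baselineSymbols, 'fingerprint', 'id');
    $new = array_column($currentSymbols, 'fingerprint', 'id');

    $added = array_keys(array_diff_key($new, $old));
    $removed = array_keys(array_diff_key($old, $new));

    // A symbol present on both sides has changed when its fingerprint differs.
    $changed = [];
    foreach (array_intersect_key($new, $old) as $id => $fingerprint) {
        if ($fingerprint !== $old[$id]) {
            $changed[] = $id;
        }
    }

    return [
        'changed' => $added !== [] || $removed !== [] || $changed !== [],
        'changes' => ['added' => $added, 'removed' => $removed, 'changed' => $changed],
    ];
}
```

A CI wrapper could decode the baseline file and the current snapshot with `json_decode($json, true)`, pass both `symbols` arrays to this function, and fail the build when `changed` is true.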
Baselines
Write a baseline:
php vendor/bin/phpprobe duplicates --write-baseline
php vendor/bin/phpprobe duplicates --write-baseline=.phpprobe-duplicates-baseline.json
php vendor/bin/phpprobe api --write-baseline
php vendor/bin/phpprobe api --write-baseline=.phpprobe-api-baseline.json
Use a baseline:
php vendor/bin/phpprobe duplicates --baseline=.phpprobe-duplicates-baseline.json
php vendor/bin/phpprobe api --baseline=.phpprobe-api-baseline.json
Duplicate baseline files contain:
{
  "version": 1,
  "generated_at": "2026-05-02T00:00:00+00:00",
  "clones": [
    {
      "fingerprint": "...",
      "source": "tokens",
      "score": 100.0
    }
  ]
}
API baseline files use the same top-level version, generated_at, and symbols shape emitted under the snapshot JSON key. When a baseline file is missing or unreadable, PHPProbe treats it as empty.
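Duplicate baseline suppression can be sketched as a fingerprint lookup against the baseline file. Both functions below are hypothetical; in particular, counting known_clones as the number of current groups found in the baseline is an assumption about how that field is computed.

```php
<?php

// Hypothetical helper: a missing, unreadable, or invalid baseline is treated as empty.
function loadBaseline(string $file): array
{
    $raw = is_readable($file) ? file_get_contents($file) : false;
    $data = is_string($raw) ? json_decode($raw, true) : null;

    return is_array($data) ? $data : ['clones' => []];
}

// Hypothetical helper: drop clone groups whose fingerprint is already in the baseline.
function suppressKnownClones(array $clones, array $baseline): array
{
    $known = array_flip(array_column($baseline['clones'] ?? [], 'fingerprint'));

    $new = [];
    $knownCount = 0;
    foreach ($clones as $clone) {
        if (isset($known[$clone['fingerprint']])) {
            $knownCount++;
            continue;
        }
        $new[] = $clone;
    }

    return ['known_clones' => $knownCount, 'new_clones' => count($new), 'clones' => $new];
}
```

With an empty baseline every clone group counts as new, which matches the documented fallback for a missing baseline file.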
Development
Composer scripts:
| Script | Command |
|---|---|
| composer test | vendor/bin/pest -c pest.xml |
| composer lint | php bin/phpprobe syntax src tests |
| composer duplicates | php bin/phpprobe duplicates --preset=standard --config=resources/phpprobe.json src tests |
| composer api | php bin/phpprobe api --config=resources/phpprobe.json src tests |
Useful local checks:
composer validate --strict
composer test
composer lint
composer duplicates
composer api
git diff --check
The repository does not need a committed composer.lock; it is a library-style tool package, so dev dependencies can resolve for the active PHP version.