raffaelecarelle / ai-code-review-bot
Minimal, extensible AI code review tool
Requires
- php: >=8.1
- guzzlehttp/guzzle: ^7.8
- monolog/monolog: ^3.7
- psr/log: ^3.0
- symfony/console: ^7.0|^6.0|^5.0
- symfony/filesystem: ^7.0|^6.0|^5.0
- symfony/yaml: ^7.0|^6.0|^5.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.64
- phpstan/phpstan: ^1.12
- phpunit/phpunit: ^10.5
- rector/rector: ^1.2
Last update: 2025-09-18 07:28:42 UTC
README
AI Code Review Bot
Minimal, extensible AI-assisted code review tool for PHP projects.
- Analyzes unified diffs (from Pull/Merge Requests or files)
- Produces normalized findings (machine-readable JSON or human summary)
- Loads a simple YAML/JSON config with provider/policy settings and an optional coding guidelines file
- Safe defaults: deterministic Mock AI provider; no network calls unless configured
Official documentation: Docs
Table of Contents
- Objectives and scope
- Architecture and main modules
- Quick start
- Configuration (.aicodereview.yml)
- VCS adapters (GitHub/GitLab/Bitbucket)
- Coding guidelines file
- AI providers and token budgeting
- Security & Performance
- Output formats
- Development and QA
- Credits
- License
1. Objectives and scope
- Functional
  - Analyze diffs and produce review findings for coding standard violations and simple risk patterns.
  - Dynamic configuration for providers, policy, token budget, rules, and VCS.
  - Post results back to the PR/MR via platform adapters when requested.
- Non-functional
  - Safe defaults: no external calls by default (mock provider) and no PR comments unless --comment is passed.
  - Modular design for plugging in real LLM providers and VCS platforms.
2. Architecture and main modules (PHP)
- bin/aicr: CLI entry point (Symfony Console) running the review command in single-command mode.
- src/Command/ReviewCommand.php: Orchestrates reading config, loading the diff (from file or git), running the Pipeline, and optional PR/MR commenting. Uses Symfony Process for git.
- src/Config.php: Loads YAML/JSON config, merges it with defaults, expands ${ENV} variables, and exposes sections (providers, context, policy, vcs, prompts).
- src/DiffParser.php: Minimal unified diff parser returning added lines per file with accurate line numbers.
- src/Pipeline.php: End-to-end pipeline: parse diff, build AI provider, chunk with token budget, apply policy, and render output.
- src/Adapters/: VcsAdapter interface and GithubAdapter/GitlabAdapter/BitbucketAdapter implementations (resolve branches from PR/MR id and post comments).
- src/Providers/: AIProvider interface and concrete providers (OpenAI, Gemini, Anthropic, Ollama, Mock).
- src/Support/: Core utility classes:
  - ChunkBuilder: Intelligent diff chunking with semantic analysis and optimization
  - TokenBudget: Advanced token management with compression and per-file caps
  - ResourceManager: Safe resource handling with automatic cleanup
  - ApiCache: Response caching with TTL and size management
  - InputSanitizer: Security-focused input validation and sanitization
  - DiffProcessor: Enhanced diff processing with filtering capabilities
  - SemanticChunker: Context-aware code chunking for better AI analysis
- src/Config/Constants: Centralized configuration constants replacing magic numbers and strings.
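To make the DiffParser description concrete, here is a minimal sketch of extracting added lines with their new-file line numbers from a unified diff. The function name parseAddedLines and its return shape are assumptions for illustration; this is not the project's actual DiffParser, and it does not handle file paths containing spaces.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the project's DiffParser): collects added lines
 * per file, keyed by their line number in the new version of the file.
 *
 * @return array<string, array<int, string>>
 */
function parseAddedLines(string $diff): array
{
    $added = [];
    $file = null;
    $newLine = 0;
    $oldLeft = 0; // old-file lines still expected in the current hunk
    $newLeft = 0; // new-file lines still expected in the current hunk

    foreach (preg_split('/\R/', $diff) ?: [] as $line) {
        if ($oldLeft > 0 || $newLeft > 0) {
            // Inside a hunk body: the first character marks the line type.
            $marker = $line[0] ?? ' ';
            if ($marker === '+') {
                if ($file !== null) {
                    $added[$file][$newLine] = substr($line, 1);
                }
                $newLine++;
                $newLeft--;
            } elseif ($marker === '-') {
                $oldLeft--;
            } elseif ($marker !== '\\') {
                // Context line: present in both the old and the new file.
                $newLine++;
                $oldLeft--;
                $newLeft--;
            }
            continue;
        }
        if (preg_match('/^\+\+\+ (?:b\/)?(\S+)/', $line, $m) === 1) {
            // New-file header; /dev/null means the file was deleted.
            $file = $m[1] === '/dev/null' ? null : $m[1];
        } elseif (preg_match('/^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/', $line, $m) === 1) {
            // Hunk header: an omitted count means 1.
            $oldLeft = ($m[1] ?? '') === '' ? 1 : (int) $m[1];
            $newLine = (int) $m[2];
            $newLeft = ($m[3] ?? '') === '' ? 1 : (int) $m[3];
        }
    }

    return $added;
}

$sample = "--- a/src/Foo.php\n+++ b/src/Foo.php\n@@ -1,3 +1,4 @@\n first\n-old\n+new one\n+new two\n last\n";
foreach (parseAddedLines($sample) as $path => $lines) {
    foreach ($lines as $number => $text) {
        echo "{$path}:{$number} {$text}", PHP_EOL;
    }
}
```

Counting the lines announced in each hunk header, rather than only checking prefixes, keeps an added line whose content begins with "++" from being mistaken for a file header.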
3. Quick start
- Install dependencies via Composer:
    composer install
- Option A: Analyze an existing diff file
  - Create or use a unified diff, e.g., examples/sample.diff.
  - Run:
    php bin/aicr review --diff-file examples/sample.diff --output summary
    php bin/aicr review --diff-file examples/sample.diff --output json
    php bin/aicr review --diff-file examples/sample.diff --output summary --provider openai
- Option B: Analyze a PR/MR by ID using git
  - Configure vcs.platform in .aicodereview.yml (github, gitlab, or bitbucket) and set the required identifiers/tokens.
  - Then run (the command fetches branches, computes the diff, and analyzes it):
    php bin/aicr review --id 123 --output summary
    php bin/aicr review --id 123 --output summary --provider gemini
- To also post a comment back to the PR/MR, add --comment:
    php bin/aicr review --id 123 --output summary --comment
    php bin/aicr review --id 123 --output summary --comment --provider anthropic
Notes
- Provide --config <path> to use a non-default config file.
- Use --provider <name> to override the default provider from config (e.g., openai, gemini, anthropic, ollama, mock).
- Without --diff-file, --id is required and branches are resolved via the configured adapter.
4. Configuration (.aicodereview.yml)
Example (see .aicodereview.yml in this repo and examples/config.*.yml):
    version: 1
    providers:
      # Safe deterministic provider by default
      default: mock
    context:
      diff_token_limit: 8000
      overflow_strategy: trim
      per_file_token_cap: 2000
      enable_semantic_chunking: true
      enable_diff_compression: true
    policy:
      min_severity_to_comment: info
      max_comments: 50
      redact_secrets: true
      consolidate_similar_findings: true
      max_findings_per_file: 5
      severity_limits:
        error: 10
        warning: 10
        info: 5
    guidelines_file: null
    vcs:
      # Set one of: github | gitlab | bitbucket
      platform: null
      # GitHub: owner/repo (optional if GH_REPO env or remote origin is GitHub)
      repo: null
      # GitLab: numeric id or full path namespace/repo (optional if GL_PROJECT_ID or remote origin is GitLab)
      project_id: null
      # GitLab: override API base for self-hosted instances (e.g., https://gitlab.example.com/api/v4)
      api_base: null
      # Bitbucket: workspace name (required for Bitbucket)
      workspace: null
      # Bitbucket: repository name (required for Bitbucket)
      repository: null
      # Bitbucket: access token for authentication (required for Bitbucket)
      accessToken: null
      # Bitbucket: API request timeout in seconds (optional, defaults to 30)
      timeout: 30
    prompts:
      # Optional: append additional instructions to the base prompts used by the LLM
      # You can use single strings or lists of strings
      system_append: "Prefer concise findings and avoid duplicates."
      user_append:
        - "Prioritize security and performance related issues."
      extra:
        - "If a secret or key is detected, suggest redaction."
    excludes:
      # Array of paths to exclude from code review
      # Each element is treated as glob, regex, or relative path from project root
      # Examples:
      - "*.md"          # Exclude all markdown files (glob)
      - "composer.lock" # Exclude specific files (exact match)
      - "tests/*.php"   # Exclude files in specific directories with patterns (glob)
      - "vendor"        # Exclude entire vendor directory (directory)
      - "node_modules"  # Exclude node_modules directory (directory)
      - "build"         # Exclude build artifacts (directory)
      - "dist"          # Exclude distribution files (directory)
Notes
- Env var expansion works in any string value: ${VAR_NAME}.
- Tokens/ids are read from the environment if not set: GH_TOKEN/GITHUB_TOKEN, GL_TOKEN/GITLAB_TOKEN, GH_REPO, GL_PROJECT_ID.
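The ${VAR_NAME} expansion could work roughly as in the sketch below, which walks a config array and substitutes placeholders in every string value. The function name expandEnv is hypothetical, and replacing unset variables with an empty string is an assumption; the project's Config class may behave differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: recursively replaces ${VAR_NAME} placeholders in
 * string values. Unset variables become '' (an assumption, not confirmed
 * project behavior). Non-string scalars are returned unchanged.
 */
function expandEnv(mixed $value): mixed
{
    if (is_array($value)) {
        // array_map with a single array preserves the original keys.
        return array_map('expandEnv', $value);
    }
    if (!is_string($value)) {
        return $value;
    }

    return preg_replace_callback(
        '/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/',
        static function (array $m): string {
            $env = getenv($m[1]);

            return $env === false ? '' : $env;
        },
        $value
    );
}

putenv('GH_REPO=acme/widgets');
$config = expandEnv(['version' => 1, 'vcs' => ['platform' => 'github', 'repo' => '${GH_REPO}']]);
echo $config['vcs']['repo'], PHP_EOL; // prints "acme/widgets"
```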
5. VCS adapters (GitHub/GitLab/Bitbucket)
- Configure vcs.platform and required parameters as needed.
- The review command supports a single --id option (PR number for GitHub, MR IID for GitLab, PR ID for Bitbucket).
- Behavior when --diff-file is omitted:
  - Resolve base/head branches from the ID via platform API.
  - git fetch --all; fetch base/head; compute git diff base...head.
  - Run the analysis pipeline on that diff.
  - --comment posts the summary back via the adapter.
6. Coding guidelines file
- You can provide a project coding standard or style guide via guidelines_file in .aicodereview.yml.
- When set, its content is embedded into the LLM prompts as a base64 string. The prompt explicitly instructs the model to base64-decode the guidelines and follow them strictly during the review.
- No provider-specific file uploads are performed: all supported providers (OpenAI, Gemini, Anthropic, Ollama) receive the same base64-embedded guidelines in the prompt.
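As an illustration of the base64 embedding described above, the following sketch appends encoded guidelines to a base prompt. The function name buildSystemPrompt and the instruction wording are assumptions; the project's real prompt text is not shown in this README.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical prompt builder: embeds the guidelines as a base64 string and
 * tells the model to decode it. The wording is illustrative only.
 */
function buildSystemPrompt(string $basePrompt, ?string $guidelines): string
{
    if ($guidelines === null || trim($guidelines) === '') {
        // No guidelines configured (guidelines_file: null): prompt is unchanged.
        return $basePrompt;
    }

    return $basePrompt . "\n\n"
        . "Coding guidelines follow as a base64 string. Decode them and follow them strictly:\n"
        . base64_encode($guidelines);
}

$prompt = buildSystemPrompt('You are a code reviewer.', "Use strict types.\nNo global state.");
echo $prompt, PHP_EOL;
```

Because the same string goes into the prompt for every provider, no provider-specific upload step is needed, which matches the behavior described in the list above.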
7. AI providers and token budgeting
- Supported providers in this repository: openai, gemini, anthropic, ollama, mock.
- Select via providers.default and configure each provider section accordingly (see src/Providers/* for options).
- Token budgeting is approximate (chars/4). Global and per-file caps are configurable; overflow_strategy defaults to trim.
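The chars/4 estimate and the two caps can be sketched as follows. The function names, rounding up, and the exact trimming behavior are assumptions made for illustration; the sketch also counts bytes with strlen, which equals the character count only for single-byte text.

```php
<?php
declare(strict_types=1);

/** Rough token estimate: one token per four bytes, rounded up (assumed rounding). */
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Hypothetical "trim" strategy: cap each file at $perFileCap tokens and stop
 * adding files once the global $limit is used up.
 *
 * @param array<string, string> $chunks file path => diff text
 * @return array<string, string>
 */
function fitToBudget(array $chunks, int $limit, int $perFileCap): array
{
    $kept = [];
    $used = 0;
    foreach ($chunks as $file => $text) {
        $allowed = min($perFileCap, $limit - $used);
        if ($allowed <= 0) {
            break; // global budget exhausted
        }
        // Four bytes per token, so the byte allowance is tokens * 4.
        $text = substr($text, 0, $allowed * 4);
        $kept[$file] = $text;
        $used += estimateTokens($text);
    }

    return $kept;
}

$chunks = ['a.php' => str_repeat('x', 100), 'b.php' => str_repeat('y', 100), 'c.php' => 'zzzz'];
$fitted = fitToBudget($chunks, limit: 30, perFileCap: 20);
foreach ($fitted as $file => $text) {
    echo $file, ': ', estimateTokens($text), ' tokens', PHP_EOL;
}
```

In this sketch, $limit plays the role of diff_token_limit and $perFileCap the role of per_file_token_cap from the configuration example.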
7.1 Advanced Token Optimization Features
The tool includes several token cost optimization features:
- Semantic Chunking: Enable with enable_semantic_chunking: true to group related code changes by context (classes, methods, etc.).
- Diff Compression: Enable with enable_diff_compression: true to compress diffs while preserving their semantic meaning.
- Trivial Change Filtering: Automatically filters out whitespace-only changes, TODO comments, and import statements.
- Similar Finding Consolidation: Set consolidate_similar_findings: true to aggregate similar issues across multiple files.
- Per-file Limits: Set max_findings_per_file to cap the number of findings reported for each file.
- Severity Limits: Set severity_limits to cap the number of findings per severity level.
These optimizations can reduce token usage by 30-50% for input and 40-60% for output while maintaining review quality. See docs/token-cost-optimization.md for a detailed implementation guide.
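The per-file and per-severity caps could be applied along the lines of the sketch below. The function name applyLimits and the finding fields (file, severity, message) are assumptions for illustration, as is the choice to keep findings in input order.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical filter: keeps at most $maxPerFile findings per file and at
 * most $severityLimits[severity] findings per severity, in input order.
 * Severities without a configured limit are not capped.
 *
 * @param list<array{file: string, severity: string, message: string}> $findings
 * @param array<string, int> $severityLimits
 * @return list<array{file: string, severity: string, message: string}>
 */
function applyLimits(array $findings, array $severityLimits, int $maxPerFile): array
{
    $perFile = [];
    $perSeverity = [];
    $kept = [];
    foreach ($findings as $finding) {
        $fileCount = $perFile[$finding['file']] ?? 0;
        $severityCount = $perSeverity[$finding['severity']] ?? 0;
        $severityLimit = $severityLimits[$finding['severity']] ?? PHP_INT_MAX;
        if ($fileCount >= $maxPerFile || $severityCount >= $severityLimit) {
            continue; // one of the caps is already reached
        }
        $perFile[$finding['file']] = $fileCount + 1;
        $perSeverity[$finding['severity']] = $severityCount + 1;
        $kept[] = $finding;
    }

    return $kept;
}

$findings = [
    ['file' => 'a.php', 'severity' => 'info', 'message' => 'first'],
    ['file' => 'a.php', 'severity' => 'info', 'message' => 'second'],
    ['file' => 'a.php', 'severity' => 'error', 'message' => 'third'],
    ['file' => 'a.php', 'severity' => 'warning', 'message' => 'fourth'],
];
$limited = applyLimits($findings, ['error' => 10, 'warning' => 10, 'info' => 1], 2);
echo implode(', ', array_column($limited, 'message')), PHP_EOL; // prints "first, third"
```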
8. Security & Performance
This section summarizes enhancements focused on security hardening, performance optimization, and code quality:
Security Enhancements
- InputSanitizer: Comprehensive input validation and sanitization for all external data
  - Branch name, repository name, and file path validation
  - API response sanitization to prevent injection attacks
  - URL and commit SHA validation with strict patterns
- Resource Management: Safe resource handling with automatic cleanup
  - Temporary file and directory management
  - Resource leak prevention with shutdown handlers
  - Exception-safe cleanup with try-finally patterns
Performance Optimizations
- Intelligent Chunking: Enhanced ChunkBuilder with semantic analysis
  - Batch processing for better memory management
  - Parallel-friendly architecture for large diffs
  - Context-aware chunking for improved AI analysis
- Advanced Token Management: Improved TokenBudget with compression
  - Per-file token caps to prevent oversized chunks
  - Diff compression for large files
  - Smart budget allocation and overflow handling
- API Response Caching: New ApiCache system for improved performance
  - TTL-based caching with automatic expiration
  - Size-limited cache with LRU eviction
  - Request deduplication and response reuse
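To illustrate how TTL expiration and size-limited LRU eviction fit together, here is a small in-memory sketch. The class name TinyCache and its methods are hypothetical and are not the project's ApiCache; the optional $now parameter exists only to make the example deterministic.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory cache (not the project's ApiCache): entries expire
 * after $ttl seconds, and the least recently used entry is evicted while the
 * total payload size exceeds $maxBytes.
 */
final class TinyCache
{
    /** @var array<string, array{value: string, expires: int}> least recently used first */
    private array $items = [];
    private int $bytes = 0;

    public function __construct(private int $ttl, private int $maxBytes)
    {
    }

    public function get(string $key, ?int $now = null): ?string
    {
        $now ??= time();
        if (!isset($this->items[$key])) {
            return null;
        }
        $item = $this->items[$key];
        $this->remove($key);
        if ($item['expires'] <= $now) {
            return null; // expired entries are dropped on access
        }
        // Re-insert at the end to mark the entry as most recently used.
        $this->items[$key] = $item;
        $this->bytes += strlen($item['value']);

        return $item['value'];
    }

    public function set(string $key, string $value, ?int $now = null): void
    {
        $now ??= time();
        $this->remove($key);
        $this->items[$key] = ['value' => $value, 'expires' => $now + $this->ttl];
        $this->bytes += strlen($value);
        // Evict from the front (least recently used) until within the limit.
        while ($this->bytes > $this->maxBytes && $this->items !== []) {
            $this->remove((string) array_key_first($this->items));
        }
    }

    private function remove(string $key): void
    {
        if (isset($this->items[$key])) {
            $this->bytes -= strlen($this->items[$key]['value']);
            unset($this->items[$key]);
        }
    }
}

$cache = new TinyCache(ttl: 3600, maxBytes: 10);
$cache->set('a', '12345', 0);
$cache->set('b', '12345', 0);
$cache->get('a', 1);          // touching "a" makes "b" the least recently used
$cache->set('c', '12345', 2); // 15 bytes exceed the limit, so "b" is evicted
var_dump($cache->get('b', 3)); // NULL
```

In this sketch, $ttl and $maxBytes correspond to the cache_ttl and max_cache_size settings; a cache key could be, for example, a hash of the provider name and prompt, though that keying scheme is an assumption.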
Code Quality Improvements
- Constants Centralization: All magic numbers and strings moved to Constants class
- Enhanced Error Handling: Standardized exception handling across all providers
- Improved Documentation: Comprehensive PHPDoc comments and inline documentation
- Security Audit: Fixed potential security issues identified in code review
Configuration Enhancements
New configuration options available:
    context:
      enable_semantic_chunking: true   # Enable context-aware chunking
      enable_diff_compression: true    # Enable diff compression for large files
      cache_ttl: 3600                  # API response cache TTL in seconds
      max_cache_size: 52428800         # Maximum cache size in bytes (50MB)
9. Output formats
- json (default): machine-readable findings array.
- summary: human-readable bulleted list. This is also the format used for PR/MR comments.
- markdown: structured markdown format with emojis, metadata, and organized findings by severity and file.
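The difference between the json and summary formats can be sketched as below. The function name render and the finding fields (file, line, severity, message) are assumptions for illustration, the exact layout of the project's output may differ, and the markdown format is left out of the sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical renderer; the finding fields are assumed for illustration.
 *
 * @param list<array{file: string, line: int, severity: string, message: string}> $findings
 */
function render(array $findings, string $format = 'json'): string
{
    if ($format === 'json') {
        // Machine-readable: the findings array serialized as JSON.
        return json_encode($findings, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    }
    if ($format === 'summary') {
        // Human-readable: one bullet per finding.
        $lines = array_map(
            static fn (array $f): string => sprintf('- [%s] %s:%d %s', $f['severity'], $f['file'], $f['line'], $f['message']),
            $findings
        );

        return implode("\n", $lines);
    }

    throw new InvalidArgumentException("Unsupported format: {$format}");
}

$sampleFindings = [
    ['file' => 'src/Foo.php', 'line' => 12, 'severity' => 'warning', 'message' => 'Avoid var_dump in committed code.'],
];
echo render($sampleFindings, 'summary'), PHP_EOL;
echo render($sampleFindings), PHP_EOL;
```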
10. Development and QA
- Requires PHP >= 8.1 and Composer.
- Run unit and E2E tests with PHPUnit:
    ./vendor/bin/phpunit
- Coding standards and static analysis:
    composer analyse
- The codebase uses declare(strict_types=1) and Symfony components (Console, YAML, Filesystem, Process).
11. Credits
- Author: Raffaele Carelle
- Contributors: Thanks to everyone who reports issues or submits PRs.
12. License
This project is open-sourced under the MIT License. See the LICENSE file for details.