mottaviani-dev/laravel-reductor

ML-powered test suite optimization for Laravel - Reduce CI/CD time by identifying redundant tests

v1.0.0 2025-07-15 19:44 UTC


Last update: 2025-07-15 21:09:46 UTC



Accelerate your Laravel test suite by identifying and eliminating redundant tests using unsupervised machine learning.

Key Features

  • Research-Validated ML: Implements the unsupervised machine learning methodology surveyed by Sebastian et al. (2024)
  • Laravel-Aware Design: Handles Laravel-specific patterns, shared bootstraps, and testing idioms
  • Semantic + Coverage Analysis: Uses hybrid 640D vectors combining TF-IDF semantics and execution coverage
  • Safety-First Clustering: Prevents merging tests with opposite behavior (e.g., success vs failure)
  • Interactive CLI Review: Validate clusters before merging, with entropy and safety scores
  • CI/CD Integration Ready: Automate detection of test duplication regressions
  • Multi-Format Reports: Markdown, CSV, JSON, YAML, and HTML output options

Requirements

  • PHP ≥ 8.1, Laravel ≥ 9.0
  • Python ≥ 3.7 with numpy, scikit-learn, scipy
  • PHPUnit 10+ or Pest
  • Code coverage via Xdebug or PCOV

Installation

composer require --dev mottaviani-dev/laravel-reductor
pip3 install numpy scikit-learn scipy
php artisan reductor:install

Quick Start

Prerequisites

Configure phpunit.xml for proper coverage format and exclusions:

<phpunit>
  <!-- Schema shown is for PHPUnit 10.1 or later -->
  <!-- Required: .cov (serialized PHP) report, which keeps per-test coverage data -->
  <coverage>
    <report>
      <php outputFile="coverage.cov"/>
    </report>
  </coverage>

  <!-- Recommended: Focus on application code only -->
  <source>
    <include>
      <directory suffix=".php">app</directory>
      <directory suffix=".php">src</directory>
    </include>
    <exclude>
      <directory suffix=".php">bootstrap</directory>
      <directory suffix=".php">config</directory>
      <directory suffix=".php">database</directory>
      <directory suffix=".php">routes</directory>
      <directory suffix=".php">storage</directory>
      <directory suffix=".php">tests</directory>
      <directory suffix=".php">vendor</directory>
    </exclude>
  </source>
</phpunit>

Steps

  1. Generate Coverage:
php artisan test --coverage
  2. Run Redundancy Analysis:
php artisan tests:reduce --cluster --coverage=coverage.cov
  3. Review Results:
open storage/test-reduction/redundancy_report.md

For advanced usage, interactive review, CI integration, and troubleshooting, see the docs/ folder.

Sample Output

Analysis Summary
================
Total Tests Analyzed:       118
Redundant Test Clusters:    25
Tests in Clusters:          78
Potential Test Reduction:   53 (44.9%)

Top Redundant Clusters

Cluster #11: 8 tests (97.5% similar)
  • CreateTest::it_creates_an_asset_with_valid_payload#0
  • CreateTest::it_creates_an_asset_with_valid_payload#1
  • CreateTest::it_creates_an_asset_with_string_custom_fields#0
  ... and 5 more

Cluster #12: 7 tests (98.5% similar)
  • CreateTest::it_creates_an_asset_with_no_end_date_fail#0
  • CreateTest::admin_creates_asset_with_past_end_at_date_fail#0
  • CreateTest::admin_creates_asset_with_past_start_at_date_fail#0
  ... and 4 more

=== Semantic Vector Statistics ===
Non-zero vectors: 118 (100% - no extraction failures)
Average vector magnitude: 1.0
Average non-zero elements: 21.7
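
The reduction figure in the summary is consistent with keeping one representative test per cluster and removing the rest. That reading is an assumption inferred from the numbers, not a documented rule; the function name below is hypothetical. A minimal sketch of the arithmetic:

```python
def potential_reduction(total_tests, clusters, tests_in_clusters):
    """Return (removable tests, percent of suite), assuming one test is kept per cluster."""
    removable = tests_in_clusters - clusters
    percent = round(100 * removable / total_tests, 1)
    return removable, percent


# Figures taken from the sample summary above.
removable, percent = potential_reduction(total_tests=118, clusters=25, tests_in_clusters=78)
print(f"Potential Test Reduction: {removable} ({percent}%)")
# prints "Potential Test Reduction: 53 (44.9%)"
```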

Advanced Usage

The docs/ folder covers advanced features in detail. A few quick examples:

# Interactive review
php artisan tests:reduce --coverage-file=coverage.cov --interactive

# Custom weights
php artisan tests:reduce --semantic-weight=0.8 --coverage-weight=0.2

# Different output formats
php artisan tests:reduce --format=html --output-dir=reports
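
To make the effect of the two weights concrete, here is a minimal sketch of one plausible way they could combine: a weighted sum of a semantic similarity and a coverage similarity. The formula, the function names, and the choice of cosine and Jaccard measures are assumptions for illustration, not the package's confirmed implementation; the 0.7/0.3 defaults mirror the 70/30 split mentioned under Laravel-Specific Adaptations.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {token: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets of covered (file, line) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(sem_a, sem_b, cov_a, cov_b, semantic_weight=0.7, coverage_weight=0.3):
    """Hypothetical weighted blend of semantic and coverage similarity."""
    return semantic_weight * cosine(sem_a, sem_b) + coverage_weight * jaccard(cov_a, cov_b)


# Two tests whose vocabulary overlaps by half but whose coverage is identical.
sem_a = {"creates": 1.0, "asset": 1.0}
sem_b = {"deletes": 1.0, "asset": 1.0}
cov = {("app/Asset.php", 10), ("app/Asset.php", 11)}

default = combined_similarity(sem_a, sem_b, cov, cov)
semantic_heavy = combined_similarity(sem_a, sem_b, cov, cov, 0.8, 0.2)
print(round(default, 2), round(semantic_heavy, 2))  # prints "0.65 0.6"
```

Under this reading, raising the semantic weight makes tests that merely execute the same code look less similar, which favors keeping tests that differ in intent.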

How It Works

  1. Test Coverage: Collects executed lines per test
  2. Semantic Vectorization: Extracts TF-IDF tokens from test methods
  3. MinHash Fingerprints: Builds a 512-bit sparse binary coverage fingerprint for each test
  4. Clustering: DBSCAN groups similar test cases
  5. Safety Checks: Assertion-aware and intent-matching logic to validate clusters
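
Steps 2 and 4 above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the package's code: it tokenizes test method names only, uses the smoothed IDF formula that scikit-learn applies by default, and includes a small hand-written DBSCAN. The `eps` and `min_samples` values are picked for this toy input; the coverage fingerprint and safety checks are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split snake_case and camelCase identifiers into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", text)]


def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors as {token: weight} dicts."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(tok for c in counts for tok in c)
    vectors = []
    for c in counts:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity; inputs are already unit length."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors, eps, min_samples):
    """Minimal DBSCAN; returns one label per vector, -1 meaning noise."""
    n = len(vectors)

    def neighbors(i):
        return [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_samples:
                queue.extend(nbj)  # j is a core point, so keep expanding
    return labels


tests = [
    "it_creates_an_asset_with_valid_payload",
    "it_creates_an_asset_with_string_custom_fields",
    "admin_creates_asset_with_past_end_at_date_fail",
    "admin_creates_asset_with_past_start_at_date_fail",
    "user_can_reset_password",
]
labels = dbscan(tfidf_vectors(tests), eps=0.65, min_samples=2)
print(labels)  # prints "[0, 0, 1, 1, -1]"
```

DBSCAN suits this task because it does not need the number of clusters in advance and leaves unrelated tests unclustered as noise.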

Performance Benefits

  • 20-40% faster test execution
  • 30-50% fewer redundant tests to maintain
  • 15-30% CI pipeline time savings

Troubleshooting

Common Issues

Problem: "Only 1 test processed" instead of all tests
Solution: Ensure you're using .cov format, not --coverage-clover (XML). The XML format lacks per-test granularity.

Problem: "Warning: X/Y semantic vectors are zero"
Solution: Fixed in v1.0.1 and later, which handle parameterized tests better. Update the package.

Problem: "Python module not found" in Docker
Solution: Use the container's runtime: docker exec container_name php artisan tests:reduce

Problem: High similarity but tests aren't actually redundant
Solution: Review the generated reports carefully. High similarity doesn't always mean redundancy - especially for validation tests.
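
One way to sanity-check a suspicious cluster by hand is to split it by apparent intent before treating its members as duplicates. The sketch below is a hypothetical heuristic based only on words in the test name; the marker list and function names are assumptions, and the package's own safety checks may work quite differently.

```python
# Hypothetical name fragments that suggest a test expects a failure.
FAILURE_MARKERS = {"fail", "fails", "invalid", "error", "rejects", "denied", "forbidden"}


def intent(test_id):
    """Classify a test as 'failure' or 'success' from the words in its name."""
    name = test_id.split("#")[0]  # drop data-provider suffixes such as #0
    tokens = set(name.lower().replace("::", "_").split("_"))
    return "failure" if tokens & FAILURE_MARKERS else "success"


def split_by_intent(cluster):
    """Group the tests of one cluster by their apparent intent."""
    groups = {}
    for test_id in cluster:
        groups.setdefault(intent(test_id), []).append(test_id)
    return groups


cluster = [
    "CreateTest::it_creates_an_asset_with_valid_payload#0",
    "CreateTest::it_creates_an_asset_with_no_end_date_fail#0",
    "CreateTest::admin_creates_asset_with_past_end_at_date_fail#0",
]
for label, members in split_by_intent(cluster).items():
    print(label, len(members))
```

If a cluster splits into more than one group, its tests probably exercise the same code with opposite expectations and should not be merged.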

Verifying Setup

Check your configuration:

# 1. Verify .cov file exists and has content
ls -la coverage.cov

# 2. Verify Python dependencies
python3 -c "import numpy, sklearn, scipy; print('Dependencies OK')"

# 3. Test source code extraction
php artisan tests:reduce --validate

Research Basis

This project is based on the systematic mapping study:

Sebastian, A., Naseem, H., & Catal, C. (2024). Unsupervised Machine Learning Approaches for Test Suite Reduction. Applied Artificial Intelligence, 38(1), e2322336.

The study analyzed 34 research papers and identified key patterns in unsupervised test suite reduction approaches. Laravel Reductor implements that methodology and extends it for production use.

Key Research Findings Applied

  • Algorithm Selection: DBSCAN and K-means clustering (research-validated)
  • Feature Engineering: TF-IDF semantic analysis + coverage fingerprinting
  • Evaluation Metrics: Coverage preservation and test suite size reduction
  • Safety Mechanisms: Multi-layered validation to prevent dangerous test removal
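
The two evaluation metrics named above can be made concrete with a short sketch. The definitions used here are common ones, assumed for illustration rather than taken from the package: coverage preservation is the share of originally covered lines still covered by the kept tests, and size reduction is the share of tests removed. The function names and sample data are hypothetical.

```python
def coverage_preservation(coverage, kept):
    """Fraction of lines covered by the full suite that the kept tests still cover."""
    full = set().union(*coverage.values())
    reduced = set().union(*(coverage[t] for t in kept))
    return len(reduced) / len(full) if full else 1.0


def size_reduction(coverage, kept):
    """Fraction of tests removed from the suite."""
    return 1 - len(kept) / len(coverage)


# Per-test coverage as sets of (file, line) pairs.
coverage = {
    "t1": {("A.php", 10), ("A.php", 11)},
    "t2": {("A.php", 10), ("A.php", 11)},
    "t3": {("B.php", 5)},
    "t4": {("B.php", 5), ("B.php", 6)},
}

print(coverage_preservation(coverage, ["t1", "t4"]), size_reduction(coverage, ["t1", "t4"]))
# prints "1.0 0.5"
print(coverage_preservation(coverage, ["t1", "t3"]))
# prints "0.75"
```

Keeping t1 and t4 halves the suite without losing any covered line, while keeping t3 instead of t4 drops line 6 of B.php.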

Laravel-Specific Adaptations

  • Shared coverage filtering: Removes Laravel bootstrap/vendor code noise
  • Adaptive semantic/coverage weighting: Default 70/30 split optimized for Laravel
  • Assertion-aware tokenization: Special handling for assertStatus, assertJson, etc.
  • Parameterized test handling: Strips #0, #1 suffixes from data providers
  • Cluster safety engine: Prevents merging opposing tests (success/fail, null/empty)
  • Framework integration: Native Laravel service provider and Artisan commands
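
Two of these adaptations are easy to illustrate. The sketch below strips numeric data-provider suffixes from test identifiers and pulls assertion method names out of test source as whole tokens. The regular expressions and function names are assumptions for illustration, not the package's implementation.

```python
import re


def base_test_name(test_id):
    """Strip a trailing data-provider index such as #0 or #1."""
    return re.sub(r"#\d+$", "", test_id)


def assertion_tokens(source):
    """Extract assertion method names (assertStatus, assertJson, ...) as whole tokens."""
    return re.findall(r"\bassert[A-Z]\w*", source)


ids = [
    "CreateTest::it_creates_an_asset_with_valid_payload#0",
    "CreateTest::it_creates_an_asset_with_valid_payload#1",
]
print({base_test_name(i) for i in ids})
# prints "{'CreateTest::it_creates_an_asset_with_valid_payload'}"

source = "$response->assertStatus(201)->assertJson(['id' => 1]);"
print(assertion_tokens(source))
# prints "['assertStatus', 'assertJson']"
```

Keeping assertion names intact, rather than splitting them into generic words, lets a test that asserts a 201 response be told apart from one that asserts a validation error.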

Research Compliance

Laravel Reductor addresses key research gaps identified in the literature:

  1. Scalability - Memory-efficient processing for enterprise test suites
  2. Artifact Availability - Complete open-source implementation
  3. Safety Validation - Multi-layered validation framework
  4. Production Integration - CI/CD pipeline support and DevOps tooling

For detailed analysis of the research alignment, see docs/RESEARCH_METHODOLOGY.md.

Acknowledgments

Special thanks to the authors of the Sebastian et al. paper for the foundational methodology and systematic analysis of the field.