adachsoft/directory-scanner-tool

Safe, configurable directory scanning and file content search tools for PHP with adachsoft/ai-tool-call integration

v0.3.1 2026-01-31 05:53 UTC

Last update: 2026-01-31 04:56:46 UTC


README

Safe, configurable directory scanner and file content search tool for PHP projects, designed to integrate with the adachsoft/ai-tool-call library and AI agents (e.g. Google Gemini).

It exposes two tools:

  • directory_scanner – scans a configured base directory, applies exclusions and depth/entry limits, and returns a flat list of file system entries with optional metadata.
  • file_content_search – uses the same safe directory scanning, but additionally filters results to files whose contents match a given pattern (plain/regex/similarity search modes).

Version numbers are managed via Git tags / Packagist and follow Semantic Versioning. See CHANGELOG.md for notable changes.

Features

  • Safe scanning strictly confined to a configured base path (no directory traversal above base path).
  • Support for excluded subpaths (e.g. vendor, var/cache, .git).
  • Configurable maximum recursion depth (max_allowed_depth).
  • Configurable maximum number of filesystem entries scanned per request (max_entries, nullable), independent from the maximum number of returned results.
  • Optional request-level limit for the maximum number of returned results (max_results) for both tools.
  • Optional inclusion of additional metadata for each entry (keys are present only when enabled via include_*/default_include_* flags and a non-null value is available from the filesystem):
    • file size (bytes),
    • last modification time (ISO 8601 string),
    • future‑ready fields for creation time and permissions.
  • Optional approximate memory usage metrics for each scan when include_metrics is enabled.
  • Flat, predictable result structure (items + summary).
  • Ready‑to‑use SPI tools + factories for adachsoft/ai-tool-call:
    • DirectoryScannerTool / DirectoryScannerToolFactory,
    • FileContentSearchTool / FileContentSearchToolFactory.
  • Content‑based file filtering via file_content_search with multiple search modes:
    • plain (case‑insensitive substring),
    • plain_case_sensitive,
    • regex,
    • similarity (fuzzy match using similar_text).
  • Fine‑grained control over which files are searched by file_content_search using host‑level configuration (excluded_files, excluded_extensions, allowed_extensions) and optional request‑level allowed_extensions.

Requirements

  • PHP 8.3 or higher
  • Composer

The library depends on the following AdachSoft packages at runtime:

  • adachsoft/ai-tool-call
  • adachsoft/filesystem
  • adachsoft/normalized-safe-path

These are installed automatically when you require this package.

Installation

composer require adachsoft/directory-scanner-tool

Concepts and architecture

The core pieces of this library are:

  • DirectoryScannerTool – SPI tool implementation (AdachSoft\AiToolCall\SPI\ToolInterface) that is discovered and executed by adachsoft/ai-tool-call.
  • DirectoryScannerToolFactory – factory used by AiToolCallFacadeBuilder to create configured tool instances based on a ConfigMap.
  • FileContentSearchTool – SPI tool that wraps directory scanning and then filters entries by inspecting file contents using pluggable search strategies.
  • FileContentSearchToolFactory – factory that wires the same DirectoryScannerService and filesystem configuration, and composes FileContentSearchService.
  • DirectoryScannerService / DirectoryScanRunner – services responsible for scanning the file system and collecting results.
  • FileContentSearchService – uses DirectoryScannerService plus a set of search strategies (Strategy pattern) to keep the search logic extensible and testable.
  • PathNormalizationHelper – uses adachsoft/normalized-safe-path to ensure all paths stay inside the configured base path.

You typically do not construct these objects manually. Instead, you plug the factories into AiToolCallFacadeBuilder and configure the tools using ConfigMap.
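
The confinement idea behind PathNormalizationHelper can be illustrated with a small lexical sketch. It does not use adachsoft/normalized-safe-path (its API is not shown in this README); resolveInsideBase() is a hypothetical helper, it ignores symlinks, and it treats a leading slash as relative to the base path.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): lexically resolves a
 * relative path against a base path and rejects anything that would
 * climb above the base. Symlinks are not considered.
 */
function resolveInsideBase(string $basePath, string $relativePath): string
{
    $segments = [];
    foreach (explode('/', str_replace('\\', '/', $relativePath)) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue; // skip empty segments and "current directory"
        }
        if ($segment === '..') {
            if ($segments === []) {
                // Nothing left to pop: the path would leave the base directory.
                throw new InvalidArgumentException("Path escapes base path: {$relativePath}");
            }
            array_pop($segments);
            continue;
        }
        $segments[] = $segment;
    }

    $base = rtrim($basePath, '/');

    return $segments === [] ? $base : $base . '/' . implode('/', $segments);
}
```

A real implementation would likely also need to deal with symlinks and platform-specific path rules, which is what the dedicated package is for.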

Configuration

The tools are configured by the host application (not by the AI agent) via DirectoryScannerToolFactory, FileContentSearchToolFactory and ConfigMap.

Factory configuration for directory_scanner (host application)

Example of wiring the directory scanner tool with AiToolCallFacadeBuilder:

use AdachSoft\AiToolCall\PublicApi\Builder\AiToolCallFacadeBuilder;
use AdachSoft\AiToolCall\SPI\Collection\ConfigMap;
use AdachSoft\DirectoryScannerTool\DirectoryScannerToolFactory;

$factory = new DirectoryScannerToolFactory();

$facade = AiToolCallFacadeBuilder::new()
    ->withSpiFactories([$factory])
    ->withToolConfigs([
        'directory_scanner' => new ConfigMap([
            'base_path' => '/var/www/my-project',
            'excluded_paths' => ['vendor', 'var/cache', '.git'],
            'max_allowed_depth' => 10,
            'max_entries' => null,
            // Optional defaults for include_* flags when the agent does not specify them
            'default_include_size' => false,
            'default_include_created_at' => false,
            'default_include_modified_at' => false,
            'default_include_permissions' => false,
        ]),
    ])
    ->build();

Supported config keys (both tools)

Config keys are passed as an array to ConfigMap, under the name of the tool they apply to (directory_scanner or file_content_search):

  • base_path (string, required)

    • Absolute path that acts as the root of all scans.
    • All agent‑provided paths are resolved relative to this base path.
  • excluded_paths (string[], optional)

    • List of relative paths (from base path) that should be excluded from scanning.
    • Both the directory itself and all its descendants are excluded.
  • max_allowed_depth (int, optional, default: 10)

    • Maximum recursion depth allowed by the host application.
    • The effective depth used for a given request is the minimum of this value and the request‑level max_depth parameter (see below).
  • max_entries (int|null, optional, default: null)

    • Maximum number of filesystem entries (files and directories) that can be processed by the underlying scanner in a single request.
    • null means "no limit" – the library will not stop scanning based on the number of encountered filesystem entries; the host application is then responsible for other safety limits (time, process limits, etc.).
    • When a positive integer is configured and reached, scanning stops and summary.truncated_by_max_entries is set to true.
  • default_include_size (bool, optional, default: false)
  • default_include_created_at (bool, optional, default: false)
  • default_include_modified_at (bool, optional, default: false)
  • default_include_permissions (bool, optional, default: false)

    • Each of these four flags is the default used when the agent omits the corresponding include_* request parameter.

Internally DirectoryScannerConfig keeps PHP properties in camelCase (e.g. $basePath, $excludedPaths), but everywhere arrays/JSON are used the keys follow snake_case as shown above.

The file_content_search tool uses the same configuration, but always returns only file entries whose contents match the request pattern.
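
The excluded_paths rule ("the directory itself and all its descendants") can be sketched as a simple prefix check. isExcludedPath() is a hypothetical helper, not the library's code, and matching whole path segments (so that vendor does not exclude vendor-tools) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns true when $relativePath equals an excluded
 * path or lies below it. Matching is done on whole path segments.
 *
 * @param string[] $excludedPaths relative paths such as "vendor" or "var/cache"
 */
function isExcludedPath(string $relativePath, array $excludedPaths): bool
{
    $path = trim(str_replace('\\', '/', $relativePath), '/');

    foreach ($excludedPaths as $excluded) {
        $excluded = trim(str_replace('\\', '/', $excluded), '/');
        if ($excluded === '') {
            continue; // ignore empty configuration entries
        }
        if ($path === $excluded || str_starts_with($path, $excluded . '/')) {
            return true;
        }
    }

    return false;
}
```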

Additional config keys for file_content_search

The following configuration keys are interpreted only by the file_content_search tool. They do not change the behaviour of directory_scanner itself, but they control which files are inspected when performing content search:

  • excluded_files (string[], optional)

    • List of relative file paths (from base_path) that must never be searched or returned by file_content_search, even if their contents would otherwise match the pattern.
    • Paths are normalised to use forward slashes and duplicates are removed.
  • excluded_extensions (string[], optional)

    • List of file extensions that must never be inspected by file_content_search.
    • Values are normalised by trimming whitespace, converting to lower‑case and stripping a leading dot (e.g. "PHP", ".php" and " php " all become "php").
    • Files whose extension is in this list are ignored even if they match the search pattern.
  • allowed_extensions (string[], optional)

    • Optional allow‑list of file extensions that may be inspected by file_content_search.
    • When non‑empty, only files whose extension is in this list are considered for content search.
    • Values are normalised in the same way as for excluded_extensions (trim, lower‑case, leading dot removed).

excluded_files and excluded_extensions always take precedence over allowed_extensions – a file explicitly excluded by path or extension will never be searched, even when its extension appears in an allow‑list.
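
The extension normalisation described above (trim, lower-case, strip a leading dot) might look roughly like the following. normalizeExtensions() is a hypothetical name, and dropping empty values and duplicates is an assumption made for this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative normaliser: trim whitespace, lower-case, strip one leading dot.
 * Assumption: empty values and duplicates are dropped.
 *
 * @param string[] $extensions
 * @return string[]
 */
function normalizeExtensions(array $extensions): array
{
    $normalized = [];
    foreach ($extensions as $extension) {
        $value = strtolower(trim($extension));
        if (str_starts_with($value, '.')) {
            $value = substr($value, 1);
        }
        if ($value !== '') {
            $normalized[] = $value;
        }
    }

    // array_unique keeps the first occurrence; array_values re-indexes the list.
    return array_values(array_unique($normalized));
}
```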

Tool invocation (AI agent request)

Once the tools are registered, AI agents (or your own code) call them through the AdachSoft\AiToolCall\PublicApi\AiToolCallFacade.

Request parameters – directory_scanner

The directory_scanner tool exposes the following parameter schema (as returned by DirectoryScannerTool::getDefinition()):

  • path (string, required)

    • Relative path to scan from base path (e.g. ., src, src/Module).
    • . means "start from the base path itself".
  • recursive (bool, default: false)

    • Whether nested directories should be scanned recursively.
  • max_depth (int|null, default: null)

    • Maximum recursion depth relative to the starting directory.
    • 1 means "only direct children".
    • The actual maximum depth used is min(max_depth, config.max_allowed_depth).
  • max_results (int|null, default: null)

    • Maximum number of entries to return in the items array for this call.
    • null means "no limit" on the number of returned entries (only the configured max_entries scan limit may truncate the scan).
  • include_size (bool, default: false)

    • Whether to include file size in bytes (for files only).
  • include_created_at (bool, default: false)

    • Reserved for future use (creation time). Depending on the filesystem, no value may be available, in which case the created_at key is omitted.
  • include_modified_at (bool, default: false)

    • Whether to include last modification time as an ISO 8601 string.
  • include_permissions (bool, default: false)

    • Reserved for future use (POSIX‑like permission string). The permissions key is omitted when no value is available.
  • include_metrics (bool, default: false)

    • When true, the summary includes approximate memory usage metrics for the scan (memory_usage_start_bytes, memory_usage_end_bytes, memory_peak_bytes).
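
The depth rule min(max_depth, config.max_allowed_depth) can be written as a one-line helper. This is only a sketch: effectiveMaxDepth() is a hypothetical name, and treating a null max_depth as "use the configured maximum" is an assumption, since the README does not spell out that case.

```php
<?php
declare(strict_types=1);

/**
 * Sketch of the depth rule min(max_depth, max_allowed_depth).
 * Assumption: a null request value falls back to the configured maximum.
 */
function effectiveMaxDepth(?int $requestedMaxDepth, int $maxAllowedDepth = 10): int
{
    if ($requestedMaxDepth === null) {
        return $maxAllowedDepth;
    }

    return min($requestedMaxDepth, $maxAllowedDepth);
}
```

With the default max_allowed_depth of 10, a request for max_depth 3 yields 3, while a request for 25 is capped at 10.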

Request parameters – file_content_search

The file_content_search tool accepts the same parameters as directory_scanner, plus:

  • pattern (string, required)

    • Text or pattern to search for in file contents.
    • Must be a non‑empty string.
  • search_mode (string, default: plain)

    • Controls how pattern is applied to file contents.
    • One of:
      • plain – case‑insensitive substring search,
      • plain_case_sensitive – case‑sensitive substring search,
      • regex – PHP regular expression, pattern is wrapped as "/{$pattern}/u",
      • similarity – fuzzy match using similar_text (internal threshold ~70%).
  • max_results (int|null, default: null)

    • Maximum number of matching files to return in the items array.
    • null means "no limit" on the number of matches returned; the underlying directory scan is still subject to the configured max_entries scan limit.
  • allowed_extensions (string[], optional, default: [])

    • Optional list of file extensions (without dots) that narrows down which files are inspected for this particular search request, for example ["php", "js"].
    • Values are normalised in the same way as configuration‑level allowed_extensions (trimmed, lower‑case, without a leading dot).
    • The final set of extensions that file_content_search will inspect is computed as follows:
      • start with configuration‑level allowed_extensions (if non‑empty),
      • if request‑level allowed_extensions is non‑empty, intersect it with the configuration list; when configuration‑level allowed_extensions is empty, the request‑level list alone is used,
      • files without any extension are ignored whenever there is at least one effective allowed_extensions value.

excluded_extensions and excluded_files from configuration always take precedence over allowed_extensions – they define hard exclusions regardless of request‑level values.
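
The precedence rules above can be sketched as two small functions. Both names are hypothetical, inputs are assumed to be already normalised, and the handling of an empty intersection (nothing is searchable) is an assumption, because the README does not describe that case.

```php
<?php
declare(strict_types=1);

/**
 * @param string[] $configAllowed  configuration-level allow-list (normalised)
 * @param string[] $requestAllowed request-level allow-list (normalised)
 * @return string[]|null null means "no allow-list in effect";
 *                       an empty array means "nothing is allowed" (assumption)
 */
function effectiveAllowedExtensions(array $configAllowed, array $requestAllowed): ?array
{
    if ($configAllowed === [] && $requestAllowed === []) {
        return null;
    }
    if ($configAllowed === []) {
        return $requestAllowed;
    }
    if ($requestAllowed === []) {
        return $configAllowed;
    }

    return array_values(array_intersect($configAllowed, $requestAllowed));
}

/**
 * Hard exclusions are checked first, then the effective allow-list.
 *
 * @param string[]      $excludedFiles
 * @param string[]      $excludedExtensions
 * @param string[]|null $allowed result of effectiveAllowedExtensions()
 */
function isFileSearchable(
    string $relativePath,
    array $excludedFiles,
    array $excludedExtensions,
    ?array $allowed,
): bool {
    if (in_array($relativePath, $excludedFiles, true)) {
        return false;
    }

    $extension = strtolower(pathinfo($relativePath, PATHINFO_EXTENSION));
    if ($extension !== '' && in_array($extension, $excludedExtensions, true)) {
        return false;
    }
    if ($allowed === null) {
        return true; // no allow-list: every non-excluded file qualifies
    }

    // Files without an extension are ignored when an allow-list is in effect.
    return $extension !== '' && in_array($extension, $allowed, true);
}
```

Note that pathinfo() reports "env" as the extension of a dotfile such as .env; whether the library treats dotfiles the same way is not stated here.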

If search_mode = 'regex' and the pattern is not a valid regular expression, the tool throws InvalidToolCallException.
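
To make the four search modes concrete, here is a minimal stand-alone matcher. It is not the library's strategy implementation: contentMatches() is a hypothetical function, the similarity mode is assumed to compare the pattern with the whole content against a 70% threshold, and InvalidArgumentException stands in for InvalidToolCallException.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical matcher mirroring the documented search modes.
 */
function contentMatches(string $content, string $pattern, string $mode = 'plain'): bool
{
    switch ($mode) {
        case 'plain':
            // Case-insensitive substring search.
            return stripos($content, $pattern) !== false;

        case 'plain_case_sensitive':
            return str_contains($content, $pattern);

        case 'regex':
            // The pattern is wrapped as documented; "@" silences the
            // compile warning so the failure can be reported as an exception.
            $result = @preg_match("/{$pattern}/u", $content);
            if ($result === false) {
                throw new InvalidArgumentException('Invalid regular expression: ' . $pattern);
            }
            return $result === 1;

        case 'similarity':
            // similar_text() writes the similarity percentage into $percent.
            similar_text($pattern, $content, $percent);
            return $percent >= 70.0;

        default:
            throw new InvalidArgumentException("Unknown search mode: {$mode}");
    }
}
```

Because the pattern is placed between "/" delimiters, a literal forward slash inside a regex pattern has to be escaped as "\/" for this sketch to compile it.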

Example: calling directory_scanner via Public API

use AdachSoft\AiToolCall\PublicApi\Dto\ToolCallRequestDto as PublicToolCallRequestDto;

$request = new PublicToolCallRequestDto(
    toolName: 'directory_scanner',
    parameters: [
        'path' => '.',
        'recursive' => true,
        'max_depth' => 3,
        'max_results' => 100,
        'include_size' => true,
        'include_modified_at' => true,
        'include_metrics' => true,
    ],
);

$result = $facade->callTool($request);

// $result->toolName === 'directory_scanner'
// $result->result is an array with keys 'items' and 'summary'

$items = $result->result['items'];
$summary = $result->result['summary'];

Example: calling file_content_search via Public API

use AdachSoft\AiToolCall\PublicApi\Dto\ToolCallRequestDto as PublicToolCallRequestDto;

$request = new PublicToolCallRequestDto(
    toolName: 'file_content_search',
    parameters: [
        'path' => '.',
        'recursive' => true,
        'max_depth' => 3,
        'max_results' => 50,
        'pattern' => 'TODO',
        'search_mode' => 'plain',
        'include_metrics' => true,
        'allowed_extensions' => ['php', 'js'],
    ],
);

$result = $facade->callTool($request);

// $result->toolName === 'file_content_search'
// $result->result has the same shape as for directory_scanner

$items = $result->result['items'];
$summary = $result->result['summary'];

Only files whose contents match the given pattern (according to search_mode) are returned in items. Directories are never included in file_content_search results.

Response structure

Both tools return a structure containing two top‑level keys: items and summary.

items

items is a flat list of scan entries:

/**
 * @var array<int, array{
 *     path: string,
 *     name: string,
 *     is_file: bool,
 *     is_directory: bool,
 *     size?: int,
 *     created_at?: string,
 *     modified_at?: string,
 *     permissions?: string,
 * }> $items
 */
$items = $result->result['items'];

  • path – relative path from the configured base path.
  • name – basename of the entry (file or directory name).
  • is_file – true if the entry is a file.
  • is_directory – true if the entry is a directory.
  • size – file size in bytes. Present only when include_size/default_include_size is enabled for a file entry and the filesystem provides a size.
  • created_at – creation time as ISO 8601 string. Present only when include_created_at/default_include_created_at is enabled and creation time is available from the filesystem.
  • modified_at – last modification time as ISO 8601 string. Present only when include_modified_at/default_include_modified_at is enabled and last modification time is available from the filesystem.
  • permissions – POSIX‑style permissions string. Present only when include_permissions/default_include_permissions is enabled and the filesystem exposes a permissions string.

Optional metadata keys are omitted entirely when the corresponding include flags are disabled or not requested. The tools do not emit "...": null for fields that were not explicitly asked for.

For file_content_search, the structure is identical, but only entries with is_file === true that match the content search criteria are present.
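
Because optional metadata keys may be absent, consumer code should test for them before reading. A minimal sketch follows; describeEntry() is a hypothetical helper and the sample entry is hand-written, not real tool output.

```php
<?php
declare(strict_types=1);

/**
 * Formats one entry, reading optional keys only when they are present.
 *
 * @param array{path: string, name: string, is_file: bool, is_directory: bool,
 *              size?: int, modified_at?: string} $item
 */
function describeEntry(array $item): string
{
    $parts = [$item['is_directory'] ? 'dir' : 'file', $item['path']];

    if (isset($item['size'])) {
        $parts[] = $item['size'] . ' B';
    }
    if (isset($item['modified_at'])) {
        $parts[] = 'modified ' . $item['modified_at'];
    }

    return implode(' | ', $parts);
}

// Hand-written sample entry for illustration only.
$sampleItem = [
    'path' => 'src/App.php',
    'name' => 'App.php',
    'is_file' => true,
    'is_directory' => false,
    'size' => 1280,
];

echo describeEntry($sampleItem), PHP_EOL;
```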

summary

summary contains metadata about the scan:

/**
 * @var array{
 *     requested_path: string,
 *     recursive: bool,
 *     requested_max_depth: int|null,
 *     effective_max_depth: int,
 *     actual_depth_reached: int,
 *     total_entries_found: int,
 *     returned_entries_count: int,
 *     truncated_by_max_entries: bool,
 *     truncated_by_max_results: bool,
 *     issue_note?: string,
 *     memory_usage_start_bytes?: int,
 *     memory_usage_end_bytes?: int,
 *     memory_peak_bytes?: int,
 * } $summary
 */
$summary = $result->result['summary'];

  • requested_path – the path value from the request.
  • recursive – whether recursive scanning was enabled for the request.
  • requested_max_depth – raw max_depth from the request (may be null).
  • effective_max_depth – actual recursion depth limit used after applying config constraints.
  • actual_depth_reached – deepest level reached during the scan.
  • total_entries_found / returned_entries_count:
    • for directory_scanner – number of filesystem entries that qualified for inclusion and number of entries actually returned in items after applying max_results (if any),
    • for file_content_search – number of matching file entries (before and after applying max_results), where returned_entries_count may be lower than total_entries_found when max_results is used.
  • truncated_by_max_entries – true if the underlying directory scan was stopped because max_entries was reached.
  • truncated_by_max_results – true if the list of returned results was truncated to the request‑level max_results limit.
  • issue_note – present only when there is something important to report about the scan result, for example:
    • when the requested path is equal to or inside an entry from excluded_paths, explaining that access to this path is forbidden by configuration and suggesting that the agent chooses another path or asks the host to adjust excluded_paths,
    • when the scan or result list was truncated because of max_entries and/or max_results, with guidance on how to adjust configuration or request parameters to obtain more results.
  • memory_usage_start_bytes – present only when include_metrics === true. Approximate process memory usage (in bytes) just before the scan started.
  • memory_usage_end_bytes – present only when include_metrics === true. Approximate process memory usage (in bytes) immediately after the scan finished.
  • memory_peak_bytes – present only when include_metrics === true. Peak process memory usage (in bytes) reported by memory_get_peak_usage(true) during the lifetime of the PHP process.
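
A host application might turn the truncation flags and issue_note into messages for logs or for the agent. The sketch below assumes only the summary keys listed above; summaryWarnings() is a hypothetical helper and the wording of the messages is made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collects human-readable warnings from a summary array.
 *
 * @param array<string, mixed> $summary
 * @return string[]
 */
function summaryWarnings(array $summary): array
{
    $warnings = [];

    if ($summary['truncated_by_max_entries'] ?? false) {
        $warnings[] = 'Scan stopped early: max_entries was reached.';
    }
    if ($summary['truncated_by_max_results'] ?? false) {
        $warnings[] = sprintf(
            'Returned %d of %d entries because of max_results.',
            $summary['returned_entries_count'],
            $summary['total_entries_found'],
        );
    }
    if (isset($summary['issue_note'])) {
        // issue_note is optional and only present when there is something to report.
        $warnings[] = 'Note: ' . $summary['issue_note'];
    }

    return $warnings;
}
```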

Error handling

The tools use exceptions from adachsoft/ai-tool-call and their own domain exceptions to signal problems:

  • InvalidToolCallException

    • Thrown when request parameters are invalid (e.g. wrong types, impossible options, invalid pattern for file_content_search).
  • ToolExecutionException

    • Wraps domain and filesystem errors that occur during scanning or content search.
    • The original cause is available as the previous exception and usually contains a DirectoryScannerToolException with a more detailed message.
  • DirectoryScannerDomainException

    • Used internally for invalid or unsafe path operations (e.g. attempts to escape base path).

In typical adachsoft/ai-tool-call setups, these exceptions are translated into structured error responses returned to the AI agent.
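
When calling the tools from your own code, the underlying cause of a wrapped error can be reached by following the previous-exception chain. The sketch below uses standard PHP exceptions as stand-ins, since the exact namespaces of the library's exception classes are not shown in this README; rootCause() is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Follows getPrevious() to the innermost exception in a chain.
 */
function rootCause(Throwable $exception): Throwable
{
    while ($exception->getPrevious() !== null) {
        $exception = $exception->getPrevious();
    }

    return $exception;
}

// Stand-ins for a domain exception wrapped by a tool execution exception.
$domainError = new DomainException('Path escapes base path');
$wrappedError = new RuntimeException('Tool execution failed', 0, $domainError);

echo rootCause($wrappedError)->getMessage(), PHP_EOL;
```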

Development

To work on the library locally, install dev dependencies and run the checks:

composer install

# Run test suite
vendor/bin/phpunit

# Run static analysis
vendor/bin/phpstan analyse

# Check coding standards (dry run; remove --dry-run to apply fixes)
PHP_CS_FIXER_IGNORE_ENV=1 vendor/bin/php-cs-fixer fix --dry-run

Versioning

This library follows Semantic Versioning. Versions are published as Git tags and exposed on Packagist. The composer.json file does not contain an explicit version field; Composer reads version information from VCS tags.

See CHANGELOG.md for a list of notable changes between versions.

License

This library is open‑source software licensed under the MIT License. See the LICENSE file for full license text.

Author

  • Arkadiusz Adach