gtstudio/module-ai-knowledge-base

Knowledge base management for Magento 2. Upload documents (PDF, TXT) that AI agents can retrieve as context before answering queries.

Package info

github.com/gabrielgts/module-ai-knowledge-base

Type: magento2-module

pkg:composer/gtstudio/module-ai-knowledge-base

1.0.3 2026-03-11 03:09 UTC


README

Document management for AI agents in Magento 2. Upload files that agents can retrieve as context before answering queries — enabling retrieval-augmented generation (RAG) without a vector database.

What It Does

  • Upload and manage documents (PDF, TXT) in the Magento admin
  • Documents are stored and indexed so that agents can fetch relevant excerpts at query time
  • Integrates with Gtstudio_AiAgents — assign a knowledge base to any agent

Requirements

  • Magento 2.4.4+
  • PHP 8.1+
  • Gtstudio_AiConnector enabled and configured
  • Gtstudio_AiAgents enabled
  • smalot/pdfparser: ^2.12 (PDF text extraction)

Installation

composer require gtstudio/module-ai-knowledge-base
php bin/magento module:enable Gtstudio_AiKnowledgeBase
php bin/magento setup:upgrade
php bin/magento setup:di:compile
php bin/magento setup:static-content:deploy -f --area adminhtml
php bin/magento cache:flush

Usage

Uploading Documents

Navigate to AI Studio → Agents & Tools → Knowledge Base.

Click Add New and fill in the following fields:

Field               | Description
Title               | Human-readable label (auto-populated from PDF metadata on upload)
Upload PDF Document | Upload a PDF file; text and metadata are extracted automatically
Content             | Extracted text (editable; used for retrieval)
Tags                | Comma-separated keywords (auto-populated from PDF metadata)
Agents              | Associate this document with one or more agents
Is Active           | Only active entries are searchable by agents

How Retrieval Works

When an agent that has knowledge base documents attached receives a question:

  1. The question is matched against document excerpts using keyword or semantic similarity
  2. Relevant excerpts are prepended to the agent's system prompt as context
  3. The agent responds with awareness of those excerpts

The full document text is never sent to the LLM. Only the most relevant excerpts are included, which keeps token usage low.
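
The three steps above can be sketched in plain PHP. This is an illustration only, assuming simple keyword-overlap scoring; the function names (tokenize, retrieveExcerpts, buildSystemPrompt) and the prompt wording are hypothetical and are not taken from the module's source.

```php
<?php
declare(strict_types=1);

/** Lowercase a string and split it into unique word tokens. */
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words ?: []));
}

/**
 * Step 1: rank excerpts by how many question keywords they share
 * and return the top $limit excerpts (hypothetical helper).
 */
function retrieveExcerpts(string $question, array $chunks, int $limit = 3): array
{
    $keywords = tokenize($question);
    $scored = [];
    foreach ($chunks as $index => $chunk) {
        $score = count(array_intersect($keywords, tokenize($chunk)));
        if ($score > 0) {
            $scored[] = ['score' => $score, 'index' => $index, 'text' => $chunk];
        }
    }
    // Highest score first; ties keep the original excerpt order.
    usort(
        $scored,
        fn(array $a, array $b): int => [$b['score'], $a['index']] <=> [$a['score'], $b['index']]
    );
    return array_column(array_slice($scored, 0, $limit), 'text');
}

/** Step 2: prepend the selected excerpts to the agent's system prompt. */
function buildSystemPrompt(string $systemPrompt, array $excerpts): string
{
    if ($excerpts === []) {
        return $systemPrompt;
    }
    return "Context from the knowledge base:\n- " . implode("\n- ", $excerpts) . "\n\n" . $systemPrompt;
}

$chunks = [
    'Returns are accepted within 30 days of delivery.',
    'Shipping takes 3 to 5 business days.',
    'Gift cards never expire.',
];
$excerpts = retrieveExcerpts('How many days do I have for returns?', $chunks, 2);
echo buildSystemPrompt('You are a support agent.', $excerpts), PHP_EOL;
```

Only the two matching excerpts reach the prompt here; the unrelated gift card excerpt scores zero and is dropped, which is the token-saving behaviour described above.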

Extensibility

Supporting Additional File Formats

The text extraction pipeline uses a registry pattern. Register a custom extractor for a new MIME type:

<!-- etc/di.xml -->
<type name="Gtstudio\AiKnowledgeBase\Model\Extractor\ExtractorPool">
    <arguments>
        <argument name="extractors" xsi:type="array">
            <item name="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                  xsi:type="object">Vendor\Module\Model\Extractor\DocxExtractor</item>
        </argument>
    </arguments>
</type>

Implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface:

interface ExtractorInterface
{
    /**
     * Extract plain text from the given file path.
     */
    public function extract(string $filePath): string;
}
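
As an illustration, a DOCX extractor matching the MIME type registered above might look like the following sketch. It assumes the PHP zip extension is available, and it declares a local stand-in for the interface so the example runs on its own; in a real module the class would live in Vendor\Module\Model\Extractor and implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface directly. The tag-stripping approach is a simplification that ignores tabs, tables, and embedded objects.

```php
<?php
declare(strict_types=1);

// Stand-in for Gtstudio\AiKnowledgeBase\Api\ExtractorInterface,
// declared here only so the sketch is self-contained.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

// Hypothetical extractor: a DOCX file is a ZIP archive whose main
// body is stored as XML in word/document.xml.
final class DocxExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        $zip = new ZipArchive();
        if ($zip->open($filePath) !== true) {
            throw new RuntimeException("Cannot open DOCX archive: {$filePath}");
        }
        $xml = $zip->getFromName('word/document.xml');
        $zip->close();
        if ($xml === false) {
            throw new RuntimeException('word/document.xml not found in archive');
        }

        // Turn paragraph ends into newlines, then drop all XML tags.
        $xml = str_replace('</w:p>', "</w:p>\n", $xml);
        $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');

        return trim($text);
    }
}
```

Throwing on an unreadable file, rather than returning an empty string, is a design choice of this sketch; how the module expects extractors to report failures is not documented here.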

Custom Retrieval Strategy

Override the retrieval service to use a vector database, OpenSearch k-NN, or any other similarity search:

<preference for="Gtstudio\AiKnowledgeBase\Api\RetrievalServiceInterface"
            type="Vendor\Module\Model\VectorRetrievalService"/>

Chunking Strategy

Document chunking (splitting documents into excerpt-sized pieces) can be customised:

<type name="Gtstudio\AiKnowledgeBase\Model\Chunker\TextChunker">
    <arguments>
        <!-- Maximum characters per chunk -->
        <argument name="chunkSize" xsi:type="number">1500</argument>
        <!-- Overlap between consecutive chunks -->
        <argument name="overlap" xsi:type="number">200</argument>
    </arguments>
</type>
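
To show what the chunkSize and overlap arguments mean, here is a minimal sketch of fixed-size chunking with overlap. The function name chunkText is hypothetical and this is not the module's TextChunker; it only assumes that sizes are counted in characters, as the comment in the configuration above states.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of at most $chunkSize characters, where each
 * chunk repeats the last $overlap characters of the previous one.
 */
function chunkText(string $text, int $chunkSize = 1500, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize > 0 and 0 <= overlap < chunkSize.');
    }

    $length = mb_strlen($text, 'UTF-8');
    $step = $chunkSize - $overlap; // how far the window advances each time
    $chunks = [];

    for ($offset = 0; $offset < $length; $offset += $step) {
        $chunks[] = mb_substr($text, $offset, $chunkSize, 'UTF-8');
        if ($offset + $chunkSize >= $length) {
            break; // this chunk already reached the end of the text
        }
    }

    return $chunks;
}

// Small values make the overlap visible: each chunk starts with the
// last character of the previous one.
print_r(chunkText('abcdefghij', 4, 1)); // ['abcd', 'defg', 'ghij']
```

With the values from the configuration above (1500 and 200), the window advances 1300 characters at a time, so a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks as long as it is shorter than the overlap.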

Database Tables

Table                            | Purpose
gtstudio_ai_knowledge_base       | Document metadata (name, description, file path, agent association)
gtstudio_ai_knowledge_base_chunk | Extracted text chunks ready for retrieval

ACL Resources

Resource                             | Controls
Gtstudio_AiKnowledgeBase::management | Access to the Knowledge Base admin section