gtstudio / module-ai-knowledge-base
Knowledge base management for Magento 2. Upload documents (PDF, TXT) that AI agents can retrieve as context before answering queries.
Package info
github.com/gabrielgts/module-ai-knowledge-base
Type: magento2-module
pkg:composer/gtstudio/module-ai-knowledge-base
Requires
- php: >=8.1
- gtstudio/module-ai-agents: >=1.0.0
- gtstudio/module-aiconnector: >=1.0.0
- magento/framework: >=2.4.4
- smalot/pdfparser: ^2.12
README
Document management for AI agents in Magento 2. Upload files that agents can retrieve as context before answering queries — enabling retrieval-augmented generation (RAG) without a vector database.
What It Does
- Upload and manage documents (PDF, TXT) in the Magento admin
- Documents are stored and indexed so that agents can fetch relevant excerpts at query time
- Integrates with `Gtstudio_AiAgents`: assign a knowledge base to any agent
Requirements
- Magento 2.4.4+
- PHP 8.1+
- `Gtstudio_AiConnector` enabled and configured
- `Gtstudio_AiAgents` enabled
- `smalot/pdfparser: ^2.12` (PDF text extraction)
Installation
```bash
php bin/magento module:enable Gtstudio_AiKnowledgeBase
php bin/magento setup:upgrade
php bin/magento setup:di:compile
php bin/magento setup:static-content:deploy -f --area adminhtml
php bin/magento cache:flush
```
Usage
Uploading Documents
Navigate to AI Studio → Agents & Tools → Knowledge Base.
Click Add New, fill in:
| Field | Description |
|---|---|
| Title | Human-readable label (auto-populated from PDF metadata on upload) |
| Upload PDF Document | Upload a PDF file — text and metadata are extracted automatically |
| Content | Extracted text (editable; used for retrieval) |
| Tags | Comma-separated keywords (auto-populated from PDF metadata) |
| Agents | Associate this document with one or more agents |
| Is Active | Only active entries are searchable by agents |
How Retrieval Works
When an agent that has knowledge base documents attached receives a question:
- The question is matched against document excerpts using keyword or semantic similarity
- Relevant excerpts are prepended to the agent's system prompt as context
- The agent responds with awareness of those excerpts
No full document text is sent to the LLM — only the most relevant excerpts, keeping token usage low.
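The steps above can be illustrated with a minimal keyword-overlap sketch. It is not the module's actual matching code: the function names (`tokenize`, `scoreExcerpt`, `selectExcerpts`, `buildSystemPrompt`), the scoring rule, and the `Context:` prompt layout are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/** Lowercase a string (ASCII only) and split it into distinct word tokens. */
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

/** Score = number of distinct question words that also appear in the excerpt. */
function scoreExcerpt(string $question, string $excerpt): int
{
    return count(array_intersect(tokenize($question), tokenize($excerpt)));
}

/** Return up to $limit excerpts with a non-zero score, best first. */
function selectExcerpts(string $question, array $excerpts, int $limit = 3): array
{
    $scored = [];
    foreach ($excerpts as $excerpt) {
        $score = scoreExcerpt($question, $excerpt);
        if ($score > 0) {
            $scored[] = ['score' => $score, 'text' => $excerpt];
        }
    }
    usort($scored, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
    return array_column(array_slice($scored, 0, $limit), 'text');
}

/** Prepend the selected excerpts to the agent's system prompt (hypothetical layout). */
function buildSystemPrompt(string $systemPrompt, array $selected): string
{
    if ($selected === []) {
        return $systemPrompt;
    }
    return "Context:\n- " . implode("\n- ", $selected) . "\n\n" . $systemPrompt;
}
```

Calling `selectExcerpts($question, $excerpts, 2)` and passing the result to `buildSystemPrompt()` yields a prompt that carries only the best-matching excerpts, which is the property that keeps token usage low.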
Extensibility
Supporting Additional File Formats
The text extraction pipeline uses a registry pattern. Register a custom extractor for a new MIME type:
```xml
<!-- etc/di.xml -->
<type name="Gtstudio\AiKnowledgeBase\Model\Extractor\ExtractorPool">
    <arguments>
        <argument name="extractors" xsi:type="array">
            <item name="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xsi:type="object">Vendor\Module\Model\Extractor\DocxExtractor</item>
        </argument>
    </arguments>
</type>
```
Implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface:
```php
interface ExtractorInterface
{
    /**
     * Extract plain text from the given file path.
     */
    public function extract(string $filePath): string;
}
```
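As a rough illustration, a DOCX extractor matching the `di.xml` registration above might look like the sketch below. It is an assumption-laden example, not code shipped with the module: the interface is redeclared locally only so the snippet runs on its own (a real class would implement `Gtstudio\AiKnowledgeBase\Api\ExtractorInterface`), `xmlToText()` is a hypothetical helper, and the tag stripping is deliberately naive. It relies on PHP's `zip` extension for `ZipArchive`.

```php
<?php
declare(strict_types=1);

// Stand-in copy of the documented interface so this sketch is self-contained.
// In a real module, implement Gtstudio\AiKnowledgeBase\Api\ExtractorInterface.
interface ExtractorInterface
{
    public function extract(string $filePath): string;
}

// Hypothetical extractor: a DOCX file is a zip archive whose main body
// lives in word/document.xml.
final class DocxExtractor implements ExtractorInterface
{
    public function extract(string $filePath): string
    {
        $zip = new ZipArchive();
        if ($zip->open($filePath) !== true) {
            throw new RuntimeException("Cannot open DOCX file: {$filePath}");
        }
        $xml = $zip->getFromName('word/document.xml');
        $zip->close();
        if ($xml === false) {
            throw new RuntimeException('word/document.xml not found in archive');
        }
        return $this->xmlToText($xml);
    }

    /** Turn WordprocessingML into plain text: one line per paragraph. */
    public function xmlToText(string $xml): string
    {
        // Each closing paragraph tag becomes a newline before tags are removed.
        $xml = str_replace('</w:p>', "\n", $xml);
        $text = preg_replace('/<[^>]+>/', '', $xml);
        // Decode XML entities such as &amp; back to literal characters.
        return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
    }
}
```

Keeping the XML-to-text step in its own method makes it possible to test the conversion without building a real archive.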
Custom Retrieval Strategy
Override the retrieval service to use a vector database, OpenSearch k-NN, or any other similarity search:
```xml
<preference for="Gtstudio\AiKnowledgeBase\Api\RetrievalServiceInterface"
            type="Vendor\Module\Model\VectorRetrievalService"/>
```
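The README does not show the methods of `RetrievalServiceInterface`, so the sketch below covers only the ranking core that a vector-based replacement would likely need: cosine similarity between a query embedding and stored chunk embeddings. The function names and the `chunk ID => embedding` array shape are assumptions, and producing the embeddings themselves is out of scope here.

```php
<?php
declare(strict_types=1);

/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction, so treat it as unrelated
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Rank chunk IDs by similarity to the query embedding, most similar first.
 *
 * @param float[] $queryEmbedding
 * @param array<int|string, float[]> $chunkEmbeddings chunk ID => embedding
 * @return array<int, int|string> the top $limit chunk IDs
 */
function rankChunks(array $queryEmbedding, array $chunkEmbeddings, int $limit = 3): array
{
    $scores = [];
    foreach ($chunkEmbeddings as $id => $embedding) {
        $scores[$id] = cosineSimilarity($queryEmbedding, $embedding);
    }
    arsort($scores); // sort descending by score, keeping chunk IDs as keys
    return array_slice(array_keys($scores), 0, $limit);
}
```

A class such as `Vendor\Module\Model\VectorRetrievalService` could wrap this ranking, or delegate it entirely to OpenSearch k-NN or a vector database, and then load the text of the winning chunks.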
Chunking Strategy
Document chunking (splitting documents into excerpt-sized pieces) can be customised:
```xml
<type name="Gtstudio\AiKnowledgeBase\Model\Chunker\TextChunker">
    <arguments>
        <!-- Maximum characters per chunk -->
        <argument name="chunkSize" xsi:type="number">1500</argument>
        <!-- Overlap between consecutive chunks -->
        <argument name="overlap" xsi:type="number">200</argument>
    </arguments>
</type>
```
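To make the interaction of `chunkSize` and `overlap` concrete, here is a minimal sliding-window sketch. It is not the module's `TextChunker`: the function name `chunkText` is hypothetical, and it counts bytes via `strlen`/`substr`, whereas an implementation handling multibyte text would more likely use the `mb_*` functions.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of at most $chunkSize bytes. Each chunk starts
 * ($chunkSize - $overlap) bytes after the previous one, so consecutive
 * chunks share $overlap bytes.
 *
 * @return string[]
 */
function chunkText(string $text, int $chunkSize = 1500, int $overlap = 200): array
{
    if ($chunkSize <= 0 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require 0 <= overlap < chunkSize');
    }
    $chunks = [];
    $length = strlen($text);
    $step = $chunkSize - $overlap;
    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = substr($text, $start, $chunkSize);
        if ($start + $chunkSize >= $length) {
            break; // this chunk already reached the end of the text
        }
    }
    return $chunks;
}
```

With the values from the configuration above, a new chunk would begin every 1300 characters. A larger overlap reduces the chance that a relevant sentence is cut at a chunk boundary, at the cost of storing more chunks.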
Database Tables
| Table | Purpose |
|---|---|
| `gtstudio_ai_knowledge_base` | Document metadata (name, description, file path, agent association) |
| `gtstudio_ai_knowledge_base_chunk` | Extracted text chunks ready for retrieval |
ACL Resources
| Resource | Controls |
|---|---|
| `Gtstudio_AiKnowledgeBase::management` | Access to the Knowledge Base admin section |