talan-hdf/semantic-suggestion

TYPO3 extension for suggesting semantically related pages

Type:typo3-cms-extension

1.4.0 2024-09-11 15:54 UTC

README

Join Our Community on Slack

We have a dedicated Slack channel where you can ask questions, discuss new features, and provide feedback on the extension. Join us to stay updated and participate in the conversation!

Join the Slack Channel

We look forward to seeing you there and engaging with you!


Elevate your TYPO3 website with intelligent, content-driven recommendations

🌟 Introduction

The Semantic Suggestion extension revolutionizes the way related content is presented on TYPO3 websites. Moving beyond traditional "more like this" functionalities based on categories and taxonomies, this extension employs advanced semantic analysis to create genuinely relevant content connections.

Key Benefits:

  • 🎯 Highly Relevant Links: Automatically generate connections based on actual content similarity, not just predefined categories.
  • ⏱️ Increased User Engagement: Keep visitors on your site longer by offering truly related content.
  • 🕸️ Semantic Cocoon: Contribute to a high-quality semantic network within your website, enhancing SEO and user navigation.
  • 🤖 Intelligent Automation: Reduce manual linking work while improving internal link quality.

Performance Consideration

While the Semantic Suggestion extension offers powerful capabilities, it's important to note:

  • 📊 The similarity calculation compares every pair of pages, so its cost grows quadratically with the number of pages (n pages require n * (n-1) / 2 comparisons).
  • ⏳ For sites with over 500 pages, the initial calculation may take up to 30 seconds, depending on server capacity.
  • 💡 We recommend using the backend module to assess the caching time for your specific setup.
  • 🔄 The cache is automatically reset when a page or content is modified, ensuring up-to-date similarity calculations.

📌 Pro Tip: Utilize the backend module to monitor performance and optimize settings for your specific use case.

By leveraging the power of semantic analysis, this extension provides a superior alternative to traditional related content plugins, offering more accurate and valuable content suggestions to your users.

New in Version 1.4.0

Stopwords Support

The extension now includes stopwords functionality, significantly improving the accuracy of content analysis. Stopwords are common words (such as "the", "is", "at") that are filtered out before processing the content. This feature enhances the relevance of semantic suggestions by focusing on meaningful content.
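As a rough illustration of what stopword filtering does, here is a minimal PHP sketch. The function name `removeStopwords` and the three-word list are assumptions made for this example; they are not the extension's API or its actual stopword list.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split text into lowercase words and drop stopwords.
 * strtolower() only handles ASCII; real content would need a
 * multibyte-aware lowercase function.
 */
function removeStopwords(string $text, array $stopwords): array
{
    // Split on anything that is not a letter or a digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Keep only words that are not stopwords, then reindex the array.
    return array_values(array_diff($words, $stopwords));
}

// Illustrative sample list only.
$stopwords = ['the', 'is', 'at'];
print_r(removeStopwords('The office is at the harbour', $stopwords));
// Remaining words: office, harbour
```

Only "office" and "harbour" survive, so two pages are compared on the words that carry meaning rather than on filler they happen to share.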

Debug Mode

A new debug mode has been introduced, which can be activated via TypoScript:

plugin.tx_semanticsuggestion_suggestions.settings.debugMode = 1

When enabled, this mode provides:

  • Detailed debug information in the backend interface
  • Comprehensive logs in public/typo3temp/logs/semantic_suggestion.log

This feature is invaluable for developers and administrators looking to fine-tune the extension's performance or troubleshoot issues.

### Backend Module Enhancements

The backend module has been significantly improved:

- Optimized "Top 5 Most Similar Page Pairs" display, eliminating duplicate entries
- Enhanced statistics and visualizations for better content insights
- Improved performance for large-scale page analyses

### For Developers

- New API methods are available to access stopwords statistics
- The similarity calculation algorithm has been optimized, providing more accurate results



## 📚 Table of Contents

- [Introduction](#-introduction)
- [Features](#-features)
- [Requirements](#-requirements)
- [Installation](#-installation)
- [Configuration](#-configuration)
- [Usage](#-usage)
- [Backend Module](#-backend-module)
- [Similarity Logic](#-similarity-logic)
- [Display Customization](#-display-customization)
- [Multilingual Support](#-multilingual-support)
- [Debugging and Maintenance](#-debugging-and-maintenance)
- [Security](#-security)
- [Performance](#-performance)
- [File Structure](#-file-structure)
- [Unit Tests](#-unit-tests)
- [Contributing](#-contributing)
- [License](#-license)
- [Support](#-support)



### Frontend View
![Frontend view with the same theme](Documentation/Medias/frontend_on_the_same_theme_view.jpg)

## 🚀 Features

- 🔍 Analyzes subpages of a specified parent page
- 📊 Displays title, associated media, and enhanced text excerpt of suggested pages
- ⚙️ Highly configurable via TypoScript
- 🎛 Customizable parent page ID, proximity threshold, and search depth
- 💾 Optimized performance with database caching of proximity scores
- 🌐 Built-in multilingual support
- 🧩 Improved compatibility with various TYPO3 content structures, including Bootstrap Package
- 🚫 Option to exclude specific pages from analysis and suggestions

## 🛠 Requirements

- TYPO3 12.0.0-13.9.99
- PHP 8.0 or higher

## 💻 Installation

<details>
<summary><strong>Composer Installation (recommended)</strong></summary>

1. Install the extension via composer:
   ```bash
   composer require talan-hdf/semantic-suggestion
   ```
2. Activate the extension in the TYPO3 Extension Manager.

</details>

<details>
<summary><strong>Manual Installation</strong></summary>

1. Download the extension from the TYPO3 Extension Repository (TER) or the GitHub repository.
2. Upload the extension file to your TYPO3 installation's typo3conf/ext/ directory.
3. In the TYPO3 backend, go to the Extension Manager and activate the "Semantic Suggestion" extension.

</details>

⚙️ Configuration

Edit your TypoScript setup to configure the extension:

plugin.tx_semanticsuggestion {
    settings {
        parentPageId = 1
        proximityThreshold = 0.7
        maxSuggestions = 3
        excerptLength = 150
        recursive = 1
        excludePages = 8,9,3456
        recencyWeight = 0.2

        analyzedFields {
            title = 1.5
            description = 1.0
            keywords = 2.0
            abstract = 1.2
            content = 1.0
        }
    }
}

Weight System for Analyzed Fields

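The analyzedFields section sets how much each content field contributes to the similarity calculation. A higher weight gives words from that field more influence: in the example above, keywords (2.0) are weighted twice as heavily as content (1.0), and the title (1.5) one and a half times as heavily.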
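To make the effect of these weights concrete, the following PHP sketch builds a weighted word map for a single page. It is an illustration under assumptions, not the extension's code: the function name `weightedWords`, the default weight of 1.0 for unlisted fields, and the sample page are all made up for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: every occurrence of a word adds the weight
 * of the field in which it appears.
 */
function weightedWords(array $fields, array $weights): array
{
    $result = [];
    foreach ($fields as $name => $text) {
        // Assumption: fields without a configured weight count as 1.0.
        $weight = (float)($weights[$name] ?? 1.0);
        $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $result[$word] = ($result[$word] ?? 0.0) + $weight;
        }
    }
    return $result;
}

$weights = ['title' => 1.5, 'keywords' => 2.0, 'content' => 1.0];
$page = [
    'title' => 'Harbour tours',
    'keywords' => 'harbour',
    'content' => 'Guided tours start daily',
];
print_r(weightedWords($page, $weights));
// "harbour" scores 1.5 + 2.0 = 3.5, "tours" scores 1.5 + 1.0 = 2.5
```

A word that appears in a heavily weighted field ends up dominating the page's profile, which is why raising the keywords weight makes shared keywords count for more than shared body text.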

Configuration Parameters Explained
  • parentPageId: The ID of the parent page from which the analysis starts
  • proximityThreshold: The minimum similarity threshold for displaying a suggestion (0.0 to 1.0)
  • maxSuggestions: The maximum number of suggestions to display
  • excerptLength: The maximum length of the text excerpt for each suggestion
  • recursive: The search depth in the page tree (0 = only direct children)
  • excludePages: Comma-separated list of page UIDs to exclude from analysis and suggestions
  • recencyWeight: Weight of recency in similarity calculation (0-1)
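As an example of how one of these settings can be applied, the sketch below parses an excludePages value such as "8,9,3456" and removes those UIDs from a list of candidate pages. The function name `filterExcludedPages` is hypothetical and the logic is only an assumption about how such a filter could work.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a comma-separated UID list into integers
 * and drop those pages from the candidate list.
 */
function filterExcludedPages(array $pageUids, string $excludePages): array
{
    // Split on commas, trim whitespace, and discard empty entries.
    $parts = array_map('trim', explode(',', $excludePages));
    $parts = array_filter($parts, static fn (string $part): bool => $part !== '');
    $excluded = array_map('intval', $parts);

    // Remove excluded UIDs and reindex the result.
    return array_values(array_diff($pageUids, $excluded));
}

print_r(filterExcludedPages([5, 8, 9, 12, 3456], '8,9,3456'));
// Remaining UIDs: 5, 12
```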

The Weight of Recency in Similarity Calculation (0-1)

The recencyWeight parameter determines the importance of publication or modification date in similarity calculations:

  • 0: Recency has no impact
  • 1: Recency has maximum impact
How Recency Weight Works
  1. Base similarity score is calculated from content
  2. Recency boost is calculated based on publication/modification dates
  3. Final similarity is a weighted combination of content similarity and recency boost

Formula:

finalSimilarity = (contentSimilarity * (1 - recencyWeight)) + (recencyBoost * recencyWeight)
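The formula can be written as a small PHP function. The input values below are made up for illustration, and how the recency boost itself is derived from page dates is not covered here.

```php
<?php
declare(strict_types=1);

/**
 * Weighted combination of content similarity and recency boost,
 * following the formula above. All three inputs are expected in [0, 1].
 */
function finalSimilarity(float $contentSimilarity, float $recencyBoost, float $recencyWeight): float
{
    return ($contentSimilarity * (1 - $recencyWeight)) + ($recencyBoost * $recencyWeight);
}

// Example values (illustrative): content similarity 0.8, recency boost 0.5.
printf("%.2f\n", finalSimilarity(0.8, 0.5, 0.0)); // 0.80, recency ignored
printf("%.2f\n", finalSimilarity(0.8, 0.5, 0.2)); // 0.74
printf("%.2f\n", finalSimilarity(0.8, 0.5, 1.0)); // 0.50, only recency counts
```

With recencyWeight = 0.2, the result is 0.8 * 0.8 + 0.5 * 0.2 = 0.64 + 0.10 = 0.74, so a moderately recent page loses a little of its content score.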

Choosing the right value:

  • Low (0.1-0.3): Slightly favor recent content
  • Medium (0.4-0.6): Balance between content similarity and recency
  • High (0.7-0.9): Strongly favor recent content

Consider your specific use case:

  • News website: Higher recency weight
  • Educational resource: Lower recency weight
  • General blog: Medium recency weight

🖥 Usage

In Fluid Templates

To add the plugin directly in your Fluid template, use:

<f:cObject typoscriptObjectPath='lib.semantic_suggestion' />

This method uses the TypoScript configuration and is suitable for simple integrations.

TypoScript Integration

You can also integrate the Semantic Suggestions plugin using TypoScript. Add the following TypoScript setup to your configuration:

lib.semantic_suggestion = USER
lib.semantic_suggestion {
    userFunc = TYPO3\CMS\Extbase\Core\Bootstrap->run
    extensionName = SemanticSuggestion
    pluginName = Suggestions
    vendorName = TalanHdf
    controller = Suggestions
    action = list
}

Then, you can use it in your TypoScript template like this:

page.10 = < lib.semantic_suggestion

Or in specific content elements:

tt_content.semantic_suggestion = COA
tt_content.semantic_suggestion {
    10 = < lib.semantic_suggestion
}

Remember to include your TypoScript template in your site configuration or page setup.

🎛 Backend Module

The Semantic Suggestion extension includes a powerful backend module providing comprehensive insights into semantic relationships between your pages.

Features

  • 📊 Similarity Analysis: Visualize semantic similarity between pages
  • 🔝 Top Similar Pairs: Quickly identify the most related page pairs
  • 📈 Distribution of Similarity Scores: Overview of similarity across content
  • ⚙️ Configurable Analysis: Set custom parameters (parent page ID, depth, thresholds)
  • 📊 Visual Representation: Intuitive charts and progress bars
  • 📑 Detailed Statistics: In-depth page similarity and content relationship data

Access the module under the "Web" menu in the TYPO3 backend.

💡 Tip: The effectiveness of semantic analysis depends on content quality and quantity. Ensure your pages have meaningful titles, descriptions, and content for best results.

Performance Metrics


The backend module provides crucial performance metrics to help optimize the extension's operation:

Execution Time (seconds)
  • What: Total time for semantic analysis, including page retrieval, calculations, and caching
  • Interpretation:
    • Lower is better
    • High values may indicate need for content structure optimization or increased caching
    • 0.00 seconds typically means results were cached
Total Pages Analyzed
  • What: Number of pages included in the semantic analysis
  • Interpretation:
    • Depends on page tree structure and configured analysis depth
    • Higher numbers may increase accuracy but also execution time
Similarity Calculations
  • What: Total number of page-to-page similarity comparisons
  • Calculation: Typically n * (n-1) / 2, where n is the number of pages analyzed
  • Interpretation:
    • Higher numbers indicate more comprehensive analysis
    • May impact performance with large page sets
Results from Cache
  • What: Indicates whether results were retrieved from cache (Yes/No)
  • Interpretation:
    • "Yes" means faster execution (cached results)
    • "No" indicates a fresh analysis was performed
    • Frequent "No" results might suggest too frequent cache clearing or rapidly changing content
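To get a feel for the Similarity Calculations metric, this short PHP sketch evaluates n * (n-1) / 2 for a few page counts. The function name `pairCount` is only an illustrative choice.

```php
<?php
declare(strict_types=1);

/** Number of unordered page pairs among $n pages: n * (n - 1) / 2. */
function pairCount(int $n): int
{
    return intdiv($n * ($n - 1), 2);
}

foreach ([10, 100, 500, 1000] as $n) {
    printf("%5d pages -> %7d comparisons\n", $n, pairCount($n));
}
// 10 -> 45, 100 -> 4950, 500 -> 124750, 1000 -> 499500
```

Doubling the page count roughly quadruples the number of comparisons, which is why analysis depth and excluded pages have such a visible effect on execution time.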

Optimizing Performance

  1. Caching: Adjust caching configuration to match your update frequency
  2. Analysis Depth: Balance comprehensiveness with performance
  3. Excluded Pages: Use excludePages setting to omit irrelevant pages
  4. Content Structure: Organize content to minimize analyzed pages without compromising quality

Monitor these metrics to fine-tune the extension's configuration for your specific use case.

🧮 Similarity Logic

The extension employs a custom similarity calculation to determine related pages:

  1. Data Gathering: Collects title, description, keywords, and content for each subpage of the specified parent page.
  2. Similarity Calculation: Compares page pairs using a word intersection and union method. The similarity score is the ratio of common words to total unique words, weighted by field importance.
  3. Proximity Threshold: Only pages with similarity scores above the configured threshold are considered related and displayed.
  4. Caching Scores: Calculated scores are stored in the tx_semanticsuggestion_scores table for performance optimization. They are updated periodically or when page content changes.
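Steps 2 and 3 can be sketched in a few lines of PHP. This is a simplified, unweighted version of the intersection/union idea (the Jaccard index) and not the extension's actual implementation; the function name `wordSetSimilarity`, the sample words, and the threshold value are assumptions for the example.

```php
<?php
declare(strict_types=1);

/**
 * Ratio of shared words to total unique words (Jaccard index).
 * Field weights are left out to keep the sketch short.
 */
function wordSetSimilarity(array $wordsA, array $wordsB): float
{
    $a = array_unique($wordsA);
    $b = array_unique($wordsB);
    $union = array_unique(array_merge($a, $b));
    if ($union === []) {
        return 0.0; // two empty pages: treat as unrelated
    }
    return count(array_intersect($a, $b)) / count($union);
}

$pageA = ['harbour', 'tours', 'daily', 'boat'];
$pageB = ['harbour', 'tours', 'museum', 'boat'];
$score = wordSetSimilarity($pageA, $pageB); // 3 shared / 5 unique = 0.6

$proximityThreshold = 0.5; // example value
echo $score > $proximityThreshold ? "related\n" : "not related\n";
```

The two sample pages share three of five unique words, giving 0.6, which clears the example threshold of 0.5. With the 0.7 threshold from the configuration example, the same pair would not be suggested.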

🎨 Display Customization

Customize the display of suggestions by overriding the Fluid template (List.html). Configure your own template paths in TypoScript:

plugin.tx_semanticsuggestion {
    view {
        templateRootPaths.10 = EXT:your_extension/Resources/Private/Templates/
    }
}

🌐 Multilingual Support

The extension fully supports TYPO3's multilingual structure, analyzing and suggesting pages in the current site language.

🐛 Debugging and Maintenance

The Semantic Suggestion extension utilizes TYPO3's logging system for comprehensive debugging and maintenance:

  • 📝 Configure logging to get detailed information about the analysis and suggestion process
  • 🔍 Monitor extension behavior and performance
  • 🚀 Optimize based on logged data

Configuring Logging

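Add the following to your additional configuration file (config/system/additional.php in Composer-based TYPO3 12+ installations; formerly typo3conf/AdditionalConfiguration.php):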

$GLOBALS['TYPO3_CONF_VARS']['LOG']['TalanHdf']['SemanticSuggestion']['writerConfiguration'] = [
    \TYPO3\CMS\Core\Log\LogLevel::DEBUG => [
        \TYPO3\CMS\Core\Log\Writer\FileWriter::class => [
            'logFile' => 'typo3temp/logs/semantic_suggestion.log'
        ],
    ],
];

This configuration will log all debug-level and above messages to semantic_suggestion.log.

🔒 Security

The Semantic Suggestion extension implements several security measures:

  • 🛡️ Protection against SQL injections through TYPO3's secure query mechanisms (QueryBuilder)
  • 🔐 XSS attack prevention via automatic output escaping in Fluid templates
  • 🚫 Access control restricted to users with appropriate permissions

⚡ Performance

Optimized for efficient operation, even with large numbers of pages:

  • 💾 Caching of similarity scores in the database
  • 🔄 Periodic score updates and refresh on content changes
  • 🚀 Optimized content retrieval process
  • 🎯 Efficient handling of excluded pages
  • ⚖️ Batch processing of page analysis for server load management

📁 File Structure and Logic

semantic_suggestion/
├── Classes/
│   ├── Controller/
│   │   ├── SemanticBackendController.php
│   │   └── SuggestionsController.php
│   └── Service/
│       └── PageAnalysisService.php
├── Configuration/
│   ├── Backend/
│   │   ├── Modules.php
│   │   └── Routes.php
│   ├── TCA/
│   │   └── Overrides/
│   │       ├── sys_template.php
│   │       └── tt_content.php
│   ├── TypoScript/
│   │   ├── constants.typoscript
│   │   └── setup.typoscript
│   └── Services.yaml
├── Documentation/
│   ├── Index.rst
│   ├── Installation/
│   │   └── Index.rst
│   ├── Introduction/
│   │   └── Index.rst
│   └── Medias/
│       ├── backend_module.png
│       ├── backend_module_performance_metrics.jpg
│       └── frontend_on_the_same_theme_view.jpg
├── Resources/
│   ├── Private/
│   │   ├── Language/
│   │   │   ├── locallang.xlf
│   │   │   ├── locallang_be.xlf
│   │   │   ├── locallang_mod.xlf
│   │   │   └── locallang_semanticproximity.xlf
│   │   ├── Layouts/
│   │   │   └── Default.html
│   │   └── Templates/
│   │       ├── SemanticBackend/
│   │       │   ├── Index.html
│   │       │   └── List.html
│   │       └── Suggestions/
│   │           └── List.html
│   └── Public/
│       ├── Css/
│       │   └── SemanticSuggestion.css
│       └── Icons/
│           ├── Extension.svg
│           ├── module-semantic-suggestion.svg
│           └── user_mod_semanticproximity.svg
├── Tests/
│   ├── Fixtures/
│   │   └── pages.xml
│   ├── Integration/
│   │   └── Service/
│   │       └── PageAnalysisServiceIntegrationTest.php
│   └── Unit/
│       └── Service/
│           └── PageAnalysisServiceTest.php
├── .env
├── .gitignore
├── CHANGELOG.md
├── IMPROVEMENTS.MD
├── LICENSE
├── README.md
├── ROADMAP_TO_STABLE.md
├── composer.json
├── ext_conf_template.txt
├── ext_emconf.php
├── ext_localconf.php
├── ext_tables.php
└── phpunit.xml.dist

🧪 Unit Tests

The Semantic Suggestion extension includes a comprehensive suite of unit tests to ensure reliability and correctness of core functionalities, with a focus on the similarity calculation algorithm.

Test Coverage

  1. Weighted Word Calculation: Verifies the correct weighting of words based on field importance and word frequency.
  2. Similarity Calculation: Ensures accuracy of page similarity calculations using cosine similarity.
  3. Field-Specific Similarity: Tests the calculation of similarity scores for individual fields (title, content, keywords, etc.).
  4. Recency Boost Integration: Validates the integration of recency factors in the final similarity score.
  5. Page Data Preparation: Checks correct data preparation and preprocessing for similarity analysis.
  6. Common Keywords Detection: Tests functionality for finding shared keywords between pages.
  7. Relevance Determination: Validates logic for determining relevance based on calculated similarity scores.
  8. Edge Case Handling: Tests behavior with empty pages, single-word content, and extremely large content.
  9. Multilingual Content Handling: Verifies correct similarity calculation for content in different languages.
  10. Performance Testing: Evaluates the efficiency of similarity calculations with large datasets.
  11. Cache Handling: Ensures proper use of caching mechanisms for improved performance.
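Since the test list mentions cosine similarity and edge cases such as empty pages, here is a minimal PHP sketch of cosine similarity over word => weight maps. It only illustrates the technique; the function name `cosineSimilarity` and the sample vectors are assumptions and do not mirror the extension's service or its tests.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two word => weight maps:
 * dot product divided by the product of the vector lengths.
 */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $word => $weight) {
        // Words missing from $b contribute nothing to the dot product.
        $dot += $weight * ($b[$word] ?? 0.0);
    }
    $normA = sqrt(array_sum(array_map(static fn ($w) => $w * $w, $a)));
    $normB = sqrt(array_sum(array_map(static fn ($w) => $w * $w, $b)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // edge case: an empty page cannot be compared
    }
    return $dot / ($normA * $normB);
}

// Illustrative vectors: dot = 9, lengths 5 and 3, so 9 / 15 = 0.6.
printf("%.2f\n", cosineSimilarity(['x' => 3.0, 'y' => 4.0], ['x' => 3.0]));
printf("%.2f\n", cosineSimilarity(['x' => 1.0], [])); // 0.00 for an empty page
```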

Running Tests

To run the unit tests:

  1. Ensure you have a development environment set up with DDEV.
  2. Open a terminal and navigate to your project root.
  3. Execute the following command:
ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always

For specific tests, add the --filter option:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --filter testMethodName

Test commands for PageAnalysisService

Running all tests in PageAnalysisServiceTest

To run all tests in the PageAnalysisServiceTest class:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always --filter PageAnalysisServiceTest

Testing a specific method

To test a specific method, for example testGetWeightedWordsReturnsCorrectWeights:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always --filter "PageAnalysisServiceTest::testGetWeightedWordsReturnsCorrectWeights"

Testing with a name pattern

To run all tests containing "Similarity" in their name:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always --filter "/::test.*Similarity/"

Running with code coverage

To run the tests with a code coverage report:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always --filter PageAnalysisServiceTest --coverage-text

Verbose mode

To get more details about test execution:

ddev exec vendor/bin/phpunit -c packages/semantic_suggestion/phpunit.xml.dist --testdox --colors=always --filter PageAnalysisServiceTest -v

Interpreting Results

  • ✅ Green checkmarks: Passed tests
  • ❌ Red crosses: Failed tests
  • ⚠️ Yellow exclamation marks: Risky or incomplete tests

Detailed output helps quickly identify and address any issues.

💡 Tip: Regular test execution is recommended, especially after code changes, to ensure continued functionality and catch regressions early.

🤝 Contributing

We welcome contributions to the Semantic Suggestion extension! Here's how you can contribute:

  1. 🍴 Fork the repository
  2. 🌿 Create a new branch for your feature or bug fix
  3. 🛠️ Make your changes and commit them with clear messages
  4. 🚀 Push your changes to your fork
  5. 📬 Submit a pull request to the main repository

Please adhere to existing coding standards and include appropriate tests for your changes.

📄 License

This project is licensed under the GNU General Public License v2.0 or later. See the LICENSE file for full details.

🆘 Support

For support and further information:

👤 Contact: Wolfangel Cyril
Email: cyril.wolfangel@gmail.com

🐛 Bug Reports and Feature Requests: Use the GitHub issue tracker

📚 Documentation and Updates: Visit our GitHub repository

📘 Full Documentation | 🐛 Report Bug | 💡 Request Feature