starfruit/crawler-bundle

0.0.2 2025-09-18 10:02 UTC

Last update: 2025-09-18 10:02:54 UTC


README

Starfruit Crawler Bundle

Crawler UI

Requirements

Google Cloud

  1. Create a new project, then enable the libraries below:
  2. Create a service account and download its JSON credentials file.

Google Cloud screenshot
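
After downloading the credentials file, a quick sanity check can catch a wrong or truncated download before the bundle is configured. The sketch below is not part of the bundle; `check_google_credentials` is a hypothetical helper name. It assumes only that a Google service-account key is a JSON file containing `"type": "service_account"`, `"client_email"`, and `"private_key"` entries.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the bundle): checks that a file
# looks like a Google service-account key.
check_google_credentials() {
    file="$1"

    # The file must exist and be readable.
    if [ ! -r "$file" ]; then
        echo "not readable: $file"
        return 1
    fi

    # Service-account key files carry these JSON keys.
    for key in '"client_email"' '"private_key"'; do
        if ! grep -q "$key" "$file"; then
            echo "missing key $key in $file"
            return 1
        fi
    done

    # The "type" entry must say this is a service-account key.
    if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$file"; then
        echo "not a service-account key: $file"
        return 1
    fi

    echo "ok: $file"
}
```

Call it with the path to the downloaded file, for example `check_google_credentials /root/project/public/crawler-google-credential.json`. The check is text-based and does not validate the JSON syntax or the key itself.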

Installation

    composer require starfruit/crawler-bundle

OR, to skip the ext-amqp platform requirement:

    composer require starfruit/crawler-bundle --ignore-platform-req=ext-amqp

  • Update the config/bundles.php file:
    return [
        // ...
        Starfruit\CrawlerBundle\StarfruitCrawlerBundle::class => ['all' => true],
    ];

Setup

  • Create a new variable in the .env file:
# Path to the Google Cloud JSON credentials file, for example:
CRAWLER_BUNDLE_GOOGLE_JSON=/root/project/public/crawler-google-credential.json
  • Update the config/config.yaml file:
imports:
    - { resource: 'local/' }

pimcore:
    # ...

# configuration for the crawler bundle
starfruit_crawler:
    target:
        class_object: # map of class names (keys) to their field settings
            News: # name of the class
                content_field: 'content' # field that receives the crawled content
                last_version_field: 'importUrl' # field that stores the last version; may be null

            Event: # name of class
                content_field: 'mainContent'

    # custom asset path in the admin for storing images and other media
    asset_store_path: '/default-crawler-media/image'

    # custom formatting of the HTML produced by crawling
    content_format:
        heading:
            # default mapping from heading values to HTML tags
            default: 'p' # tag used when no heading value matches
            HEADING_1: 'h1'
            HEADING_2: 'h2'
            HEADING_3: 'h3'
            HEADING_4: 'h4'