contextualcode / content-import
Content import bundle
Requires
- php: >=7.3
- ext-curl: *
- ext-dom: *
- ext-json: *
- ext-libxml: *
- contextualcode/crawler: ^1.1 || ^2.0
- doctrine/doctrine-bundle: ^2.1
- doctrine/orm: ^2.7
- monolog/monolog: ^2.0
- symfony/console: >=5.0
- symfony/doctrine-bridge: >=5.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^2.16
This package is not auto-updated.
Last update: 2025-01-11 02:35:29 UTC
README
This package imports content from the results produced by contextualcode/crawler. Although it was originally designed for eZ Platform v3, it contains no CMS/CMR/DXP-specific functionality. Instead, it provides an abstraction layer, so it can be integrated with any CMS/CMR/DXP.
Requirements
The only requirement for a platform that integrates this package concerns its content model. We assume that all modern CMS/CMR/DXP have a content model similar to that of eZ Platform:
- Each content item has a content type. For example, an article is an instance of the "Article" content type, and a product is an instance of the "Product" content type.
- The content is versioned: each content item has multiple versions, and the version identifier is an integer.
- The content item is used only to store the data (content fields).
- All content items are structured by using separate "Location" items. The type of structure is not important; it might be a tree, a catalog, etc.
Basic Concepts
This package introduces a few concepts related to content import, and it is important to understand each of them. More detailed information is available in the reference.
Page
Each website URL is represented by a ContextualCode\Crawler\Entity\Page entity. Even binary files and images have their own Page entities.
Those entities are created by contextualcode/crawler and are used by this package. Please check the crawler's documentation to learn how to crawl a website and store the results in Page entities.
Content Import Handler
Each Page is transformed into a single content item. Special handlers perform that transformation, and each of them handles exactly one content type (in CMS terms). For example, the "Article" Content Import Handler converts all the articles, and the "Blog Post" Content Import Handler converts all the blog posts. The single responsibility of a Content Import Handler is to convert a Page entity into a CMS content item.
More details are available in the reference.
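The one-handler-per-content-type idea can be sketched as follows. This is a minimal illustration, not the package's actual API: DemoPage, DemoImportHandler, DemoArticleHandler and demoImport are hypothetical names, the URL-prefix check used to pick a handler is an assumption, and the regular expression stands in for a real field extraction.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for ContextualCode\Crawler\Entity\Page;
// the real entity and its methods are not reproduced here.
final class DemoPage
{
    private $url;
    private $html;

    public function __construct(string $url, string $html)
    {
        $this->url = $url;
        $this->html = $html;
    }

    public function getUrl(): string { return $this->url; }
    public function getHtml(): string { return $this->html; }
}

// Hypothetical contract: each handler is responsible for one content type.
interface DemoImportHandler
{
    public function getContentType(): string;
    public function supports(DemoPage $page): bool;
    /** @return array<string, string> field identifier => field value */
    public function convert(DemoPage $page): array;
}

final class DemoArticleHandler implements DemoImportHandler
{
    public function getContentType(): string { return 'article'; }

    public function supports(DemoPage $page): bool
    {
        // Assumption for this sketch: articles live under /articles/.
        $path = (string) parse_url($page->getUrl(), PHP_URL_PATH);
        return strpos($path, '/articles/') === 0;
    }

    public function convert(DemoPage $page): array
    {
        // Simplified extraction of the first <h1> as the title.
        $found = preg_match('~<h1[^>]*>(.*?)</h1>~is', $page->getHtml(), $m);
        return ['title' => $found ? trim(strip_tags($m[1])) : ''];
    }
}

// Picks the first handler that supports the page; null means "not imported".
function demoImport(DemoPage $page, array $handlers): ?array
{
    foreach ($handlers as $handler) {
        if ($handler->supports($page)) {
            return [
                'type' => $handler->getContentType(),
                'fields' => $handler->convert($page),
            ];
        }
    }
    return null;
}
```

Keeping the selection logic (supports) apart from the conversion logic (convert) lets a new content type be added by registering one more handler, without touching the existing ones.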
Content Field Transformer
A Content Import Handler defines exactly how a Page is converted into a content item, including how the value of each content field is extracted from the page. Content field values are extracted by Content Field Transformers: they receive the Page entity and some options as input and return the content field value. This package provides a few Content Field Transformers. A good example of their usage is an Article page. To convert it into a content item, the following data needs to be extracted:
- Title: the text-line Content Field Transformer extracts a line of text from the Page entity using the specified XPath selector.
- Body: the html Content Field Transformer extracts HTML content from the Page entity using the provided XPath selector.
More details are available in the reference.
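The Article example above can be sketched with PHP's DOM extension, which the package already requires. The function names demoXPath, demoTextLine and demoHtml are hypothetical and operate on a raw HTML string; the transformers shipped with the package take a Page entity and options, and their exact behaviour may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: parses an HTML string and returns an XPath object.
function demoXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parse errors silently.
    $previous = libxml_use_internal_errors(true);
    // The prefix is a common hint that makes libxml treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return new DOMXPath($doc);
}

// Sketch of a "text-line" transformer: text of the first match, on one line.
function demoTextLine(string $html, string $xpath): ?string
{
    $nodes = demoXPath($html)->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $text = (string) preg_replace('/\s+/u', ' ', $nodes->item(0)->textContent);

    return trim($text);
}

// Sketch of an "html" transformer: inner HTML of the first match.
function demoHtml(string $html, string $xpath): ?string
{
    $nodes = demoXPath($html)->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $node->ownerDocument->saveHTML($child);
    }

    return trim($inner);
}
```

For an article page, the title could then be read with demoTextLine($html, '//h1') and the body with demoHtml($html, '//div[@class="body"]'); both selectors are examples and depend on the markup of the crawled site.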
Content Hash Transformer
All extracted content fields are hashed so that changes in the import sources can be detected. Content Hash Transformers work similarly to Content Field Transformers, but they receive the field value as input and return its string representation.
More details are available in the reference.
Content Hash
Each content item created or updated by the content import script has its Content Hash, which is the composed hash of all its content fields. It is used to determine whether a content item requires an update on subsequent executions of the content import script. It is also used to track whether the content was edited manually since its import. In that case, the content will not be updated by subsequent import executions, so the manual changes are preserved.
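A rough sketch of how a composed hash can drive these decisions is shown below. The function names, the SHA-256 algorithm, and the way manual edits are detected (by re-hashing the field values currently stored in the CMS) are assumptions made for illustration; the package may implement each of them differently.

```php
<?php
declare(strict_types=1);

// Hypothetical hash transformer: a stable string for any scalar or array value.
function demoFieldToString($value): string
{
    if (is_array($value)) {
        ksort($value); // key order must not influence the hash
        return (string) json_encode(array_map('demoFieldToString', $value));
    }

    return trim((string) $value);
}

// Composes one hash from all content fields (SHA-256 is an assumption).
function demoContentHash(array $fields): string
{
    ksort($fields);
    $parts = [];
    foreach ($fields as $identifier => $value) {
        $parts[] = $identifier . '=' . demoFieldToString($value);
    }

    return hash('sha256', implode("\n", $parts));
}

// $storedHash: hash saved during the previous import
// $cmsHash:    hash of the field values currently in the CMS
// $sourceHash: hash of the values freshly extracted from the Page
function demoDecide(string $storedHash, string $cmsHash, string $sourceHash): string
{
    if ($cmsHash !== $storedHash) {
        return 'skip-manual-edit'; // an editor changed the content after import
    }
    if ($sourceHash === $storedHash) {
        return 'unchanged'; // the source did not change, nothing to do
    }

    return 'update';
}
```

Sorting the fields before hashing keeps the result independent of the order in which transformers ran, so only real value changes trigger an update.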
Location Hash
The Location Hash is very similar to the Content Hash, but it is not calculated from the content fields. Only the URL of the source Page is used to calculate it, because the URL is the only parameter that defines the position (location) of the content item in the CMS content structure.
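As a small illustration, a URL-based hash could look like the sketch below. The function name demoLocationHash, the SHA-256 algorithm, and the normalization (lower-cased host, no trailing slash, query string ignored) are assumptions and not taken from the package.

```php
<?php
declare(strict_types=1);

// Hypothetical location hash: depends only on the source URL.
function demoLocationHash(string $url): string
{
    $parts = parse_url($url);
    // "??" also covers the case where parse_url() returns false.
    $host = strtolower($parts['host'] ?? '');
    $path = rtrim($parts['path'] ?? '/', '/');

    return hash('sha256', $host . ($path === '' ? '/' : $path));
}
```

If a page moves to a different URL, its hash changes while its Content Hash may stay the same, which is what allows the two kinds of change to be handled separately.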
Content Operations
Content Operations are the CMS-specific operations, such as creating content, updating content, adding a location, etc. This package provides only an interface and an example dummy implementation for them. They need to be implemented in a CMS/CMR/DXP-specific package.
More details are available in the reference.
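To make the idea concrete, here is an in-memory sketch of such operations. DemoContentOperations and its three methods are hypothetical and only mirror the operations named above; the real ContentOperationsInterface in this package may declare different methods and signatures.

```php
<?php
declare(strict_types=1);

// Hypothetical contract, NOT the package's ContentOperationsInterface.
interface DemoContentOperations
{
    public function createContent(string $contentType, array $fields): int;
    public function updateContent(int $contentId, array $fields): void;
    public function addLocation(int $contentId, string $path): void;
}

// Dummy implementation that keeps everything in an array instead of a CMS.
final class DemoInMemoryOperations implements DemoContentOperations
{
    public $items = [];
    private $nextId = 1;

    public function createContent(string $contentType, array $fields): int
    {
        $id = $this->nextId++;
        $this->items[$id] = [
            'type' => $contentType,
            'version' => 1, // versions are integers, as in the content model
            'fields' => $fields,
            'locations' => [],
        ];

        return $id;
    }

    public function updateContent(int $contentId, array $fields): void
    {
        $this->assertExists($contentId);
        $current = $this->items[$contentId]['fields'];
        $this->items[$contentId]['fields'] = array_merge($current, $fields);
        $this->items[$contentId]['version']++; // each update is a new version
    }

    public function addLocation(int $contentId, string $path): void
    {
        $this->assertExists($contentId);
        $this->items[$contentId]['locations'][] = $path;
    }

    private function assertExists(int $contentId): void
    {
        if (!isset($this->items[$contentId])) {
            throw new InvalidArgumentException('Unknown content item ' . $contentId);
        }
    }
}
```

A CMS-specific package would replace the array with calls to the platform's own content and location services while keeping the same separation between content data and locations.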
Installation
Require contextualcode/content-import via composer:
composer require contextualcode/content-import
Run the migrations:
php bin/console doctrine:migrations:migrate --configuration=vendor/contextualcode/crawler/src/Resources/config/doctrine_migrations.yaml --no-interaction
php bin/console doctrine:migrations:migrate --configuration=vendor/contextualcode/content-import/src/Resources/config/doctrine_migrations.yaml --no-interaction
Usage
This package has an example dummy CMS integration. In order to integrate it with any new CMS/CMR/DXP, the following steps need to be followed:
Create a CMS/CMR/DXP-specific package that maps the content model:
- Define a ContentType class that implements ContentTypeInterface, example: Service/Integration/ContentType.
- Define a Content class that implements ContentInterface, example: Service/Integration/Content.
- Define a Location class that implements LocationInterface, example: Service/Integration/Location.
Define the CMS/CMR/DXP-specific content operations handler. It should implement ContentOperationsInterface, example: Service/Integration/ContentOperations.
The implemented content operations handler should be registered as the ContextualCode\ContentImport\ContentHandler\ContentOperationsInterface service:
ContextualCode\ContentImport\ContentHandler\ContentOperationsInterface:
    class: ContextualCode\ContentImport\Service\Integration\ContentOperations