flowpack/decoupledcontentstore

This is the 2nd generation of a Two-Stack CMS package for Neos.


This package is used in production in a larger installation.

The Content Store package is one part of a Two-Stack CMS solution with Neos. A Two-Stack architecture separates editing and publishing from the delivery of content. This architecture is also suitable for integrating Neos content into various other systems without adding overhead during delivery.

The first iteration was not open source. It was developed jointly by Networkteam and Sandstorm and is in use for several large customers. The second iteration (this project) was developed from scratch as open source, based on the lessons learned from the first iteration. In particular, robustness has been greatly improved.

What does it do?

The Content Store package publishes content from Neos to a Redis database as immutable content releases. Releases can be switched atomically, and a "current release" pointer always identifies the active one.

The delivery layer in the Two-Stack architecture uses the current release and looks for matching URLs in the content store and delivers the pre-rendered content. A delivery layer is decoupled from the actual Neos CMS and can be implemented in any language or framework. It is also possible to integrate the delivery layer part in another software (e.g. a shop system) as an extension.
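
As a rough illustration of the lookup described above, here is a minimal sketch of what a delivery layer does. It is not the package's actual storage format: an in-memory array stands in for Redis, and the key names (`example:current`, `example:<release>:url:<path>`) and the functions `deliver()` and `switchRelease()` are invented for this example.

```php
<?php
// Hypothetical stand-in for the Content Store Redis DB. All key names are made up.
$store = [
    'example:current' => 'release-2',
    'example:release-1:url:/' => '<h1>Old home</h1>',
    'example:release-2:url:/' => '<h1>Home</h1>',
    'example:release-2:url:/about' => '<h1>About</h1>',
];

/**
 * Resolve a URL path against the currently active release.
 * Returns the pre-rendered content, or null if the release has no such page.
 */
function deliver(array $store, string $path): ?string
{
    $currentRelease = $store['example:current'] ?? null;
    if ($currentRelease === null) {
        return null; // no release has gone live yet
    }

    return $store["example:{$currentRelease}:url:{$path}"] ?? null;
}

/**
 * Switching the live release is a single write to the pointer key.
 * The releases themselves are never modified.
 */
function switchRelease(array $store, string $releaseId): array
{
    $store['example:current'] = $releaseId;

    return $store;
}
```

Because releases are immutable and only the pointer changes, a delivery layer sees either the old release or the new one, never a mixture.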

Features

  • Publish a full, read-only snapshot of your live content to Redis in a so-called Content Release
  • Allows incremental publishing: if a change is made, only the affected pages are re-rendered. This is integrated with the Neos Content Cache, so cache flushes work correctly.
  • Integration with Neos workspace publishing for automatic incremental publishing to the Content Store
  • Configurable Content Store format, decoupled from the internal representation in Neos.
  • Extensibility: Enrich content releases with your custom data.
  • Allows parallel rendering
  • Allows copying the content releases to different environments.
  • Allows rsyncing persistent assets around (should you need it)
  • Backend module with overview of content releases (current release, switching releases, manual publish)

This project uses the Go package prunner and its Flow package wrapper as the basis for orchestrating and executing a content release.

Requirements

  • Redis
  • Sandstorm.OptimizedCacheBackend is required when this package is used with Neos 7.3 (Neos 8 already has an optimized Redis backend)
  • Prunner

Start up prunner via the following command:

prunner/prunner --path Packages --data Data/Persistent/prunner

Copy the pipelines_template.yml file into your project and adjust it as needed (see below and the comments in the file for explanation).

Approach to Rendering

The following flow chart shows the rendering pipeline for creating a content release.

                   ┌─────────────────────┐                                                      
                   │   Node Rendering    │                                                      
 ┌───────────┐     │   ┌─────────────┐   │     ┌───────────┐     ┌───────────┐     ┌───────────┐
 │   Node    │     │   │Orchestrator │   │     │  Release  │     │Transfer to│     │  Atomic   │
 │Enumeration│────▶│   └─────────────┘   │────▶│Validation │────▶│  Target   │────▶│  Switch   │
 └───────────┘     │┌────────┐ ┌────────┐│     └───────────┘     └───────────┘     └───────────┘
                   ││Renderer│ │Renderer││                                                      
                   └┴────────┴─┴────────┴┘                                                      
  • At the beginning of every render, all nodes are enumerated. The Node Enumeration contains all pages which need to be in the final content release.

  • Then, the rendering takes place. In parallel, the orchestrator checks whether pages are already fully rendered. If not, it creates rendering jobs. If so, the rendered page is added to the in-progress content release.

    The renderers simply render the pages as instructed by the orchestrator.

    The orchestrator tries to render multiple times: a render pass can end up incomplete because an editor changed pages at the same time, leading to content cache flushes and "holes" in the output.

  • During validation, checks verify that the content release is complete and can really go online.

  • During the transfer phase, the finished content release is copied to the production Redis instance if needed. This includes copying of assets if needed.

  • In the switch phase, the content release goes live.

The above pipeline is implemented with prunner which is orchestrating the different steps.
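
The orchestrator's retry behavior can be sketched as follows. This is a simplified, single-process model and not the package's implementation: `buildRelease()`, the `$cache` array (standing in for the content cache) and the `$render` callback (standing in for the renderer workers) are all assumptions made for this example.

```php
<?php
/**
 * Hypothetical orchestrator loop.
 *
 * $enumeration lists the pages the release must contain, $cache maps
 * page => rendered output, and $render is asked to render missing pages.
 * Returns the finished release, or null if pages are still missing after
 * $maxRenderRounds rounds of rendering.
 */
function buildRelease(array $enumeration, array &$cache, callable $render, int $maxRenderRounds = 3): ?array
{
    $release = [];
    for ($round = 0; ; $round++) {
        $missing = [];
        foreach ($enumeration as $page) {
            if (array_key_exists($page, $release)) {
                continue; // already part of the in-progress release
            }
            if (array_key_exists($page, $cache)) {
                $release[$page] = $cache[$page];
            } else {
                $missing[] = $page;
            }
        }
        if ($missing === []) {
            return $release; // complete, ready for validation
        }
        if ($round >= $maxRenderRounds) {
            return null; // give up, the release never became complete
        }
        foreach ($missing as $page) {
            $render($page, $cache); // create rendering jobs for the "holes"
        }
    }
}

// Usage: '/b' is lost once, as if an editor's change had flushed the fresh cache entry.
$cache = ['/' => 'home'];
$flushedOnce = false;
$render = function (string $page, array &$cache) use (&$flushedOnce): void {
    if ($page === '/b' && !$flushedOnce) {
        $flushedOnce = true;
        return;
    }
    $cache[$page] = "rendered $page";
};
$release = buildRelease(['/', '/a', '/b'], $cache, $render);
```

In this run, the second round of rendering fills the hole left by the simulated cache flush, so the release still completes.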

Infrastructure

Here, we explain the different infrastructure and setup constraints for using the content store.

  • The Neos Content Cache must use Redis. It can use the OptimizedRedisCacheBackend.
  • The Content Store needs a separate Redis Database, but it can run on the same server.

It is crucial that both Neos AND the Delivery Layer can reach Redis with the lowest possible latency. See the different setup scenarios below for how this can be done.

Minimal Setup

The minimal setup looks as follows:

  • Neos writes into the Content Store Redis Database, and the Delivery Layer reads from the Content Store Redis Database.
  • Assets (persistent resources) are written directly to a publicly available Cloud Storage such as S3.
┌──────────────┐   ┌──────────────┐            
│ Neos Content │   │Content Store │            
│Cache Redis DB│   │   Redis DB   │◀───┐       
└──────────────┘   └──────────────┘    │       
        ▲                  ▲           │       
        └────────┬─────────┘           │       
                 │                     │       
             ╔══════╗          ╔══════════════╗
             ║ Neos ║          ║Delivery Layer║
             ╚══════╝          ╚══════════════╝
                 │                             
                 │                             
                 │       ┌──────────────┐      
                 │       │Asset Storage │      
                 └──────▶│   (S3 etc)   │      
                         └──────────────┘      

In this case, the transfer phase does not need to do anything, and you need to configure Neos to use the cloud storage (e.g. via Flownative.Google.CloudStorage or Flownative.Aws.S3) for resources.

This is implemented in the default pipelines_template.yml.

Use this setup if:

  • the Delivery Layer and Neos are in the same data center (or host), so both can access Redis via lowest latencies
  • you want the easiest possible setup.

If you use Cloud Asset Storage, ensure that you never delete assets from there. For Flownative.Aws.S3, you can follow the guide on "Preventing Unpublishing of Resources in the Target".

Manually Sync Assets to the Delivery Layer via RSync

If you cannot use a Cloud Asset Storage, there's a built-in feature to manually sync assets to the delivery layer(s) via RSync.

To enable this, follow these steps:

  1. Configure in Settings.yaml:

    Flowpack:
      DecoupledContentStore:
        resourceSync:
          targets:
            -
              host: localhost
              port: ''
              user: ''
              directory: '../nginx/frontend/resources/'
  2. In pipelines.yml, underneath 4) TRANSFER, comment-in the transfer_resources task.

Copy Content Releases to a different Redis instance

Use this setup if:

  • the Delivery Layer and Neos are in different data centers, so that one of them would reach Redis only with higher latency
  • or you need multiple delivery layers with different content states, e.g. a staging delivery layer and a live delivery layer.
┌──────────────┐   ┌──────────────┐                   ┌──────────────┐
│ Neos Content │   │Content Store │                   │Content Store │
│Cache Redis DB│   │   Redis DB   │  ┌ ─ ─ ─ ─ ─ ─ ─ ▶│   Redis DB   │
└──────────────┘   └──────────────┘    Higher         └──────────────┘
        ▲                  ▲         │ Latency                ▲       
        └────────┬─────────┘                                  │       
                 │                   │                        │       
             ╔══════╗                                 ╔══════════════╗
             ║ Neos ║─ ─ ─ ─ ─ ─ ─ ─ ┘                ║Delivery Layer║
             ╚══════╝                                 ╚══════════════╝
                 │                                                    
                 │                                                    
                 │       ┌──────────────┐                             
                 │       │Asset Storage │                             
                 └──────▶│   (S3 etc)   │                             
                         └──────────────┘                                                 

In this case, Neos explicitly syncs the content store Redis DB to the Redis instance of another Delivery Layer.

To enable this feature, do the following:

  1. Configure the additional Content Stores in Settings.yaml underneath Flowpack.DecoupledContentStore.redisContentStores. The key is the internal identifier of the content store:

    Flowpack:
      DecoupledContentStore:
        redisContentStores:
          live:
            label: 'Live Site'
            hostname: my-redis-hostname
            port: 6379
            database: 11
          staging:
            label: 'Staging Site'
            hostname: my-staging-redis-hostname
            port: 6379
            database: 11
  2. In pipelines.yml, underneath 4) TRANSFER, comment-in and adjust the transfer_content task.

  3. In pipelines.yml, underneath 5) SWITCH, comment-in the additional contentReleaseSwitch:switchActiveContentRelease commands.

Alternative: Redis Replication

Instead of the explicit synchronization described here, you can also use Redis Replication to synchronize the primary Redis to the other instances.

Using Redis replication is transparent to Neos or the Delivery Layer.

To be able to use Redis replication, the Redis secondary (i.e. the delivery-layer's instance) needs to connect to the primary Redis instance.

For the explicit synchronization described here, the Redis instances do not need to communicate directly with each other; but Neos needs to be able to reach all instances.

Incremental Rendering

As a big stability improvement over v1, the rendering pipeline does not distinguish between a full and an incremental render. To trigger a full render, the content cache is flushed before rendering starts.

Options

After changing an Asset (e.g. in the Media Module) an incremental rendering is triggered. You can opt out of this behavior by setting the following configuration:

Flowpack:
  DecoupledContentStore:
    startIncrementalReleaseOnAssetChange: false

What happens if edits happen during a rendering?

If a change by an editor happens during a rendering, the content cache is flushed (by tag) as a result of this content modification. Now, there are two possible cases:

  • the modified document has not been rendered yet inside the current rendering. In this case, the rendered document will contain the recent changes.
  • the document was already rendered and added to the content release. In this case, the rendered document will not contain the recent changes.

The 2nd case is a bit dangerous, in the sense that we need a re-render to happen soon; otherwise we would not converge to a consistent state.

For use cases like scheduling re-renders, prunner supports a concurrency limit (i.e. how many jobs can run in parallel). If this limit is reached, it supports an additional queue, which can also be limited.

So the following lines from pipelines.yml are crucial:

pipelines:
  do_content_release:
    concurrency: 1
    queue_limit: 1
    queue_strategy: replace

So, if a content release is currently running, and we try to start a new content release, then this task is added to the queue (but not yet executed). In case there is already a rendering task queued, this gets replaced by the newer rendering task.

This ensures that we have at most one content release running at any given time; and at most one content-release in the wait-list waiting to be rendered. Additionally, we can be sure that scheduled content releases will be eventually executed, because that's prunner's job.
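
To make the queueing behavior concrete, here is a small model of "concurrency: 1, queue_limit: 1, queue_strategy: replace". It only illustrates the behavior described above and is not how prunner is implemented. The class `ReleaseScheduler` and its methods are hypothetical.

```php
<?php
/**
 * Hypothetical model of a pipeline with one running slot and one queue slot,
 * where a newly scheduled release replaces the one that is waiting.
 */
final class ReleaseScheduler
{
    private ?string $running = null;
    private ?string $queued = null;

    /** Request a release and report what happened to the request. */
    public function schedule(string $releaseId): string
    {
        if ($this->running === null) {
            $this->running = $releaseId;
            return 'started';
        }

        // The single queue slot always holds the most recent request.
        $replaced = $this->queued !== null;
        $this->queued = $releaseId;

        return $replaced ? 'replaced queued' : 'queued';
    }

    /** Finish the running release and start the queued one, if any. */
    public function finish(): ?string
    {
        $this->running = $this->queued;
        $this->queued = null;

        return $this->running;
    }
}

$scheduler = new ReleaseScheduler();
$first = $scheduler->schedule('release-1');  // nothing running yet
$second = $scheduler->schedule('release-2'); // waits behind release-1
$third = $scheduler->schedule('release-3');  // takes the place of release-2
$next = $scheduler->finish();                // release-3 starts, release-2 never runs
```

Dropping the replaced release is safe here because every render starts from the current content state, so the newer request covers the changes that triggered the older one.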

Extensibility

Custom pipelines.yml

Crafting a custom pipelines.yml is the main extension point for doing additional work (e.g. additional enumeration or rendering).

Custom Document Metadata, integrated with the Content Cache

Sometimes, you need to build additional data structures for every individual document. Ideally, you'll want this structure to be integrated with the content cache; i.e. only refresh it if the page has changed.

Performance-wise, it makes sense to do this at the same time as the rendering itself, as the content nodes (which you'll usually need) are already loaded in memory. You can register a Flowpack\DecoupledContentStore\NodeRendering\Extensibility\DocumentMetadataGeneratorInterface in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      documentMetadataGenerators:
        'yourMetadataGenerator':
          className: 'Your\Extra\MetadataGenerator'

When you implement this class, you can add additional Metadata which is serialized to the Neos content cache for every rendered document.

Often, you'll also want to add another contentReleaseWriter which reads the newly added metadata and adds it to the final content release. The next section explains how this works.

Custom Content Release Writer

You can completely define how a content release is laid out in Redis for consumption by your delivery layer.

By implementing a custom ContentReleaseWriter, you can specify how the rendered content is stored in Redis.

Again, this is registered in Settings.yaml:

Flowpack:
  DecoupledContentStore:
    extensions:
      contentReleaseWriters:
        'yourMetadataReleaseWriter':
          className: 'Your\Extra\MetadataWriter'

Writing Custom Data to the Content Release

In case you write custom data to the content release (using $redisKeyService->getRedisKeyForPostfix($contentReleaseIdentifier, 'foo')), you need to register the custom key also in the settings:

Flowpack:
  DecoupledContentStore:
    redisKeyPostfixesForEachRelease:
      foo:
        transfer: true

This is needed so that the system knows which keys should be synchronized between the different content stores, and what data to delete if a release is removed.
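
The following sketch shows how such a setting could be evaluated to decide which keys of a release are copied to another content store. It is an assumption-laden illustration: `keysToTransfer()` is a hypothetical helper, and the key format `example:<release>:<postfix>` is invented. In real code, the key names come from $redisKeyService->getRedisKeyForPostfix().

```php
<?php
/**
 * Hypothetical helper: list the keys of one release that should be
 * synchronized, based on a settings array shaped like
 * redisKeyPostfixesForEachRelease.
 */
function keysToTransfer(string $releaseId, array $postfixSettings): array
{
    $keys = [];
    foreach ($postfixSettings as $postfix => $options) {
        // Only postfixes explicitly marked with "transfer: true" are copied.
        if (($options['transfer'] ?? false) === true) {
            $keys[] = "example:{$releaseId}:{$postfix}";
        }
    }

    return $keys;
}

$settings = [
    'foo' => ['transfer' => true],
    'scratch' => ['transfer' => false], // stays on the source instance only
];
$keys = keysToTransfer('release-2', $settings);
```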

Rendering additional nodes with arguments (e.g. pagination or filters)

If you render a paginated list or have filters (with a predictable list of values) that can be added to a document via arguments, you can implement a slot for the nodeEnumerated signal to enumerate additional nodes with arguments.

Note: Request arguments must be mapped to URIs via custom routes, since we do not support HTTP query parameters for rendered documents.

Example

Add a slot for the nodeEnumerated signal via Package.php:

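<?php
use Neos\Flow\Core\Bootstrap;
use Neos\Flow\Package\Package as BasePackage;
// Also import NodeEnumerator (from this package) and your own MyNodeListsEnumerator.

class Package extends BasePackage
{
    public function boot(Bootstrap $bootstrap)
    {
        $dispatcher = $bootstrap->getSignalSlotDispatcher();

        // Call MyNodeListsEnumerator::enumerateNodeLists() whenever a node has been enumerated
        $dispatcher->connect(NodeEnumerator::class, 'nodeEnumerated', MyNodeListsEnumerator::class, 'enumerateNodeLists');
    }
}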

Implement the slot and enumerate additional nodes depending on the node type:

<?php
class MyNodeListsEnumerator
{
    // $nodeTypeManager, $redisEnumerationRepository and $perPage are expected
    // to be injected or configured elsewhere in this class.

    public function enumerateNodeLists(EnumeratedNode $enumeratedNode, ContentReleaseIdentifier $releaseIdentifier, ContentReleaseLogger $logger)
    {
        $nodeTypeName = $enumeratedNode->getNodeTypeName();
        $nodeType = $this->nodeTypeManager->getNodeType($nodeTypeName);
        if ($nodeType->isOfType('Vendor.Site:Document.Blog.Folder')) {
            // Get the node and count the posts to paginate
            // $documentNode = ...
            // $postCount = ...

            $pageCount = (int)ceil($postCount / (float)$this->perPage);
            if ($pageCount <= 1) {
                return;
            }

            // Start after the first page, because the first page will be the document without arguments
            $enumeratedNodes = [];
            for ($page = 2; $page <= $pageCount; $page++) {
                $enumeratedNodes[] = EnumeratedNode::fromNode($documentNode, ['page' => $page]);
            }

            $this->redisEnumerationRepository->addDocumentNodesToEnumeration($releaseIdentifier, ...$enumeratedNodes);
        }
    }
}

The actual logic will depend on your use of the node. Having the actual filtering logic implemented in PHP is beneficial, because it allows you to use it in the rendering process as well as in the additional enumeration.
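
The page arithmetic from the slot above can be isolated into a small, framework-free function, which makes it easy to reuse in both rendering and enumeration. The function name `additionalPageArguments()` is made up for this sketch. It only computes the argument sets and does not touch Neos or Redis.

```php
<?php
/**
 * Argument sets for the additional pages of a paginated list.
 * Page 1 is omitted because it is the document without arguments.
 */
function additionalPageArguments(int $postCount, int $perPage): array
{
    if ($perPage < 1) {
        throw new InvalidArgumentException('perPage must be at least 1');
    }

    // Integer ceiling division: the number of pages needed for all posts.
    $pageCount = intdiv($postCount + $perPage - 1, $perPage);

    $arguments = [];
    for ($page = 2; $page <= $pageCount; $page++) {
        $arguments[] = ['page' => $page];
    }

    return $arguments;
}

// 25 posts at 10 per page need 3 pages, so pages 2 and 3 are enumerated additionally.
$arguments = additionalPageArguments(25, 10);
```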

Extending the backend module

  • You need a Views.yaml in your package, looking like this:
-
  requestFilter: 'isPackage("Flowpack.DecoupledContentStore")'
  viewObjectName: 'Neos\Fusion\View\FusionView'
  options:
    fusionPathPatterns:
      - 'resource://Flowpack.DecoupledContentStore/Private/BackendFusion'
      - 'resource://Vendor.Site/Private/DecoupledContentStoreFusion'
  • Ensure that your package depends on flowpack/decoupledcontentstore in composer.json (so that your Views.yaml "wins" because the DecoupledContentStore-Package comes with its own Views.yaml)
  • Add a Root.fusion in Vendor.Site/Resources/Private/DecoupledContentStoreFusion which can contain your modifications
  • We currently support the following adjustments:
    • Adding a button to the footer
      prototype(Flowpack.DecoupledContentStore:ListFooter) {
          test = '<span class="align-middle inline-block text-sm pr-4 pl-16">TEST</span>'
          test.@position = 'before reload'
      }
      
    • Adding a flash message
      // ActionController
      $this->addFlashMessage('sth important you have to say');
      

Using different sets of config

In some cases it might be necessary to make fundamental adjustments to some configuration properties that would be really hard to handle (safely, non-breaking) on the consuming site of the content store. Therefore we added the config property configEpoch that can contain a current and previous config version. The current value (that should be used on the consuming site) gets published to the content store.

We decided to save the configEpoch on content store level instead of content release level for simplicity reasons on the consuming site. If you need to switch back to an older release that was rendered with the previous config epoch version and would not match the currently published one, you may manually toggle between current and previous config epoch. There is a button for this in the backend module for each target content store. Obviously this button should be used with extra care as the config epoch needs to fit the current release at all times.

Example:

  • We need to make a bigger change to the contentDimensions config, let's say we need to add uriPrefixes that weren't there before. We adjust the config accordingly and in the same deployment we configure the config epoch as follows:

    Flowpack:
      DecoupledContentStore:
        configEpoch:
          current: '2'
          previous: '1'
  • Now on the consuming site we can take action to handle both the old and new config and decide based on the value in redis which case is executed.

    $configEpoch = (int) $redisClient->get('contentStore:configEpoch');
    $contentStoreUrl = 'https://www.vendor.de/' . ($configEpoch > 1 ? 'de-de/' : '');

Development

  • You need pnpm installed as package manager: curl -f https://get.pnpm.io/v6.js | node - add --global pnpm
  • Run pnpm install in this folder
  • Then run pnpm watch for development and pnpm build for prod build.

We use esbuild combined with tailwind.css for building.

Rendering Deep Dive

TODO write

CacheUrlMappingAspect: this aspect is NOT active during interactive page rendering, but only when a content release is built through Batch Rendering (that is, when DocumentRenderer has invoked the rendering). This keeps complexity lower and code paths simpler: the system NEVER re-uses content cache entries created by editors while browsing the page, but ONLY re-uses content cache entries created by previous Batch Renderings.

Debugging

If you need to debug single steps of the pipeline just run the corresponding commands from CLI, e.g. ./flow nodeEnumeration:enumerateAllNodes {{ .contentReleaseId }}.

Testing the Rendering

For executing behavioral tests, install the neos/behat package and run ./flow behat:setup. Then:

cd Packages/Application/Flowpack.DecoupledContentStore/Tests/Behavior
../../../../bin/behat -c behat.yml.dist

Behat also supports running single tests or single files - they need to be specified after the config file, e.g.

# run all scenarios in a given folder
../../../../bin/behat -c behat.yml.dist Features/ContentStore/

# run all scenarios in the single feature file
../../../../bin/behat -c behat.yml.dist Features/ContentStore/Basics.feature

# run the scenario starting at line 66
../../../../bin/behat -c behat.yml.dist Features/ContentStore/Basics.feature:66

In case of exceptions, it might be helpful to run the tests with --stop-on-failure, which stops the test cases at the first error. Then, you can inspect the testing database and manually reproduce the bug.

Additionally, -vvv is a helpful CLI flag (extra-verbose) - this displays the full exception stack trace in case of errors.

TODO

  • clean up of old content releases
    • in Content Store / Redis
  • generate the old content format
  • (SK) error handling tests
  • force-switch possibility
  • (AM) UI
  • check for TODOs :)

Missing Features from old

data-url-next-page (or so) not supported

License

GPL v3