phptek / silverstripe-exodus
Full automated content migration from any site into Silverstripe.
Type:silverstripe-vendormodule
Requires
- brittainmedia/phpcrawl: ^0.9
- phptek/phpquery: dev-master
- silverstripe/cms: ^4
- silverstripe/reports: ^4
- symbiote/silverstripe-queuedjobs: ^4
Requires (Dev)
- phpunit/phpunit: ^9
Replaces
- phptek/staticsiteconnector: *
- silverstripe/staticsiteconnector: *
Last update: 2024-12-18 22:48:40 UTC
README
Introduction
Exodus is a content migration tool that follows the ETL (Extract, Transform, Load) pattern. It will consume content from virtually any website, regardless of its underlying CMS technology, and import it as native content objects (SiteTree, File etc) into a Silverstripe instance.
Exodus crawls the source website's DOM and caches matching URLs to the local filesystem. It then normalises page URLs by stripping file extensions, slashes and implementation-specific strings, and runs a site scrape which imports content as native Silverstripe objects into your site-tree and assets hierarchy.
Please see the docs index.
How it works
Extract is analogous to the module's "Crawl" mode. Given a URL, the tool will crawl the target website and cache a collection of matching URLs to the local filesystem.
Transform is the process of normalising the URLs cached in crawl mode, whose format is specific to the source system (Drupal, WordPress or Plone). This happens automatically, based on the option chosen under the main "URL Processing" setting. Finding the right option may take some trial and error before the crawl process completes successfully.
Load is analogous to the module's "Import" mode and is where the hard work of tweaking your crawl settings pays off, allowing you to import the content located at each cached URL into your site-tree and assets store.
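To make the Transform step more concrete, here is a minimal Python sketch of the kind of URL normalisation described above: stripping file extensions, redundant slashes and a CMS-specific segment. It is an illustration only and is not the module's own (PHP) code. The function name `normalise_url`, the `STRIP_EXTENSIONS` list and the treatment of a trailing `index` segment are all assumptions made for the example.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical list of extensions to strip; the module's real rules may differ.
STRIP_EXTENSIONS = (".html", ".htm", ".php", ".aspx")


def normalise_url(url: str) -> str:
    """Return a simplified path for a crawled page URL."""
    path = urlsplit(url).path
    # Collapse repeated slashes and drop any trailing slash.
    path = re.sub(r"/{2,}", "/", path).rstrip("/")
    # Strip a known file extension from the last path segment.
    root, ext = posixpath.splitext(path)
    if ext.lower() in STRIP_EXTENSIONS:
        path = root
    # Drop a trailing "index" segment (an assumed implementation-specific string).
    if path.endswith("/index"):
        path = path[: -len("/index")]
    return path or "/"


if __name__ == "__main__":
    for crawled in (
        "http://example.org/about-us/index.html",
        "http://example.org//news//item.php/",
    ):
        print(crawled, "->", normalise_url(crawled))
```

Under these assumed rules, several differently formatted source URLs collapse to one clean path, which is what lets the Load step map each cached URL to a single page in the site-tree.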
Please see the docs index.
Migration
Please see the migration docs.
Requirements
- PHP ^7||^8
Installation
composer require --dev phptek/silverstripe-exodus
You'll need to set up PHP to allow for long-running processes. Depending on your configuration and on the number of URLs to be crawled in the target site, a crawl may take upwards of 20-30 minutes. Configure the following, depending on your setup:
# Tell PHP itself to allow for long-running processes, in php.ini
max_execution_time = 72000
# Tell nginx to keep waiting on php-fpm during long-running requests, in nginx.conf
fastcgi_read_timeout 72000;
# Tell php-fpm to increase the no. of available child processes up from the default of 5, in www.conf
pm.max_children = 25
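As a rough sanity check on these limits, the sketch below estimates how long a crawl might run and whether it fits inside the 72000-second value used above. It is back-of-the-envelope arithmetic in Python, not part of the module. The URL count and the per-URL fetch time are assumed example figures, and `crawl_seconds` and `fits_in_limit` are hypothetical helper names.

```python
# The limit configured above, in seconds (72000 s is 20 hours).
MAX_EXECUTION_TIME = 72000


def crawl_seconds(url_count: int, seconds_per_url: float) -> float:
    """Estimate total crawl time, assuming URLs are fetched one after another."""
    return url_count * seconds_per_url


def fits_in_limit(url_count: int, seconds_per_url: float,
                  limit: int = MAX_EXECUTION_TIME) -> bool:
    """Return True if the estimated crawl time is within the configured limit."""
    return crawl_seconds(url_count, seconds_per_url) <= limit


if __name__ == "__main__":
    # Assumed example: 3000 URLs at half a second each.
    estimate = crawl_seconds(3000, 0.5)
    print(f"Estimated crawl time: {estimate / 60:.0f} minutes")
    print("Fits in limit:", fits_in_limit(3000, 0.5))
```

With these assumed figures the estimate is about 25 minutes, in line with the 20-30 minute range mentioned above and well inside the configured limit. Much larger or slower sites would need the numbers rechecked.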
Please see the included Migration document which describes exactly how to configure the tool to perform a content migration.
Please also see the rest of the docs.
History
This module was originally written in 2012 by then Silverstripe Ltd CEO Sam Minnee and was known as the "Static Site Connector" module. It was used successfully on dozens of occasions to import content for new Silverstripe projects being built by the company at that time and was subsequently improved upon over the years by other Silverstripe employees.
Around 2015-2016 the module was archived by Sam and subsequently picked up and improved by Russell Michell.
In 2022 Russell saw a need for the tool again for an upcoming gig and modified it once again to work with Silverstripe v4.
Contributors
In order of no. commits:
Credit also goes to Marcus Nyeholt for the use of the External Content module on top of which Exodus itself is built. The module in its current state actually includes the nyeholt/silverstripe-external-content package and bakes it in as a sub-directory rather than using Composer.
...it was just easier that way.
Support Me
If you like what you see, support me! I accept Bitcoin: