salaun/laravel-complex-upsert

A tool to efficiently upsert a database without having to relly on a primary id.

5.1.1 2023-05-03 13:52 UTC

README

The typical use case would be having to update a database with a text based file data tree (json or XML).
With such a situation the objects might not have a primary id, that is why we match the data with a combination of attribute to uniquely target them.

The main principles are:

  • Fail before causing permanente damage. If possible remove the damaging part and update the rest. Powered by Laravel Validation helper and references collections.
  • Multithread thanks to the Laravel Queue system.
  • Allows partial integration with data block hashing
  • Keep the same records with their ids. Not delete and recreate from scratch thanks to SQL upsert queries.
  • Reusing the Model relations netting for project cohesion and maintenance and avoid code duplicates.
  • Fewest database query possible by grouping the reference fetching (reference collections) and the upsert by Model type (upsert service).
  • Customizable, readable and easy to maintain. The purpose being to guide you to a structured and organized database update process while making it easy to actually setup a working state of this package implementation.

Installation

You can install the package via composer:

composer require salaun/laravel-complex-upsert

You can publish the config file with:

php artisan vendor:publish --provider="Salaun\ComplexUpsert\ComplexUpsertServiceProvider" --tag="laravel-complex-upsert-config"

This is the contents of the published config file:

return [
	'chunk_size' =>  env('COMPLEX_UPSERT_CHUNK_SIZE', 300),
];

Lexic

  • A relation: a child for a parent node.
  • Reference: a record already in the database. The last node of a data tree.
  • Upsert (up[date or in]sert): in the context of SQL represente the action of inserting records and if it already exists, updating it. Synonyme of integration and update in this package.
  • Reverse relations: in order to handle recursive upsert parents needs to be upserted first in order to obtain their id and set it as a foreign key on their children. Which means we need to know what was the relation between the parent and it's children when the time to set the foreign key comes. So we set the relation name parent_relation::child_relation in order not to confuse it with the parent native Laravel relations.

Requirements

Database

This package is made to work with a postgres database (thanks to Thomas Pettry package) But should also work with a MySql Database.

Relations

The relations used by the process should be defined from both sides and models must use the related traits.

Foreign relations

Nested models must define the required DELETE ON CASCADE.

Usage

Also it is recommended to follow the process flow provided by this package it is also possible to call the services methodes without using the Job "framework".

The tests included in this package doubles down as an implementation exemple.

Warning

Keys

As per the Laravel documentation you should be carefull with your migrations:

All databases systems except SQL Server require the columns in the second argument provided to the upsert method to have a "primary" or "unique" index.

You should therefore refer to the section creating index:

$table->unique(['account_id', 'created_at'], 'upsert_key');

Composite Unique Keys Caveat

When a composite unique key includes a nullable column and that the item value is NULL, the upsert operation will result in a duplicated record. This is due to the SQL standard that doesn't recognize this composite key as unique.

A UNIQUE index permits multiple NULL values for columns that can contain NULL. (MySQL documentation)

With postgres it is possible to use a UNIQUE INDEX WITH NULL DISTINCT to avoid this issue.

Only references can have left to right relations

Concidering the following situation products <= product_label => labels => entities. This case is not supported at this time. Labels will have to be setup as a reference but entities cannot be updated automatically.

Package technical details

Helpers

This packages contains helpers called services that are responsible for the processing:

  • The upsert service: is in charge of upserting the data (in a recursive manner if need be).
  • The upsert reference service: is in charge of retrieving already existing data from the database in order to complete the models to be inserted. Those data may then be used to match incomming data with records to be upserted.
  • The WhereInMulti() query builder macro. Performs a whereIn query based on multiple columns/values.
    SELECT * FROM `users` WHERE (`firstname`, `lastname`) IN (('micheal', 'bay'), ('steven', 'spielberg'));
    

Model configuration

Models that are to be upserted need to be configured so that the services can make use of them. This is done with the use of traits. Some configuration are static parameters to be set on the model to be upserted:

  • canFail (bool): define if this record can safelly be removed from the dataset.
  • upsertId (array): which columns are to be used in order to perform a database matching.
  • upsertColumns (array): in the case where there is already a record of the data in base, what columns should be updated. 3 options: "[]" nothing, "null" everything and "['firstname', 'lastname']" the columns firstname and lastname.
  • rules (array): the array to be used by the Laravel Validation helper when creating the Laravel Model.

Reference Collections

The reference collection is inspired by laravel-media-library in the way that it define the reference behaviour base on a declarative function. If an upsert model has references it will have to declare the collection in order to be able to fetch it and have it resolved. Such a collection contains:

  • relation (string): The relation used to set the foreign key on the upsert model.
  • referenceId (array): What are the columns used to fetch (and update) the reference.
  • query (Query Builder): The query needed to access the database content
  • syncColumns (array): If need be what are the reference values to be updated before fetching their ids.

The workflow

Beyond making the upsert easier there is also an implementation on how the processing should occur. Based on Laravel queue and batch system, we introduce a serie of steps to be performed in order to have the right data in the right place without risking to compromise the integrity of the database content.

Dispatchers

As the name implies their role is to dispatch fragments of upsert to the processors. It could be files from a ZIP afer downloading and extraction of id chunks in order to have processor jobs of relatively more the same size. For a given parent it will hydrate it's Batch with processor jobs to be performed asynchronously. There is also an option to perform an action once all the batche jobs were performed (like delete records that were not present in the new data).

Processors

Processors will actively perform the upsert. Progressing through a list of steps, it will filter out unusable or irrelevent data and format and complete the rest in a upsertable state. Logs are fired when relations would have otherwise been forgotten silently and stats allows us to keep tracks of where and how many records failed. Fatal failures are handled with the Queue system and could be logged and retried depending on the configuration.

Their are system steps:

  • beforeUpsert: prepare data before starting for the confirmedUpsert() step.
  • afterUpsert: save data for the confirmedUpsert() step once we made sure that the model was inserted without issue.

And user steps:

  • getData: fetch the raw data from the source of your choosing. May it be an API, a file like XML or JSON or a database export.
  • instanciateData: create basic PHP object matching the raw data structure. Only a data typing is performed.
  • confirmedUpsert: is where we decide if the data is worth being updated or if it can be forgotten.
  • transformData: allows us to make laravel Model out of the basic PHP objects of instanciateData. Anything from text formatting to structural changes may be performed. There is also the possibility to make use of Laravel validation helper thanks to makeValid() in order to insure that the data will fit in the database without issue.
  • resolveReferences (make use of the UpsertReferenceService): offers to update the reference already in the database in case some would be missing, fetch what is needed and apply the required changed to the model's foreign keys.
  • upsertModels (make use of the UpsertService): update or insert if missing (and even delete if some relations are missing from the new data and you want it to match) in order to update the database. If "reverse relations" have been set, the upsert will be performed in a recursive manner.

It is possible to add any number of custom steps. Any elements that do not pass a step must be set as null and filtered from the array. A model type of a relation can be set as "canFail()" and will be forgottent without cancelling the upsert of it's parent.

Optimization

  • Order the columns in the unique index wisely Source

Testing

composer p

Changelog

Please see CHANGELOG for more information on what has changed recently.

Credits

License

The MIT License (MIT). Please see License File for more information.