marcovo / laravel-dag-model
An implementation of the Directed Acyclic Graph (DAG) for Laravel models
Requires
- php: >=7.4 <8.4-stable
- laravel/framework: ^6.20 || ^7 || ^8 || ^9 || ^10 || ^11
Requires (Dev)
- marcovo/php-junit-merge: ^1.0.11
- micheh/phpcs-gitlab: ^1.1
- orchestra/testbench: ^4.10 || ^5 || ^6 || ^7 || ^8 || ^9
- phpunit/phpunit: ^9.5.10 || ^10 || ^11
- squizlabs/php_codesniffer: ^3.6
README
This package provides model functionality for managing and querying directed acyclic graphs (DAGs) stored in an SQL database.
It can turn any model into a DAG by including one trait and creating one new pivot model.
Your model will then be provided with relations extending from the native BelongsToMany
many-to-many relationship, such that besides the DAG-specific functionality, also many of the native Laravel methods are available to you.
A directed acyclic graph is particularly suited for storing and manipulating hierarchical datastructures. As a special case, by including one additional trait, you can easily and efficiently turn your models into a tree.
Installation
This package is compatible with PHP 7.4 - 8.3 and Laravel 6.20 - 11. It supports MySQL (MariaDB), PostgreSQL and SQLite.
Installation via composer:
composer require marcovo/laravel-dag-model
Documentation
This page contains links that do not work on packagist. Open in Gitlab
Make sure you read up on the Important notes (specifically the Caution section) before starting to use this package.
Basic usage
Using Laravel's BelongsToMany
relationship even without this package, one could already store DAGs and manage to query them for local relationships such as children
and parents
.
On top of that, this package will provide your model with additional relations for querying distant relationships, such as ancestors and descendants.
Given a vertex model instance $vertex
or query builder $query
, this package provides functionality to e.g.:
$vertex->children // Obtain all children
$vertex->descendants()->get() // Obtain all descendants using a query
$vertex->children()->attach($child) // Attach a child
$vertex->children()->detach($child) // Detach a child
// Use the relations in a query
$query->whereHas('children', $callback)
$query->whereHas('ancestors', $callback)
// Use specialized DAG query methods
$query->whereAncestorOf($vertex)
One simple example:
// Store 3 vertices
$parent = MyVertexModel::create();
$child = MyVertexModel::create();
$grandchild = MyVertexModel::create();
// Store 2 edges (+1 implicit transitive closure edge
// which will be automatically added by the TC algorithm)
$parent->children()->attach($child);
$child->children()->attach($grandchild);
$ancestors = $grandchild->ancestors;
// $ancestors now is a collection containing $parent and $child,
// as new model instances received freshly from the database
For more examples and details, refer to the documentation pages listed above.
Trees
As a tree is a special case of the Directed Acyclic Graph, this package can be used to manipulate and query trees.
To this end, this package includes a trait IsForest
, which can be used to manage one or multiple trees (= forest).
The trait is provided with specialized manipulation queries tuned to forests, as well as additional methods specific to trees, such as ->parent()
and ->siblings()
.
For more information, refer to the Forest (trees) extension page.
Tree data structure: pivot table (DAG) vs. nested set
When implementing a tree data structure, one could follow one of these two main approaches:
- Using a nested set implementation, adding two integer columns
left
andright
on your model - Use a pivot table containing edges between nodes, just as this DAG package does
Both are sensible solutions, each having their pros and cons. The best approach will depend on your usecase.
Tree using a nested set
- (pro) Only requires storage of two extra columns per model/node, which are stored in the very same model
- (pro) In most cases, relation deductions (e.g.,
isDescendantOf()
) between two retrieved models can be made without querying the database: all required information is stored in the twoleft
andright
columns - (con) In complex queries involving inheritance relations, the inequality constraints used by the nested set are hard to optimize. In particular, databases typically can only optimize for up to one inequality constraint per subquery, which leaves the rest to be filtered by using a table scan
- (con) Mutations cause a global state change: adding, deleting or moving a node typically requires a large amount of nodes to be updated, requiring a write-lock on all these rows
Tree using a pivot table (DAG)
- (pro) Complex queries involving inheritance relations are based on well-performing foreign key relations, allowing for easy optimization
- (pro) No extra columns are needed on your model table
- (pro) Mutations are local operations: only rows in the pivot table directly linked to the added/deleted/moved node will be modified
- (con) Requires an additional database table
- (con) In most cases, relation deductions (e.g.,
isDescendantOf()
) between two retrieved nodes will require a query on the pivot table
Comparison with CTE approach
The approach used in this DAG model package is to store the entire transitive closure of the DAG in the database. Alternatively, one could also only store the DAG itself and build queries that are capable of determining and querying the required relations at runtime. This latter approach is the one taken by staudenmeir/laravel-adjacency-list, using Common Table Expressions (CTEs).
Note the following pros and cons of these approaches:
DAGs and trees using CTE
- (pro) No extra columns or storage required
- (pro) No performance penalty for mutations
- (con) Has to compute a CTE for every
SELECT
query involving distant relations
DAGs and trees using stored transitive closure
- (pro) As the transitive closure acts as a cache for distant relations, all information for querying distant relations is readily available and does not have to be computed on-the-fly
- (con) Requires an extra database column and stores extra rows
- (con) Each mutation requires the transitive closure to be updated
In summary: the CTE approach should be the best choice whenever your application heavily mutates the graph structure. The transitive closure approach used in this package is best suited to situations where mutations are sporadic, and the table is heavily read for distant relationships.
Versioning
This package follows Semantic versioning.
The backwards compatibility promise covers all public API as defined in the API documentation sections.
Any method not documented there (regardless of visibility) should be considered private API and hence explicit calls to them will not be covered by the backwards compatibility promise.
In other words, methods not listed in the API documentation are subject to change in non-major releases.
All public API methods are marked with @api
annotations, all internal methods with @internal
.
Just like in Laravel, method parameter names are not considered part of the backwards compatibility promise.
Contributing
Feature requests are welcome, but you are encouraged to also provide an implementation in the form of a merge request. Only merge requests with 100% test coverage will be merged. This package follows the PSR-12 coding standard.
Test your contribution using:
gitlab-runner exec docker test:7.4
gitlab-runner exec docker test:8.1
gitlab-runner exec docker test:8.2
gitlab-runner exec docker test:8.3
gitlab-runner exec docker phpcs
Related packages
Some related packages:
- kalnoy/nestedset An implementation of the nested set model, an efficient way to store and query trees
- telkins/laravel-dag-manager A different implementation of the same concept of DAGs. As mentioned in their readme, it does suffer from bad asymptotic behaviour in large graphs, forcing them to impose a
max_hops
parameter. ThePathCountAlgorithm
included here can be seen as a modified version of their approach.- The
laravel-dag-manager
package is based on the article A Model to Represent Directed Acyclic Graphs (DAG) on SQL Databases. While that algorithm is not present in our package, the article is an interesting read and a good explanation of the basic concepts.
- The
- staudenmeir/laravel-adjacency-list An implementation for storing graphs (including DAGs and trees) using Common Table Expressions. (Refer to the comparison provided in this readme.)
Parts of this package have been inspired by these two packages.
References
- Maintaining Transitive Closure of Graphs in SQL by Dong, Libkin, Su & Wong
- Directed Acyclic Graph on Wikipedia
- Transitive Closure on Wikipedia