README

Table of contents:

About
- DATA
- KEY
- VALUE
- SCHEMA
Installation
Configuration
Querying
- Querying Entries
- Querying Schemas
Maintenance
- By Cron Job
- By Console Command
Advanced Guide
Usage Examples

About

Lucinda DB is serverless KEY-VALUE store originally designed to help developers cache results of resource-intensive SQL queries based to criteria (aka TAG) query depends on. It is different from other KV stores by having KEYs self-generate based on TAGs query result depended on and VALUEs saved as individual JSON files (instead of RAM) named by KEY in one/more SCHEMAs. This brings following advantages:

ability of working without a server: operating system on host machine, already optimized to manipulate files, becomes the server
ability of being platform agnostic: a database specification that can be implemented in any programming language on any operating system
key standardization: value of a KEY is generated according to a predictable rule based on value of TAGs it depends on
no entry duplication: combination of TAGs will always be unique
easy maintenance: entries can be queried by TAG, something impossible in standard KV stores that purely rely on RAM hash tables indexed by cumulative key
portability: to transfer/backup database, it is as easy as copying schema folder(s)
scalability: ability of database to be distributed on multiple disks in real time

API relies on an interplay of following concepts:

DATA: this is value of data to cache (eg: result of your SQL query or query combinations)
TAG: criteria based on whome DATA was generated (eg: "users", "roles")
KEY: the key in KV store, whose name was generated automatically based on TAGs it depends on (eg: "roles_users")
VALUE: the value in KV store present as json-ed DATA saved on disk as a separate file in SCHEMA folder named by KEY
SCHEMA: folders/disks in which KV entries are stored

Data

This is result to cache as VALUE in KV store, convertible to JSON.

Example query:

SELECT t1.name AS user, t3.name AS role
FROM users AS t1
INNER JOIN users__roles AS t2 ON t1.user_id = t2.user_id
INNER JOIN roles AS t3 ON t2.role_id = t3.id
WHERE t1.active = 1

Processed into following PHP array structure:

$data = ["John Doe"=>["Administrator"], "Jane Doe"=>["Assistant Manager", "Team Leader"]];

Key

A key is unique identifier of DATA in KV store named by combination of TAGs data depended on (eg: "roles_users" above) . To make things easier for maintenance, each finite combination of TAGs results into a single KEY, regardless of how they were ordered by caller! The rules based on whom key name is calculated are:

checks if TAGs obey above specifications
sorts TAGs alphabetically
joins all TAGs using _ sign

Key creation is encapsulated by Lucinda\DB\Key class. Usage example:

$object = new Lucinda\DB\Key(["users", "roles"]);
$key = $object->getValue(); // key will be "roles_users"

Value

A value is a JSON-ed representation of DATA saved as .json file named by KEY inside SCHEMA folder according to following rules:

DATA must be json encodable

Value operations are encapsulated by Lucinda\DB\Value class. Usage example:

$key = new Lucinda\DB\Key(["users", "roles"]); // initializes KEY
$value = new Lucinda\DB\Value("/usr/local/share/db", $key->getValue()); // instances VALUE by KEY and SCHEMA
$value->set($data); // saves DATA by KEY, creating a KEY.json file within SCHEMA

Load Balancing

For performance, consistency or scalability reasons, users can opt for VALUEs to be load balanced across multiple SCHEMAs. This is done by using Lucinda\DB\ValueDriver instead, which wraps Lucinda\DB\Value in order to insure that:

all writes (set/delete) are evenly distributed among replicas
all reads are done from random replica
all race condition operations (increment/decrement) will be done in first replica, then distributed to all others

Usage example:

$key = new Lucinda\DB\ValueDriver(["schema1", "schema2"], ["users", "roles"]); //instances VALUE by KEY and SCHEMAs
$value->set($data); // saves DATA by KEY, creating a KEY.json file within all SCHEMAs

Schema

A schema is simply the folder where VALUEs are saved. A schema can be a single folder or an array of replicas that can be located on different disks from same server or even different servers (eg: via symlinks). Class Lucinda\DB\Schema encapsulates operations one can perform on a single schema. Usage example:

$key = new Lucinda\DB\Schema("schema1"); // initializes SCHEMA
$value->deleteAll(); // deletes all VALUEs in SCHEMA

Load Balancing

For performance, consistency or scalability reasons, users can opt for SCHEMAs to be load balanced and evenly distributed. This is done via Lucinda\DB\SchemaDriver class that insures all schema operations are automatically reflected in replicas. Usage example:

$key = new Lucinda\DB\SchemaDriver(["schema1", "schema2"]); // initializes SCHEMAs
$value->deleteAll(); // deletes all VALUEs in all SCHEMAs

Installation

First choose a folder, then write this command there using console:

composer require lucinda/db

Then create at least one SCHEMA and follow configuration guide in order to set it/them in XML required to configure API. Once latter is done, you will be able to query database using Lucinda\DB\Wrapper object. Usage example:

require 'vendor/autoload.php';

$object = new Lucinda\DB\Wrapper(XML_CONFIGURATION_PATH, DEVELOPMENT_ENVIRONMENT);
$value = $wrapper->getEntryDriver(["users", "roles"]);
$value->set(["John Doe"=>["Administrator"], "Jane Doe"=>["Assistant Manager", "Team Leader"]]);

API uses composer autoload, requires PHP 8.1+ and has no external dependencies except SimpleXML, DOM and SPL extensions. All code inside is 100% unit tested and developed on simplicity and elegance principles!

Configuration

To configure this API you must have a XML with a lucinda_db tag inside:

<lucinda_db>
	<{ENVIRONMENT}>
		<schemas>
      <schema>{SCHEMA}</schema>
      ...
    </schemas>
	</{ENVIRONMENT}>
	...
</lucinda_db>

Where:

lucinda_db: (mandatory) holds database configuration
- {ENVIRONMENT}: name of development environment (to be replaced with "local", "dev", "live", etc)
  - schemas: (mandatory) stores list of schemas to be load balanced, each as schema tag
    - schema: (mandatory) holds path to a single schema that takes part of load balancing scheme

Example:

<lucinda_db>
    <local>
        <schemas>
          <schema>C:\db</schema>
        </schemas>
    </local>
    <live>
        <schemas>
          <schema>/usr/local/share/db</schema>
          <schema>/mnt/remote/db</schema>
        </schemas>
    </live>
</lucinda_db>

Querying

Now that XML is configured, you can query entries or schemas via Lucinda\DB\Wrapper class and its methods available:

Method	Arguments	Returns	Description
__construct	string $xmlFilePath, string $developmentEnvironment	void	Sets location of configuration file along with development environment for later querying
getEntryDriver	string[] $tags	Lucinda\DB\ValueDriver	Gets object to query load-balanced entry list of tags key depends on
getSchemaDriver	void	Lucinda\DB\SchemaDriver	Gets object to query distributed schemas

Querying Entries

Usage example employing Lucinda\DB\ValueDriver:

$object = new Lucinda\DB\Wrapper("/var/www/html/myapp/configuration.xml", "local");
$driver = $wrapper->getEntryDriver(["users", "roles"]);
$driver->set(["John Doe"=>["Administrator"], "Jane Doe"=>["Assistant Manager", "Team Leader"]]);

Using Lucinda\DB\Value directly is useful only if your app requires no load balancing, lies in one development environment only and needs only primitive maintenance. Usage example:

$key = new Lucinda\DB\Key(["users", "roles"]);
$value = new Lucinda\DB\Value("/usr/local/share/db", $key->getValue());
$value->set(["John Doe"=>["Administrator"], "Jane Doe"=>["Assistant Manager", "Team Leader"]]);

Both Lucinda\DB\Value and its Lucinda\DB\ValueDriver wrapper abide to a single contract defining blueprints of VALUE operations through interface Lucinda\DB\ValueOperations, which comes with following prototype methods:

Method	Arguments	Returns	Description
set	mixed $value	void	Sets entry value
get	void	mixed	Gets entry value
exists	void	bool	Checks if entry exists
increment	int $step = 1	int	Increments existing entry value using locking. Throws Lucinda\DB\LockException on race condition! See: Handling Race Conditions
decrement	int $step = 1	int	Decrements existing entry value using locking. Throws Lucinda\DB\LockException on race condition! See: Handling Race Conditions
delete	void	void	Deletes entry

As one can see above, in case developers opt using Lucinda\DB\Value directly, class Lucinda\DB\Key needs to be instanced manually. It encapsulates creation of keys based on tags and defines following public methods:

Method	Arguments	Returns	Description
__construct	string[] tags	void	Compiles key based on TAGs. Throws Lucinda\DB\KeyException if TAGs break naming rules
getValue		string	Gets entry key compiled above.

Querying Schemas

Usage example employing Lucinda\DB\SchemaDriver:

$object = new Lucinda\DB\Wrapper("/var/www/html/myapp/configuration.xml", "local");
$driver = $wrapper->getSchemaDriver();
$driver->deleteAll();

Using Lucinda\DB\Schema directly is useful only if your app requires no load balancing, lies in one development environment only and needs only primitive maintenance. Usage example:

$object = new Lucinda\DB\Schema("/usr/local/share/db");
$object->deleteAll();

Both Lucinda\DB\Schema and Lucinda\DB\SchemaDriver abide to a single contract defining blueprints of SCHEMA operations through interface Lucinda\DB\SchemaOperations, which comes with following prototype methods:

Method	Arguments	Returns	Description
create	void	bool	Creates schema(s)
exists	void	bool	Checks if schema(s) exist
getAll	void	string[]	Gets all keys in schema(s)
getByTag	string $tag	string[]	Gets all keys in schema(s) matching tag
getCapacity	void	int	Gets number of entries in schema(s)
deleteAll	void	int	Deletes all entries in schema(s)
drop	void	bool	Drops all schema(s), deleting all entries in the process

Maintenance

Modern operating systems allow up to 4,294,967,295 files in one folder but you shouldn't go anywhere near that value! Just like MySQL running out of disk space, LucindaDB may additionally run out of VALUEs in SCHEMA, something that usually happens only when specialization is used at massive scale.

To fix such cases class Lucinda\DB\DatabaseMaintenance class was created, whose purpose is to do automated maintenance through following public methods:

Method	Arguments	Returns	Description
__construct	string $xmlFilePath, string $developmentEnvironment	void	Sets location of configuration file along with development environment for later querying
checkHealth	float $maximumWriteDuration	Lucinda\DB\SchemaStatus[string]	Performs health checks of all load balanced schemas and returns results as status by schema. See: Checking Schemas Health
plugIn	string $schema	void	Plugs in schema to load-balanced DB without down times See: Plugging Schema In
plugOut	string $schema	void	Plugs out schema from load-balanced DB without down times See: Plugging Schema Out
deleteByTag	string $tag	int	Deletes all DB entries whose key matches TAG See: Deleting Entries by Tag
deleteUntil	int $secondsBeforeNow	int	Deletes all DB entries whose last modified time is more than #seconds old See: Deleting Entries by Time
deleteByCapacity	int $minCapacity, int $maxCapacity	int	Deletes all DB entries by keeping schema at fixed max capacity range based on entry last modified time. See: Deleting Entries by Capacity

By Cron Job

This class should be used via a cron job whose periodicity depends on the chance of your project to get filled! Example:

require 'vendor/autoload.php';

$maintenance = new Lucinda\DB\DatabaseMaintenance(XML_CONFIGURATION_PATH, DEVELOPMENT_ENVIRONMENT);
// checks schema health and plugs out unresponsive ones
$statuses  = $maintenance->checkHealth();
foreach ($statuses as $schema=>$status) {
  if (in_array($status, [DatabaseMaintenance::STATUS_OFFLINE, DatabaseMaintenance::STATUS_UNRESPONSIVE])) {
    $maintenance->plugOut($schema);
  }
}
// performs delete of all entries older than one day
$maintenance->deleteUntil(time()-(24*3600));

By Console Command

If maintenance involves just one operation it can be done without programming by calling client.php file bundled in API root. Console syntax:

php PATH_TO_CLIENT_PHP OPERATION ARGUMENTS

Where:

PATH_TO_CLIENT_PHP: absolute location of client.php file (example: /var/www/html/mysite/vendor/lucinda/db/client.php)
OPERATION: name of Lucinda\DB\DatabaseMaintenance method (example: deleteUntil)
ARGUMENTS: arguments to call method above, separated by space (example: 3600)

The greatest advantage of this solution is that it allows non-programmers (devops) to perform maintenance from command line. Example:

php /var/www/html/mysite/vendor/lucinda/db/client.php plugIn /my/new/schema

Advanced Guide

Specializing Keys

Sometimes, different VALUEs need to be produced for same TAG combination. This requires us to have different keys, while abiding to principles described in KEY section at the same time. Solution is to add an extra specializer tag (eg: MD5 checksum of query) when creating key:

$object = new Lucinda\DB\Key(["users", "roles", md5($query)]);
$key = $object->getValue(); // key will be "54ed347f362bb056e4d6db0477bf19c9_roles_users"

As a rule, specialization should be avoided as much as possible, since it enlarges database and has a duplication potential (for example a simple extra space in query above would generate another key)!

Handling Race Conditions

A complicated problem in all databases is managing race conditions. What happens when a increment or decrement operation is ran in parallel? Let's imagine DATA was 8 and increment is attempted at same moment Z by users X and Y:

# user X increments value at moment Z
$value->increment(1);
# user Y increments value at same moment Z
$value->increment(1);

Will end result be 10, as expected? The answer is no, because both processes got 8 to increment at same time! This situation is called a "race condition" and the only solution is to stack writes on that entry instead of letting them run in parallel.

For increment/decrement VALUE operations respective file is locked for writes and unlocked only when value update completes. If a concurrent process tries to write to a still locked file, a Lucinda\DB\LockException is thrown. Instead of letting exception bubble, developers can catch it and retry after delay:

try {
  $value->increment(1);
} catch(Lucinda\DB\LockException $e) {
  usleep(100);
  $value->increment(1);
}

Checking Schemas Health

Method checkHealth @ class Lucinda\DB\DatabaseMaintenance is to be used in checking the state of each replica SCHEMA and produce a Lucinda\DB\SchemaStatus. The algorithm by which statuses are produced is:

if folder (schema) doesn't exist, status is OFFLINE
otherwise, if files can't be written to schema, status is UNRESPONSIVE
otherwise, if file writes take longer than maximumWriteDuration, status is OVERLOADED
otherwise, status is ONLINE

The end result of all checks will be returned as an array where key is schema and value is status found.

Plugging Schema In

Method plugIn @ class Lucinda\DB\DatabaseMaintenance is to be used for plugging a SCHEMA to replicas without database down times. The algorithm used is:

if SCHEMA doesn't exist or it is already plugged, a Lucinda\DB\ConfigurationException is thrown
copies all files from first replica to SCHEMA
plugs in SCHEMA into XML
copies all remaining files from first replica to SCHEMA that may have been inserted to former as initial copy process was running

Plugging Schema Out

Method plugOut @ class Lucinda\DB\DatabaseMaintenance is to be used for plugging out a SCHEMA to replicas without database down times. The algorithm used is:

if SCHEMA doesn't exist or it is not plugged, a Lucinda\DB\ConfigurationException is thrown
SCHEMA is removed from XML, which insures no further writes will occur
all files are deleted in SCHEMA

Deleting Entries by Tag

Method deleteByTag @ class Lucinda\DB\DatabaseMaintenance is to be used for removing all VALUEs whose KEY matches TAG from all replicas. The algorithm used is:

iterates entries in random replica SCHEMA
if entry KEY matches TAG
- for each SCHEMA replica
  - deletes entry (VALUE) by KEY above

Deleting Entries by Time

Method deleteUntil @ class Lucinda\DB\DatabaseMaintenance is to be used for removing all VALUEs whose last modified time is more than $secondsBeforeNow old. The algorithm used is:

iterates entries in random SCHEMA replica
if entry VALUE last modified time is more than $secondsBeforeNow old
- remembers entry KEY
- for each SCHEMA replica
  - deletes entry (VALUE) by KEY above

Deleting Entries by Capacity

Method deleteByCapacity @ class Lucinda\DB\DatabaseMaintenance is to be used to insure number of entries in database doesn't exceed $maxCapacity and, if it does, shrink it to $minCapacity by having older entries removed. The algorithm used is:

iterates entries in random SCHEMA replica
records entry KEY to a fixed capacity SplMaxHeap sorted by VALUE's last modified time
if heap reached $maxCapacity, pops head and deletes entry until $minCapacity is reached

Avoiding API Disadvantages

This API, being disk based, comes with its own disadvantages compared to standard RAM-based KV stores:

slightly reduced speed: hard drives will always be slower than RAM, but considering how fast SSDs are this won't be a problem unless your app has very high thoroughput
no expiration for entries: all entries inside, being separate files, persist forever unless specifically deleted. This can be solved by using Lucinda\DB\DatabaseMaintenance!
not suitable for temporary/volatile data: if your app expects database entries to be volatile (changing randomly) and specializing keys is required at massive scale, standard RAM-based KV stores are highly recommended
requires daemonized maintenance: in case volatility is expected, a program must periodically remove old files in order prevent disk(s) getting full. . This can be solved by using Lucinda\DB\DatabaseMaintenance!

It is always up for developers to decide which KV store model fits their application the best! More often than not you will need to employ multiple stores (eg: LucindaDB and Redis) to cover all usage cases (one for persistent query-able data, the other for volatile data).

Usage Examples

To see usage examples, these unit tests should be enough:

ValueDriverTest: testing Lucinda\DB\ValueDriver operations
SchemaDriverTest: testing Lucinda\DB\SchemaDriver operations
DatabaseMaintenanceTest: testing Lucinda\DB\DatabaseMaintenance operations

lucinda / db

Maintainers

Details

README

About

Data

Tag

Key

Value

Load Balancing

Schema

Load Balancing

Installation

Configuration

Querying

Querying Entries

Querying Schemas

Maintenance

By Cron Job

By Console Command

Advanced Guide

Specializing Keys

Handling Race Conditions

Checking Schemas Health

Plugging Schema In

Plugging Schema Out

Deleting Entries by Tag

Deleting Entries by Time

Deleting Entries by Capacity

Avoiding API Disadvantages

Usage Examples