eftec/documentstoreone

A flat document store for PHP that supports concurrent access.

1.10 2019-09-01 00:17 UTC


A document store for PHP that supports concurrent access. It is a minimalist alternative to MongoDB or CouchDB without the overhead of installing a new service.


Key features

  • Single key based.
  • Fast. However, it's not an alternative to a relational database. It's optimized to store a moderate number of documents instead of millions of rows.
  • Allows concurrent access by locking and unlocking a document. If the document is locked, then it retries until the document is unlocked, or it fails after a number of retries.
  • One single class with no dependencies.
  • Automatically unlocks a document that was left locked (by default, after 2 minutes).
  • It can use MapReduce (see the example).
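
The lock-and-retry behaviour described above can be sketched in plain PHP. This is only an illustration of the idea, not the library's code: the function names acquireLock() and releaseLock() are hypothetical, and the use of mkdir() as the atomic primitive is an assumption. The default values mirror the public fields documented later (100 retries, 100000 microseconds between retries, 120 seconds of maximum lock time).

```php
<?php
// Hypothetical helpers, NOT part of DocumentStoreOne.

/** Tries to lock $file by creating the directory "$file.lock". */
function acquireLock(string $file, int $retries = 100, int $intervalUs = 100000, int $maxLockTime = 120): bool
{
    $lockDir = $file . '.lock';
    for ($i = 0; $i < $retries; $i++) {
        // mkdir() is atomic: only one process can create the directory.
        if (@mkdir($lockDir)) {
            return true;
        }
        // A lock older than $maxLockTime seconds is treated as abandoned.
        clearstatcache(true, $lockDir);
        $mtime = @filemtime($lockDir);
        if ($mtime !== false && time() - $mtime > $maxLockTime) {
            @rmdir($lockDir);
            continue; // try again immediately (this still counts as a retry)
        }
        usleep($intervalUs); // wait before the next attempt
    }
    return false; // still locked after all retries
}

/** Releases the lock taken by acquireLock(). */
function releaseLock(string $file): bool
{
    return @rmdir($file . '.lock');
}

$doc = sys_get_temp_dir() . '/dso_demo_' . uniqid() . '.dson';
if (acquireLock($doc)) {
    file_put_contents($doc, json_encode(['a1' => 'hello']));
    releaseLock($doc);
}
unlink($doc); // clean up the demo document
```

With the default values, a caller waits at most about 100 x 0.1 = 10 seconds before giving up.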

Test

On average, an SMB generates 100 invoices per month. So, let's say that an SMB generates 12000 invoices per decade.

The test generates 12000 invoices with customer, details (around 1-5 lines per detail) and date on an i7 / SSD / 16 GB / Windows 64-bit machine.

  • Store 12000 invoices 45.303 seconds (reserving a sequence range)
  • Store 12000 invoices 73.203 seconds (reading a sequence for every new invoice)
  • Store 12000 invoices 49.0286 seconds (reserving a sequence range and using igbinary)
  • Reading all invoices 60.2332 seconds. (only reading)
  • MapReduce all invoices per customers 64.0569 seconds.
  • MapReduce all invoices per customers 32.9869 seconds (igbinary)
  • Reading all invoices from a customer 0.3 seconds. (including rendering the result, see image)
  • Adding a new invoice without recalculating all the MapReduce 0.011 seconds.

[Image: MapReduce example]

Concurrency test

A test with 100 concurrent operations (write and read), repeated 10 times.

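| Run | Reads | (ms) | Reads | Error |
|-----|-------|------|-------|-------|
| 1   | 100   | 7471 | 100   | 0     |
| 2   | 100   | 7751 | 100   | 0     |
| 3   | 100   | 7490 | 100   | 0     |
| 4   | 100   | 7480 | 100   | 0     |
| 5   | 100   | 8199 | 100   | 0     |
| 6   | 100   | 7451 | 100   | 0     |
| 7   | 100   | 7476 | 100   | 0     |
| 8   | 100   | 7244 | 100   | 0     |
| 9   | 100   | 7573 | 100   | 0     |
| 10  | 100   | 7818 | 100   | 0     |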

Usage

include "lib/DocumentStoreOne.php";
use eftec\DocumentStoreOne\DocumentStoreOne;
try {
    $flatcon = new DocumentStoreOne(dirname(__FILE__) . "/base", 'tmp');
} catch (Exception $e) {
    die("Unable to create document store. Please, check the folder");
}
$flatcon->insertOrUpdate("somekey1",json_encode(array("a1"=>'hello',"a2"=>'world'))); // or you could use serialize/igbinary_serialize
$doc=$flatcon->get("somekey1");
$listKeys=$flatcon->select();
$flatcon->delete("somekey1");

Commands

Constructor($baseFolder,$collection,$strategy=DocumentStoreOne::DSO_AUTO,$server="")

It creates the DocumentStoreOne instance. $baseFolder should be a folder, and $collection (a subfolder) is optional.

| strategy     | type                                          | server          | benchmark      |
|--------------|-----------------------------------------------|-----------------|----------------|
| DSO_AUTO     | It sets the best available strategy (default) | depends         | -              |
| DSO_FOLDER   | It uses a folder to lock/unlock a document    | -               | 0.3247         |
| DSO_APCU     | It uses APCU to lock/unlock a document        | -               | 0.1480         |
| DSO_MEMCACHE | It uses MEMCACHE to lock/unlock a document    | localhost:11211 | 0.1493         |
| DSO_REDIS    | It uses REDIS to lock/unlock a document       | localhost:6379  | 2.5403 (worst) |

The benchmark column shows how much time (in seconds) it takes to perform 100 inserts.

Example using the default strategy:

use eftec\DocumentStoreOne\DocumentStoreOne;
include "lib/DocumentStoreOne.php";
try {
    $flatcon = new DocumentStoreOne(dirname(__FILE__) . "/base", 'tmp');
} catch (Exception $e) {
    die("Unable to create document store.".$e->getMessage());
}

Example using Memcache:

use eftec\DocumentStoreOne\DocumentStoreOne;
include "lib/DocumentStoreOne.php";
try {
    $flatcon = new DocumentStoreOne("/base", 'tmp',DocumentStoreOne::DSO_MEMCACHE,"localhost:11211");
} catch (Exception $e) {
    die("Unable to create document store.".$e->getMessage());
}

isCollection($collection)

Returns true if collection is valid (a subfolder).

$ok=$flatcon->isCollection('tmp');

collection($collection)

It sets the current collection.

$flatcon->collection('newcollection'); // it sets a collection.

This command can be chained.

$flatcon->collection('newcollection')->select(); // it sets the collection and then runs select() on it

Note: it doesn't validate whether the collection exists. You must use isCollection to verify it.

autoSerialize($value=true,$strategy='php')

It sets whether the information is serialized automatically, and how it is serialized.

| strategy             | type                                                                                                                       |
|----------------------|----------------------------------------------------------------------------------------------------------------------------|
| php                  | It serializes using the serialize() function                                                                               |
| php_array            | It serializes using the include()/var_export() functions. The result could be cached by OpCache because it is a PHP file    |
| json_object          | It is serialized using JSON (as object)                                                                                    |
| json_array           | It is serialized using JSON (as array)                                                                                     |
| none (default value) | It is not serialized. Information must be serialized/de-serialized manually                                                |
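
As an illustration of what each strategy implies, the sketch below round-trips the same document with plain PHP functions. It does not call the library; it only shows the built-in functions named in the table, and the temporary file used for the php_array case is an assumption for the demo.

```php
<?php
$doc = ['a1' => 'hello', 'a2' => 'world'];

// php: serialize() / unserialize()
$asPhp = serialize($doc);
$fromPhp = unserialize($asPhp);

// php_array: the document is written as a PHP file that returns the array,
// so reading it back is a plain include (which OpCache is able to cache).
$file = tempnam(sys_get_temp_dir(), 'dso');
file_put_contents($file, '<?php return ' . var_export($doc, true) . ';');
$fromPhpArray = include $file;
unlink($file);

// json_object / json_array: same JSON text, decoded in two different ways
$asJson = json_encode($doc);
$fromJsonObject = json_decode($asJson);       // stdClass
$fromJsonArray = json_decode($asJson, true);  // associative array

echo $fromJsonObject->a1, ' ', $fromJsonArray['a2'], PHP_EOL;
```

Note that json_object returns a stdClass, so fields are read with -> instead of array keys.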

createCollection($collection)

It creates a collection. It returns false if the operation fails; otherwise it returns true

$flatcon->createCollection('newcollection'); 

insertOrUpdate($id,$document,[$tries=-1])

Inserts a new document (string) in the $id indicated. If the document exists, then it's updated.
$tries indicates the number of tries. The default value is -1 (default number of attempts).

$doc=json_encode(array("a1"=>'hello',"a2"=>'world'));
$flatcon->insertOrUpdate("1",$doc);

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

It's faster than insert or update.

insert($id,$document,[$tries=-1])

Inserts a new document (string) in the $id indicated. If the document exists, then it returns false.
$tries indicates the number of tries. The default value is -1 (default number of attempts).

$doc=json_encode(array("a1"=>'hello',"a2"=>'world'));
$flatcon->insert("1",$doc);

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

update($id,$document,[$tries=-1])

Updates a document (string) in the $id indicated. If the document doesn't exist, then it returns false.
$tries indicates the number of tries. The default value is -1 (default number of attempts).

$doc=json_encode(array("a1"=>'hello',"a2"=>'world'));
$flatcon->update("1",$doc);

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

get($id,[$tries=-1])

It reads the document $id. If the document doesn't exist or it's unable to read it, then it returns false.
$tries indicates the number of tries. The default value is -1 (default number of attempts).

$doc=$flatcon->get("1");

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

appendValue($name,$addValue,[$tries=-1])

It adds a value to a document with name $name. For example, for a log file.

a) If the document doesn't exist, then it's created with $addValue and it returns true.
b) If the document exists, then $addValue is appended and it returns true.
c) Otherwise, it returns false.

$seq=$flatcon->appendValue("log",date('c')." new log");
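
The append behaviour can be illustrated with a plain PHP sketch. It is not the library's implementation: appendToDocument() is a hypothetical helper, and relying on file_put_contents() with FILE_APPEND | LOCK_EX is an assumption about one simple way to get a create-or-append write.

```php
<?php
// Hypothetical helper, NOT part of DocumentStoreOne.
// Creates the file if it is missing, otherwise appends to it.
// LOCK_EX takes an exclusive lock while writing.
function appendToDocument(string $file, string $addValue): bool
{
    return file_put_contents($file, $addValue, FILE_APPEND | LOCK_EX) !== false;
}

$log = sys_get_temp_dir() . '/dso_log_' . uniqid() . '.dson';
$ok1 = appendToDocument($log, "first entry\n");  // the file is created
$ok2 = appendToDocument($log, "second entry\n"); // the value is appended
$content = file_get_contents($log);
unlink($log); // clean up the demo file

echo $content;
```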

getNextSequence($name="seq",$tries=-1,$init=1,$interval=1,$reserveAdditional=0)

It reads or generates a new sequence.

a) If the sequence exists, then it's incremented by $interval and this value is returned.
b) If the sequence doesn't exist, then it's created with $init, and this value is returned.
c) If the library is unable to create the sequence, unable to lock it, or the sequence exists but it's unable to read it, then it returns false.

$seq=$flatcon->getNextSequence();

You could peek at a sequence with $id=get('genseq_'); however, it's not recommended.

If the sequence is corrupt, then it's reset to $init.

If you need to reserve a range of sequence values, you could use $reserveAdditional.

$seq=$flatcon->getNextSequence("seq",-1,1,1,100); // if $seq=1, then it's reserved up to the 101. The next value will be 102.
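
The reservation arithmetic can be sketched with a small file-based counter. This is not the library's code: nextSequence() is a hypothetical function, and storing the counter in a single file protected by flock() is an assumption made for the demo. It follows the rules above: the returned value is the start of the range, and the stored value skips the reserved numbers.

```php
<?php
// Hypothetical helper, NOT part of DocumentStoreOne.
function nextSequence(string $file, int $init = 1, int $interval = 1, int $reserveAdditional = 0)
{
    $fp = fopen($file, 'c+'); // create if missing, do not truncate
    if ($fp === false) {
        return false;
    }
    if (!flock($fp, LOCK_EX)) { // exclusive lock while we read and write
        fclose($fp);
        return false;
    }
    $raw = stream_get_contents($fp);
    // Missing or corrupt content: the sequence starts (again) at $init.
    $next = is_numeric($raw) ? (int)$raw + $interval : $init;
    ftruncate($fp, 0);
    rewind($fp);
    // Store the last reserved value, so other callers skip the whole range.
    fwrite($fp, (string)($next + $reserveAdditional));
    flock($fp, LOCK_UN);
    fclose($fp);
    return $next;
}

$file = sys_get_temp_dir() . '/dso_seq_' . uniqid();
$first = nextSequence($file, 1, 1, 100); // the caller now owns $first .. $first+100
$ids = range($first, $first + 100);      // these ids can be used without touching the file
$after = nextSequence($file);            // first value outside the reserved range
unlink($file);

echo "$first $after", PHP_EOL;
```

Reserving a range trades a single locked read/write for many ids, which is why the benchmark that reserves a sequence range is faster than the one that reads the sequence for every invoice.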

getSequencePHP()

It returns a unique sequence (64-bit integer) based on time, a random value and a serverId.

The chance of a collision (generating the same value twice) is 1/4095 (per two operations executed every 0.0001 second).

$flatcon->nodeId=1; // if it is not set, then it uses a random value each time.
$unique=$flatcon->getSequencePHP(); 

ifExist($id,[$tries=-1])

It checks if the document $id exists. It returns true if the document exists. Otherwise, it returns false.
$tries indicates the number of tries. The default value is -1 (default number of tries).

The validation only happens if the document is fully unlocked.

$found=$flatcon->ifExist("1");

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

delete($id,[$tries=-1])

It deletes the document $id. If the document doesn't exist or it's unable to delete, then it returns false.
$tries indicates the number of tries. The default value is -1 (default number of tries).

$doc=$flatcon->delete("1");

If the document is locked, then it retries until it is available or until it reaches the maximum number of tries (by default, 100 tries, which equals 10 seconds).

select($mask="*")

It returns all the IDs stored on a collection.

$listKeys=$flatcon->select();
$listKeys=$flatcon->select("invoice_*");

It includes locked documents.
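
Since every document is a file, a mask such as "invoice_*" behaves like a file pattern. The sketch below illustrates that idea with glob(). It is not the library's code: listKeys() is a hypothetical function and the ".dson" extension is taken from the default value of $docExt.

```php
<?php
// Hypothetical helper, NOT part of DocumentStoreOne.
// Returns the keys (file names without extension) that match $mask.
function listKeys(string $folder, string $mask = '*', string $docExt = '.dson'): array
{
    $keys = [];
    foreach (glob($folder . '/' . $mask . $docExt) ?: [] as $path) {
        $keys[] = basename($path, $docExt);
    }
    sort($keys);
    return $keys;
}

// Demo collection with three documents.
$dir = sys_get_temp_dir() . '/dso_select_' . uniqid();
mkdir($dir);
foreach (['invoice_1', 'invoice_2', 'customer_1'] as $key) {
    file_put_contents("$dir/$key.dson", '{}');
}

$all = listKeys($dir);                   // every key of the collection
$invoices = listKeys($dir, 'invoice_*'); // only the keys that start with invoice_

array_map('unlink', glob("$dir/*"));     // clean up the demo collection
rmdir($dir);

echo implode(',', $invoices), PHP_EOL;
```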

copy($idorigin,$iddestination,[$tries=-1])

Copies the document $idorigin to $iddestination.

$bool=$flatcon->copy(20,30);

If the destination document exists, then it's replaced.

rename($idorigin,$iddestination,[$tries=-1])

Renames the document $idorigin as $iddestination.

$bool=$flatcon->rename(20,30);

If the destination document exists, then the operation fails.

fixCast (util class)

It converts a stdClass to a specific class.

$inv=new Invoice();
DocumentStoreOne::fixCast($inv,$invTmp); //$invTmp is a stdClass();

It doesn't work with members that are arrays of objects. The elements of the array are kept as stdClass.
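
The idea behind this kind of cast can be sketched as a property-by-property copy. This is an illustration only, not the code of fixCast: castInto() and the Invoice class below are hypothetical.

```php
<?php
// Hypothetical class and helper, NOT part of DocumentStoreOne.
class Invoice
{
    public $id;
    public $customer;

    public function label(): string
    {
        return "#{$this->id} {$this->customer}";
    }
}

// Copies every property of $source that also exists in $target.
// Nested values are copied as they are (no recursion).
function castInto(object $target, stdClass $source): void
{
    foreach (get_object_vars($source) as $name => $value) {
        if (property_exists($target, $name)) {
            $target->$name = $value;
        }
    }
}

$invTmp = json_decode('{"id":7,"customer":"john"}'); // a stdClass
$inv = new Invoice();
castInto($inv, $invTmp);

echo $inv->label(), PHP_EOL; // methods of Invoice are now available
```

Because nested values are copied without conversion, an array of objects inside $source stays an array of stdClass, which matches the limitation described above.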

DocumentStoreOne Fields

The following fields are public and they can be changed at runtime.

| field                        | Type   | Description                                                                                                                                     |
|------------------------------|--------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| $database                    | string | Root folder of the database                                                                                                                     |
| $collection                  | string | Current collection (subfolder) of the database                                                                                                  |
| $maxLockTime=120             | int    | Maximum duration of the lock (in seconds). By default it's 2 minutes                                                                            |
| $defaultNumRetry=100         | int    | Default number of retries. By default it tries 100x0.1sec=10 seconds                                                                            |
| $intervalBetweenRetry=100000 | int    | Interval (in microseconds) between retries. 100000 means 0.1 seconds                                                                            |
| $docExt=".dson"              | string | Default extension (with dot) of the document                                                                                                    |
| $keyEncryption=""            | string | Indicates if the key (the file name) is encrypted when it's stored. Empty means no encryption. You could use md5, sha1, sha256, ...              |

Examples:

$ds=new DocumentStoreOne();
$ds->maxLockTime=300;

$ds=new DocumentStoreOne();
$ds->insert('1','hello'); // it stores the document 1.dson
$ds->keyEncryption='SHA256';
$ds->insert('1','hello'); // it stores the document 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.dson
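
The file name in the last comment is the SHA-256 hash of the key "1". The sketch below reproduces it with PHP's hash() function. The helper documentFileName() is hypothetical and only illustrates how a key could be mapped to a file name; it is not the library's code.

```php
<?php
// Hypothetical helper, NOT part of DocumentStoreOne.
// Maps a document key to a file name, hashing the key if an algorithm is given.
function documentFileName(string $id, string $keyEncryption = '', string $docExt = '.dson'): string
{
    $name = $keyEncryption === '' ? $id : hash(strtolower($keyEncryption), $id);
    return $name . $docExt;
}

$plain = documentFileName('1');            // no hashing, the key is the file name
$hashed = documentFileName('1', 'SHA256'); // the key is hashed with sha256

echo $plain, PHP_EOL, $hashed, PHP_EOL;
```

A hashed key also keeps the file name within a fixed length and a safe character set, whatever the original key contains.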

MapReduce

It could be done manually. The system allows storing a pre-calculated value that can be easily accessed (instead of reading all the values).

Let's consider the following exercise: we have a list of purchases and a list of products.

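| id | customer | age | sex | productpurchase | amount |
|----|----------|-----|-----|-----------------|--------|
| 14 | john     | 33  | m   | 33              | 3      |
| 25 | anna     | 22  | f   | 32              | 1      |

| productcode | unitprice |
|-------------|-----------|
| 32          | 30        |
| 33          | 23.3      |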

John purchased 3 units of the product with the code 33. Product 33 costs $23.3 per unit.

Question: how much did each customer pay?

It's a simple exercise, and it's more suitable for a relational database (select * from purchases inner join products). However, if the document is too long or too complex to store in a database, then this is where a document store shines.

// 1) Open the document store and select the collection "purchases".
$ds=new DocumentStoreOne('base','purchases');
// 2) Read all the products.
// If the list of products fits in memory, then we could store the whole list in a single document (key "listproducts").
// It is assumed that the list was stored with serialize() as an array productcode => unitprice.
$products=unserialize($ds->collection('products')->get('listproducts'));
// 3) Read the keys of every purchase. It could be slow, so it should be a limited set (<100k rows).
$purchases=$ds->collection('purchases')->select(); // they are keys such as 14,25...

$customerXPurchase=[];
// 4) Read every purchase (it is also slow) and accumulate the total paid by each customer.
foreach($purchases as $k) {
    $purchase=unserialize($ds->get($k));
    $price=$products[$purchase->productpurchase] ?? 0; // unit price (0 if the product is unknown)
    $customerXPurchase[$purchase->customer]=($customerXPurchase[$purchase->customer] ?? 0)+$purchase->amount*$price;
}
// 5) Finally, store the result. The id goes first, then the document.
$ds->collection('total')->insertOrUpdate('customerXPurchase',serialize($customerXPurchase));

| customer | value |
|----------|-------|
| john     | 69.9  |
| anna     | 30    |
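
The same calculation can be tried without a store, using in-memory arrays. This sketch is not part of the library: addPurchase() is a hypothetical function, and the prices follow the sentence above (product 33 costs 23.3 per unit), which gives the totals of the result table. It also shows why adding a new invoice does not require recalculating everything: the stored totals are simply updated with the new purchase.

```php
<?php
// In-memory stand-ins for the "products" and "purchases" collections.
$products = [32 => 30.0, 33 => 23.3]; // productcode => unitprice
$purchases = [
    14 => (object)['customer' => 'john', 'productpurchase' => 33, 'amount' => 3],
    25 => (object)['customer' => 'anna', 'productpurchase' => 32, 'amount' => 1],
];

// Hypothetical helper: adds one purchase to the totals per customer.
function addPurchase(array $totals, object $purchase, array $products): array
{
    $price = $products[$purchase->productpurchase] ?? 0; // 0 if the product is unknown
    $totals[$purchase->customer] = ($totals[$purchase->customer] ?? 0) + $purchase->amount * $price;
    return $totals;
}

// Full calculation: one pass over every purchase.
$totals = [];
foreach ($purchases as $purchase) {
    $totals = addPurchase($totals, $purchase, $products);
}

// Incremental update: a new purchase only touches the stored totals.
$new = (object)['customer' => 'anna', 'productpurchase' => 33, 'amount' => 2];
$updated = addPurchase($totals, $new, $products);

foreach ($totals as $customer => $value) {
    echo $customer, ' ', round($value, 2), PHP_EOL;
}
```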

Since it's done in code, it's possible to create a hybrid system (relational database + store + memory cache).

Limits

  • Keys should contain only letters (A-Z, a-z) and digits (0-9). On Windows, keys are not case-sensitive.
  • The number of documents that a collection can hold depends on the file system used. NTFS allows 2 million documents per collection.

Version list

  • 1.10 2019-08-30 Some cleaning. Added getSequencePHP() and field nodeId
  • 1.9 2019-02-10 Unlock now tries to unlock. Manuallock field is not used anymore.
  • 1.8 2018-02-03 Added field neverLock (for fast access to a read-only database). Also added phpunit.
  • 1.7.3 2018-02-03 Updated composer.json
  • 1.7.1 2018-10-20 Removed a debug echo on lock()
  • 1.7 2018-10-20 Added key encryption (optional)
  • 1.6 2018-10-19
    • Reduced the default time from 30 seconds to 10 seconds because usually PHP is configured to a timeout of 30 seconds.
    • Method ifExist locked the resource and never released it. Now it releases it as expected.
  • 1.5 2018-10-13 Maintenance update. Fixed the automatic strategy
  • 1.4 2018-08-26 function rename
  • 1.3 2018-08-15 Added strategy of lock.
  • 1.2 2018-08-12 Small fixes.
  • 1.1 2018-08-12 Changed schema with collection.
  • 1.0 2018-08-11 first version

Pending

  • Transactional (allows to commit or rollback a multiple step transaction). It's in evaluation.
  • Different strategy of lock (folder,memcache,redis and apcu)