datashaman / elasticsearch-model
Laravel/Eloquent integration with Elasticsearch
Requires
- php: >=7.0
- elasticsearch/elasticsearch: >=2
- illuminate/support: >=5.4
Requires (Dev)
- ext-sqlite3: *
- mockery/mockery: *
- orchestra/testbench: >=3.4
- phpunit/phpunit: *
- symfony/yaml: *
Suggests
- symfony/yaml: To load settings from a YAML file
This package is auto-updated.
Last update: 2024-10-08 13:38:16 UTC
README
Laravel-oriented implementation of elasticsearch-model.
Supports Laravel 5.4 and higher on PHP 7.0/7.1. PHP 7.2 will be supported and tested once Travis directly supports it.
Note that at least PHP 7.1 is required for Laravel 5.6+.
NB This is currently BETA quality software. Use it in production at your own risk, and be aware that there might be some further simplifications of the API in the next version or two.
The idea is to stay fairly faithful to the Ruby on Rails implementation, but housed in Laravel.
Installation
Install the package using composer:
composer require datashaman/elasticsearch-model
Configure the service provider in config/app.php:
...
Datashaman\Elasticsearch\Model\ServiceProvider::class,
...
Configure the alias in config/app.php:
...
'Elasticsearch' => Datashaman\Elasticsearch\Model\ElasticsearchFacade::class,
...
Copy the base config into your application:
php artisan vendor:publish --tag=config --provider='Datashaman\Elasticsearch\Model\ServiceProvider'
Edit config/elasticsearch.php to your liking. Setting ELASTICSEARCH_HOSTS (a comma-delimited list of host:port definitions) in .env should cover most use cases.
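To make the comma-delimited format concrete, here is a minimal sketch of how such a value could be split into a list of hosts. The helper name parseElasticsearchHosts is hypothetical and not part of the package; the published config file may read the variable differently.

```php
<?php
// Hypothetical helper (not part of the package): split a comma-delimited
// ELASTICSEARCH_HOSTS value into a list of host:port strings.
function parseElasticsearchHosts(string $raw): array
{
    // Split on commas and trim surrounding whitespace from each entry.
    $hosts = array_map('trim', explode(',', $raw));

    // Drop empty entries left by stray commas, then reindex from zero.
    return array_values(array_filter($hosts, function ($host) {
        return $host !== '';
    }));
}

// Example value as it might appear in .env (assumed for illustration).
$hosts = parseElasticsearchHosts('localhost:9200, es2.example.org:9200,');
print_r($hosts);
```

The resulting array has the shape that the 'hosts' key of an Elasticsearch client configuration expects.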
Usage
Let's suppose you have an Article model:
Schema::create('articles', function (Blueprint $table) {
    $table->increments('id');
    $table->string('title');
});

class Article extends Eloquent
{
}

Article::create([ 'title' => 'Quick brown fox' ]);
Article::create([ 'title' => 'Fast black dogs' ]);
Article::create([ 'title' => 'Swift green frogs' ]);
Setup
To add the Elasticsearch integration for this model, use the Datashaman\Elasticsearch\Model\ElasticsearchModel trait in your class. You must also add a protected static $elasticsearch property for storage:
use Datashaman\Elasticsearch\Model\ElasticsearchModel;

class Article extends Eloquent
{
    use ElasticsearchModel;

    protected static $elasticsearch;
}
This will extend the model with functionality related to Elasticsearch.
Proxy
The package contains a large number of class and instance methods to provide all this functionality.
To prevent polluting your model namespace, nearly all functionality is accessed via the static method Article::elasticsearch().
Elasticsearch client
The package will set up a client, connected to localhost:9200, by default. You can access and use it like any other Elasticsearch\Client:
Article::elasticsearch()->client()->cluster()->health();
=> [ "cluster_name" => "elasticsearch", "status" => "yellow", ... ]
To use a client with a different configuration, set a client for the model using Elasticsearch\ClientBuilder:
Article::elasticsearch()->client(ClientBuilder::fromConfig([
    'hosts' => [ 'api.server.org' ],
]));
Importing the data
The first thing you'll want to do is import your data to the index:
Article::elasticsearch()->import([ 'force' => true ]);
It's possible to import only records from a specific scope or query, transform the batch with the transform and preprocess options, or re-create the index by deleting it and creating it with correct mapping with the force option -- look for examples in the method documentation.
No errors were reported during importing, so... let's search the index!
Searching
For starters, we can try the simple type of search:
$response = Article::search('fox dogs');
$response->took();
=> 3
$response->total();
=> 2
$response[0]->_score;
=> 0.02250402
$response[0]->title;
=> "Fast black dogs"
Search results
The returned response object is a rich wrapper around the JSON returned from Elasticsearch, providing access to response metadata and the actual results (hits).
The response object delegates to an internal LengthAwarePaginator. You can get a Collection via the delegated getCollection method, although the paginator also delegates methods to its Collection, so any of these work:
$response->results()
    ->map(function ($r) { return $r->title; })
    ->all();
=> ["Fast black dogs", "Quick brown fox"]
$response->getCollection()
    ->map(function ($r) { return $r->title; })
    ->all();
=> ["Fast black dogs", "Quick brown fox"]
$response->filter(function ($r) { return preg_match('/^Q/', $r->title); })
    ->map(function ($r) { return $r->title; })
    ->all();
=> ["Quick brown fox"]
As you can see in the examples above, use the Collection::all() method to get a regular array.
Each Elasticsearch hit is wrapped in the Result class.
Result has a dynamic getter:
- index, type, id, score and source are pulled from the top-level of the hit. e.g. index is hit[_index], type is hit[_type], etc
- if not one of the above, it looks for an existing item in the top-level hit. e.g. _version is hit[_version] (if defined)
- if not one of the above, it looks for an existing item in hit[_source] (the document). e.g. title is hit[_source][title] (if defined)
- if nothing resolves from above, it triggers a notice and returns null
It also has a toArray method which returns the hit as an array.
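The lookup order of the dynamic getter can be sketched as follows. HitSketch is a hypothetical stand-in written for illustration, not the package's Result class, and its handling of edge cases (such as a missing _type key) is an assumption.

```php
<?php
// Hypothetical stand-in for the package's Result class. It shows only the
// lookup order of the dynamic getter, not the real implementation.
class HitSketch
{
    private $hit;

    public function __construct(array $hit)
    {
        $this->hit = $hit;
    }

    public function __get($name)
    {
        // 1. Shorthand names map to underscore-prefixed top-level keys.
        $shorthand = ['index', 'type', 'id', 'score', 'source'];
        if (in_array($name, $shorthand, true)) {
            return $this->hit['_' . $name] ?? null;
        }

        // 2. Any other key that exists at the top level of the hit.
        if (array_key_exists($name, $this->hit)) {
            return $this->hit[$name];
        }

        // 3. A field of the document itself, stored under _source.
        $source = $this->hit['_source'] ?? [];
        if (array_key_exists($name, $source)) {
            return $source[$name];
        }

        // 4. Nothing resolved: raise a notice and return null.
        trigger_error("Undefined property: {$name}", E_USER_NOTICE);
        return null;
    }
}

// Sample hit data, invented for illustration.
$hit = new HitSketch([
    '_index' => 'articles',
    '_id' => '2',
    '_score' => 0.5,
    '_version' => 3,
    '_source' => ['title' => 'Fast black dogs'],
]);
```

With this sample hit, $hit->index resolves through step 1, $hit->_version through step 2 and $hit->title through step 3.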
Search results as database records
Instead of returning documents from Elasticsearch, the records method will return a collection of model instances, fetched from the primary database, ordered by score:
$response->records()
    ->map(function ($article) { return $article->title; })
    ->all();
=> ["Fast black dogs", "Quick brown fox"]
The returned object is a Collection of model instances returned by your database, i.e. Eloquent instances.
The records method returns the real instances of your model, which is useful when you want to access your model methods - at the expense of slowing down your application, of course.
In most cases, working with results coming from Elasticsearch is sufficient, and much faster.
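A database query does not return rows in score order on its own, so the records have to be rearranged to follow the hits. The sketch below shows one way this could be done with plain arrays; orderRowsByHits is a hypothetical helper and the package may implement the ordering differently.

```php
<?php
// Hypothetical helper: reorder database rows to follow the order of the
// Elasticsearch hits. $hitIds lists the hit IDs in score order.
function orderRowsByHits(array $rows, array $hitIds): array
{
    // Map each hit ID to its position in the search results.
    $position = array_flip(array_map('strval', $hitIds));

    // Keep only rows that correspond to a hit.
    $rows = array_values(array_filter($rows, function ($row) use ($position) {
        return isset($position[(string) $row['id']]);
    }));

    // Sort the remaining rows by the position of their hit.
    usort($rows, function ($a, $b) use ($position) {
        return $position[(string) $a['id']] <=> $position[(string) $b['id']];
    });

    return $rows;
}

// Hits arrive in score order; the database returns rows in ID order.
$hitIds = ['2', '1'];
$rows = [
    ['id' => 1, 'title' => 'Quick brown fox'],
    ['id' => 2, 'title' => 'Fast black dogs'],
];

$ordered = orderRowsByHits($rows, $hitIds);
echo implode(', ', array_column($ordered, 'title')), PHP_EOL;
// prints "Fast black dogs, Quick brown fox"
```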
When you want to access both the database records and search results, use the eachWithHit (or mapWithHit) iterator:
$lines = [];
$response->records()->eachWithHit(function ($record, $hit) use (&$lines) {
    $lines[] = "* {$record->title}: {$hit->_score}";
});
$lines;
=> [ "* Fast black dogs: 0.01125201", "* Quick brown fox: 0.01125201" ]
$lines = $response->records()->mapWithHit(function ($record, $hit) {
    return "* {$record->title}: {$hit->_score}";
})->all();
$lines;
=> [ "* Fast black dogs: 0.01125201", "* Quick brown fox: 0.01125201" ]
Note that the eachWithHit callback captures $lines by reference so it can append to it, and that the mapWithHit example uses Collection::all() to convert to a regular array. Collection methods prefer to return Collection instances instead of regular arrays.
The first argument to records is an options array; the second argument is a callback which is passed the query builder to modify it on the fly. For example, to order the records differently from the results (from above):
$response
    ->records([], function ($query) {
        $query->orderBy('title', 'desc');
    })
    ->map(function ($article) { return $article->title; })
    ->all();
=> [ 'Quick brown fox', 'Fast black dogs' ]
Notice that adding an orderBy call to the query overrides the ordering of the records, so that it is no longer the same as the results.
Searching multiple models
TODO Implement a Facade for cross-model searching.
Pagination
You can implement pagination with the from and size search parameters. However, search results can be automatically paginated much like Laravel does.
# Delegates to the results on page 2 with 20 per page
$response->perPage(20)->page(2);
# Records on page 2 with 20 per page; records ordered the same as results
# Order of the `page` and `perPage` calls doesn't matter
$response->page(2)->perPage(20)->records();
# Results on page 2 with (default) 15 results per page
$response->page(2)->results();
# Records on (default) page 1 with 10 records per page
$response->perPage(10)->records();
You have access to a length-aware paginator (the response delegates internally to the results() call, so you don't need to call results() on the chain):
$response->page(2)->results();
=> object(Illuminate\Pagination\LengthAwarePaginator) ...
$results = $response->page(2);
$results->setPath('/articles');
$results->render();
=> <ul class="pagination">
    <li><a href="/articles?page=1" rel="prev">«</a></li>
    <li><a href="/articles?page=1">1</a></li>
    <li class="active"><span>2</span></li>
    <li><a href="/articles?page=3">3</a></li>
    <li><a href="/articles?page=3" rel="next">»</a></li>
</ul>
The rendered HTML was tidied up slightly for readability.
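For reference, the from and size parameters mentioned at the start of this section relate to a page number and page size by simple arithmetic. The helper below is a hypothetical sketch of that conversion, using the default of 15 results per page stated above; it is not taken from the package.

```php
<?php
// Hypothetical helper: translate a page number and page size into
// Elasticsearch's `from` (offset of the first hit) and `size` parameters.
function paginationParams(int $page, int $perPage = 15): array
{
    // Pages are 1-based; treat anything lower as the first page.
    $page = max(1, $page);

    return [
        'from' => ($page - 1) * $perPage,
        'size' => $perPage,
    ];
}

print_r(paginationParams(2, 20));
// from is 20 and size is 20: page 2 starts right after the first 20 hits
```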
The Elasticsearch DSL
TODO Integrate this with a query builder.
Index Configuration
For proper search engine function, it's often necessary to configure the index properly. This package provides class methods to set up index settings and mappings.
Article::settings(['index' => ['number_of_shards' => 1]], function ($s) {
    $s['index'] = array_merge($s['index'], [
        'number_of_replicas' => 4,
    ]);
});
Article::settings()->toArray();
=> [ 'index' => [ 'number_of_shards' => 1, 'number_of_replicas' => 4 ] ]
Article::mappings(['dynamic' => false], function ($m) {
    $m->indexes('title', [
        'analyzer' => 'english',
        'index_options' => 'offsets',
    ]);
});
Article::mappings()->toArray();
=> [ "article" => [ "dynamic" => false, "properties" => [ "title" => [ "analyzer" => "english", "index_options" => "offsets", "type" => "string" ] ] ] ]
You can use the defined settings and mappings to create an index with desired configuration:
Article::elasticsearch()->client()->indices()->delete(['index' => Article::indexName()]);
Article::elasticsearch()->client()->indices()->create([
    'index' => Article::indexName(),
    'body' => [
        'settings' => Article::settings()->toArray(),
        'mappings' => Article::mappings()->toArray(),
    ],
]);
There's a shortcut available for this common operation (convenient e.g. in tests):
Article::elasticsearch()->createIndex(['force' => true]);
Article::elasticsearch()->refreshIndex();
By default, the index name and document type are inferred from your class name. You can, however, set them explicitly:
class Article
{
    protected static $indexName = 'article-production';
    protected static $documentType = 'post';
}
Alternately, you can set them using the following static methods:
Article::indexName('article-production'); Article::documentType('post');
Updating the Documents in the Index
Usually, we need to update the Elasticsearch index when records in the database are created, updated or deleted; use the indexDocument, updateDocument and deleteDocument methods, respectively:
Article::first()->indexDocument();
=> [ 'ok' => true, ... "_version" => 2 ]
Note that this implementation differs from the Ruby one, where the instance has an elasticsearch() method and proxy object. In this package, the instance methods are added directly to the model. Implementing the same pattern in PHP is not easy to do cleanly.
Automatic Callbacks
You can automatically update the index whenever the record changes, by using the Datashaman\Elasticsearch\Model\Callbacks trait in your model:
use Datashaman\Elasticsearch\Model\ElasticsearchModel;
use Datashaman\Elasticsearch\Model\Callbacks;

class Article
{
    use ElasticsearchModel;
    use Callbacks;
}

Article::first()->update([ 'title' => 'Updated!' ]);
Article::search('*')->map(function ($r) { return $r->title; })->all();
=> [ 'Updated!', 'Fast black dogs', 'Swift green frogs' ]
The automatic callback on record update keeps track of changes in your model (via Laravel's getDirty implementation), and performs a partial update when this support is available.
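The idea behind a partial update can be sketched without Eloquent: compare the original attributes with the current ones and send only the difference. The dirtyAttributes function below is a hypothetical, simplified imitation of what getDirty reports; the 'doc' wrapper is the body shape used by the Elasticsearch update API. How the package builds its request is not shown here.

```php
<?php
// Hypothetical, simplified imitation of Eloquent's getDirty(): return only
// the attributes whose values differ from the originally loaded ones.
function dirtyAttributes(array $original, array $current): array
{
    $dirty = [];

    foreach ($current as $key => $value) {
        // An attribute is dirty if it is new or its value has changed.
        if (!array_key_exists($key, $original) || $original[$key] !== $value) {
            $dirty[$key] = $value;
        }
    }

    return $dirty;
}

// Sample attribute sets, invented for illustration.
$original = ['id' => 1, 'title' => 'Quick brown fox'];
$current = ['id' => 1, 'title' => 'Updated!'];

// Body for a partial update: only the changed fields go under 'doc'.
$body = ['doc' => dirtyAttributes($original, $current)];
print_r($body);
```

Sending only the changed fields keeps the update request small and leaves the other fields of the indexed document untouched.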
The automatic callbacks are implemented in database adapters coming with this package. You can easily implement your own adapter: please see the relevant chapter below.
Custom Callbacks
If you need more control over the indexing process, you can implement these callbacks yourself by hooking into the created, saved, updated or deleted events:
Article::saved(function ($article) {
    $result = $article->indexDocument();
    Log::debug("Saved document", compact('result'));
});
Article::deleted(function ($article) {
    $result = $article->deleteDocument();
    Log::debug("Deleted document", compact('result'));
});
Regrettably there are no committed events in Eloquent like in Ruby's ActiveRecord.
Asynchronous Callbacks
Of course, you're still performing an HTTP request during your database transaction, which is not optimal for large-scale applications. A better option would be to process the index operations in the background, with Laravel's Queue facade:
Article::saved(function ($article) { Queue::pushOn('default', new Indexer('index', Article::class, $article->id)); });
An example implementation of the Indexer class could look like this (source included in package):
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class Indexer implements ShouldQueue
{
    use InteractsWithQueue, SerializesModels;

    protected $operation;
    protected $class;
    protected $id;

    public function __construct($operation, $class, $id)
    {
        $this->operation = $operation;
        $this->class = $class;
        $this->id = $id;
    }

    public function handle()
    {
        $class = $this->class;

        switch ($this->operation) {
            case 'index':
                // Load the record and index its serialized form.
                $record = $class::find($this->id);
                $class::elasticsearch()->client()->index([
                    'index' => $class::indexName(),
                    'type' => $class::documentType(),
                    'id' => $record->id,
                    'body' => $record->toIndexedArray(),
                ]);
                break;
            case 'delete':
                // The record is already gone, so delete by stored ID.
                $class::elasticsearch()->client()->delete([
                    'index' => $class::indexName(),
                    'type' => $class::documentType(),
                    'id' => $this->id,
                ]);
                break;
            default:
                throw new Exception('Unknown operation: '.$this->operation);
        }
    }
}
Model Serialization
By default, the model instance will be serialized to JSON using the output of the toIndexedArray method, which is defined automatically by the package:
Article::first()->toIndexedArray();
=> [ 'title' => 'Quick brown fox' ]
If you want to customize the serialization, just implement the toIndexedArray method yourself, for instance with the toArray method:
class Article
{
    use ElasticsearchModel;

    public function toIndexedArray($options = null)
    {
        return $this->toArray();
    }
}
The re-defined method will be used in the indexing methods, such as indexDocument.
Attribution
Original design from elasticsearch-model which is:
- Copyright (c) 2014 Elasticsearch http://www.elasticsearch.org
- Licensed with Apache 2.0 license (detail in LICENSE.txt)
Changes include a rewrite of the core logic in PHP, as well as slight enhancements to accommodate Laravel and Eloquent.
License
This package inherits the same license as its original. It is licensed under the Apache2 license, quoted below:
Copyright (c) 2016 datashaman <marlinf@datashaman.com>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.