nekman/es-pagination

Deep pagination for the Elasticsearch client

2.0.0 2022-06-09 13:53 UTC


Last update: 2024-04-22 19:40:18 UTC


README


A library to deep paginate an Elasticsearch search operation. There are three ways to paginate:

  1. Scroll
  2. From
  3. Search after

Which one to use depends on the context. Read more in the Elasticsearch documentation.

The library holds at most pageSize hits in memory at a time. A smaller page size uses less memory but needs more requests to Elasticsearch, and a larger page size does the opposite. The library never reads an entire index before it starts returning results.
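The trade-off between page size and request count can be illustrated with a small generator. This is only a sketch of the general idea, not the library's implementation: lazyHits, $fetchPage and the in-memory $index are hypothetical stand-ins, and no Elasticsearch client is involved.

```php
<?php
/**
 * Hypothetical helper (not part of this library): lazily yields hits page by page.
 *
 * @param callable $fetchPage function (int $from, int $size): array, returns at most $size hits
 */
function lazyHits(callable $fetchPage, int $pageSize): Generator
{
    $from = 0;
    do {
        // Only one page of hits is held in memory at a time.
        $page = $fetchPage($from, $pageSize);
        foreach ($page as $hit) {
            yield $hit;
        }
        $from += $pageSize;
    } while (count($page) === $pageSize); // A short page means the data is exhausted.
}

// Stand-in for an index with 10 documents; a real fetcher would call Elasticsearch.
$index = array_map(fn (int $i): array => ['_id' => "doc-$i"], range(1, 10));

$requests = 0;
$fetchPage = function (int $from, int $size) use ($index, &$requests): array {
    $requests++; // Count each simulated round trip.
    return array_slice($index, $from, $size);
};

$hits = iterator_to_array(lazyHits($fetchPage, 3), false);
echo count($hits), " hits in ", $requests, " requests\n"; // 10 hits in 4 requests
```

With a page size of 3, the ten simulated documents are fetched in four requests (pages of 3, 3, 3 and 1). A larger page size would need fewer requests but hold more hits in memory per page.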

Usage

The first step is to construct an $elasticsearchClient (an instance of Elasticsearch\Client), which you can read more about in the documentation for the official Elasticsearch PHP driver.

Scroll

use Nekman\EsPagination\CursorFactories\EsScrollCursorFactory;

$cursorFactory = new EsScrollCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000,
    $scrollDuration = "1m"
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}
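As an illustration of what $params might contain, here is a minimal sketch that follows the array shape used by the official PHP client's search operation. The index name my-index and the field title are hypothetical; replace them with names from your own mapping.

```php
<?php
// Hypothetical index and field names, for illustration only.
$params = [
    'index' => 'my-index',
    'body' => [
        'query' => [
            'match' => ['title' => 'pagination'],
        ],
    ],
];
```

The page size is passed to the cursor factory, so this sketch leaves size out of $params.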

From

use Nekman\EsPagination\CursorFactories\EsFromCursorFactory;

$cursorFactory = new EsFromCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}

Search after

use Nekman\EsPagination\CursorFactories\EsSearchAfterCursorFactory;

$cursorFactory = new EsSearchAfterCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}
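Elasticsearch's search_after mechanism relies on a sort order to know where the next page starts, and the sort should include a tiebreaker so that no two documents share the same sort values. The sketch below assumes the sort is passed through $params. Whether the library adds a default sort on its own is not stated here, and the names my-index, created_at and id are hypothetical.

```php
<?php
// Hypothetical index and field names, for illustration only.
$params = [
    'index' => 'my-index',
    'body' => [
        // An empty object is needed so the query serializes as {"match_all": {}}.
        'query' => ['match_all' => new stdClass()],
        'sort' => [
            ['created_at' => 'asc'],
            ['id' => 'asc'], // Tiebreaker: assumed to be unique per document.
        ],
    ],
];
```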

Point in time (PIT)

An Elasticsearch point in time (PIT) is a lightweight view into the state of the data as it existed when the PIT was opened. To use one, create a cursor factory and decorate it with EsPitCursorFactory:

use Nekman\EsPagination\CursorFactories\EsPitCursorFactory;

$cursorFactory = /* Create cursor factory, see above */;

$pitCursorFactory = new EsPitCursorFactory(
    $cursorFactory,
    $elasticsearchClient,
    $pitKeepAlive = "1m"
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

// Use the decorated factory so that the search runs against the point in time.
$cursor = $pitCursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}

Versioning

This project complies with Semantic Versioning.

Changelog

For a complete list of changes, and how to migrate between major versions, see the releases page.