UCS DataExtractor Component

3.0.1 2017-10-12 19:26 UTC


Principles of DataExtraction


Search is one of the main challenges of a good website. It can be achieved through different channels and may rely on various technologies. You can use, for example, a pure database search with filters, or tools like Apache Solr or Elasticsearch.

Displaying a list of items on a page is a very similar problem, as you often have to extract data based on very specific business rules. This process is called data extraction, and this component provides the basis for it.

The generic principles of data extraction are the following:

  1. Build a base query
  2. Add filters, sorting and limitations
  3. Execute the query
  4. Hydrate the results

The one exception concerns filtering: in some cases you may need to filter after the query has been executed rather than before. Filter values are usually collected via a form.
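The four steps above can be sketched without any database at all. The following is a minimal illustration only: it works on an in-memory array, and the `$query` structure and the `executeQuery()` helper are hypothetical names that are not part of UCS DataExtractor.

```php
<?php
// Hypothetical in-memory "table"; a real extractor would hit a database or a search index.
$rows = [
    ['username' => 'nicolas', 'email' => 'nicolas@gmail.com'],
    ['username' => 'alice',   'email' => 'alice@example.org'],
    ['username' => 'nicole',  'email' => 'nicole@gmail.com'],
];

// 1. Build a base query (here: a plain description of what to extract).
$query = ['filters' => [], 'orderBy' => null, 'offset' => 0, 'limit' => null];

// 2. Add filters, sorting and limitations.
$query['filters'][] = function (array $row) {
    return strpos($row['email'], 'gmail') !== false;
};
$query['orderBy'] = 'username';
$query['limit'] = 10;

// 3. Execute the query.
function executeQuery(array $rows, array $query): array
{
    foreach ($query['filters'] as $filter) {
        $rows = array_filter($rows, $filter);
    }
    if ($query['orderBy'] !== null) {
        $key = $query['orderBy'];
        usort($rows, function (array $a, array $b) use ($key) {
            return strcmp($a[$key], $b[$key]);
        });
    }

    // array_slice() reindexes the rows and applies offset and limit.
    return array_slice($rows, $query['offset'], $query['limit']);
}

$raw = executeQuery($rows, $query);

// 4. Hydrate the results: turn raw rows into objects.
$users = array_map(function (array $row) {
    return (object) $row;
}, $raw);
```

Filtering after execution, the exception mentioned above, would simply mean applying another `array_filter()` to `$raw` before hydration.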

Filter Types

There are many types of filters:

  • Text with exact/partial match
  • Multiple/Single property values
  • Choices
  • Range
  • Regular Expression

In addition, the search may or may not include related entities, and filters may be combined with logical expressions (AND/OR).

As a rule, the values of a filter that accepts multiple values are combined with an OR clause, while separate filters are combined with an AND clause.
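To make the OR/AND rule concrete, here is a small sketch that turns a map of filters into a SQL-like WHERE clause. The `buildWhere()` helper is hypothetical and only illustrates the composition rule; it is not how the component builds its queries.

```php
<?php
/**
 * Builds a WHERE clause from a map of property => accepted value(s).
 * Hypothetical helper for illustration; property names are assumed to be
 * trusted (whitelisted), only the values are bound as parameters.
 */
function buildWhere(array $filters): array
{
    $clauses = [];
    $params = [];

    foreach ($filters as $property => $values) {
        $alternatives = [];
        foreach ((array) $values as $value) {
            $alternatives[] = $property . ' = ?';
            $params[] = $value;
        }
        if ($alternatives === []) {
            continue; // a filter without values constrains nothing
        }
        // Values of the same filter are alternatives: join them with OR.
        $clauses[] = '(' . implode(' OR ', $alternatives) . ')';
    }

    // Separate filters must all match: join them with AND.
    return [implode(' AND ', $clauses), $params];
}

list($where, $params) = buildWhere([
    'status' => ['active', 'pending'],
    'role'   => 'admin',
]);

echo $where, PHP_EOL;
// prints: (status = ? OR status = ?) AND (role = ?)
```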

Generally speaking, filters should be easy to expose, as they are generic and common to any filtering process. They rely only on the displayed properties.


Extracting data from a database can be a complex process, even with an ORM. It is also sometimes useful to build a custom, efficient query from user-defined filters. Filters can be complex and may target joined entities.

UCS DataExtractor is designed to make such filters easy to create. In addition, the extraction process is normalized so that your top-level code can be (re)used with different extractors (Doctrine, Propel, Solr...).

In the end, extracting data is a very simple process that can be summarized in the following few lines of code (using the Doctrine extractor):


// Configure the extraction process
$config = (new DataExtractorConfiguration())
    ->setFirstResult(0) // Retrieve from the first result
    ->setMaxResults(30) // Retrieve 30 results max
    ->addFilter(new \UCS\Component\DataExtractor\Filter\LikeFilter('username', 'nicolas'))
    ->addFilter(new \UCS\Component\DataExtractor\Filter\LikeFilter('email', 'gmail'))
    ->addOrderBy(new \UCS\Component\DataExtractor\Model\OrderBy('username', 'ASC'));

// Get the extractor handler from the factory
// The extractor can be a string of a registered extractor or an instance of a
// class that implements the DataExtractorInterface
$handler = $factory->createFromConfig(new \Acme\DemoComponent\Extractor\CustomExtractor, $config);
$resultSet = $handler->extract();

In the end, the result set behaves like a paginator: it contains your results plus additional information from the database. This process usually results in two queries:

  1. One to count all matching elements
  2. One to retrieve exactly the requested number of elements
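The reason for two queries is that a paginator needs both the total number of matches and the current window of results. A minimal sketch of that idea follows, using an in-memory array in place of a database; `extractPage()` and the array keys `total` and `items` are hypothetical and do not reflect the component's actual result set class.

```php
<?php
/**
 * Hypothetical stand-in for a paginated result set.
 *
 * @param array $matching    all elements matching the filters
 * @param int   $firstResult offset of the first element to return
 * @param int   $maxResults  maximum number of elements to return
 */
function extractPage(array $matching, int $firstResult, int $maxResults): array
{
    return [
        // "Query" 1: count every matching element, ignoring offset and limit.
        'total' => count($matching),
        // "Query" 2: fetch only the requested window.
        'items' => array_slice($matching, $firstResult, $maxResults),
    ];
}

$matching = range(1, 95);                    // 95 matching elements
$resultSet = extractPage($matching, 30, 30); // second page of 30

// The total is what lets a paginator compute the number of pages.
$pageCount = (int) ceil($resultSet['total'] / 30);
```

With a real database, the first value would typically come from a `COUNT` query and the second from the same query with an offset and a limit applied.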

Creating a Form for presenting filtering options

Most of the time, filters are presented to your end users so that they can adjust the display according to the available options. The UCS DataExtractor component comes with a built-in form and mechanism that lets you set this up quickly.

The form can be configured to create any kind of filter; you only have to give their types in the form options so that they can be rendered properly:


// Create a form from this type with the Symfony FormFactory
$form = $formFactory->createNamed(
    'my_filters',
    new \UCS\Component\DataExtractor\Form\DataExtractorConfigurationType(),
    null, // no initial data (third argument of Symfony's createNamed())
    array(
        'filterOptions' => array(
            'filters' => array(
                'foo' => array(
                    'type' => new \UCS\Component\DataExtractor\Form\Filter\LikeFilterType(),
                    'filter_property' => 'foo',
                ),
                'bar' => array(
                    'type' => new \UCS\Component\DataExtractor\Form\Filter\StringEqualsFilterType(),
                    'filter_property' => 'bar',
                ),
            ),
        ),
    )
);

The filter_property option is optional in the filter form options; if it is not specified, the filter property defaults to the filter's name in the form. This option represents the final property path in the object.

For example, if you have a User entity linked to a Group entity through a ManyToMany relationship, you may want to filter users that belong to a group whose name is similar to 'foo'. In that case the property path is group.name and you would use the LikeFilter.
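The defaulting rule and the dotted property path can be illustrated with a few lines of plain PHP. This is only a sketch of the behaviour described above: `resolveFilterProperties()` is a hypothetical helper, and the component may resolve its options differently internally.

```php
<?php
/**
 * Hypothetical helper: fills in filter_property when it is omitted,
 * using the filter's name in the form as the default.
 */
function resolveFilterProperties(array $filters): array
{
    foreach ($filters as $name => $options) {
        if (!isset($options['filter_property'])) {
            $filters[$name]['filter_property'] = $name;
        }
    }

    return $filters;
}

$resolved = resolveFilterProperties([
    'foo'        => [],                                    // defaults to "foo"
    'group_name' => ['filter_property' => 'group.name'],  // explicit path
]);

// A dotted path names the relation first, then the property on the related object.
$segments = explode('.', $resolved['group_name']['filter_property']);
// $segments is ['group', 'name']
```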

Creating a DataExtractorType

Configuring your extractor forms inline can be overkill, and you may want to do it once and (re)use the configuration in different places in your application. The UCS DataExtractor component therefore comes with a built-in system that lets you configure an extractor form in a simple class.

For example, to create the form presented in the previous section, we create the following class:


namespace Acme\DemoBundle\DataExtractor\Type;

/* Imports */
use Symfony\Component\OptionsResolver\OptionsResolver;
// Namespace assumed for the builder interface; adjust it to your version
use UCS\Component\DataExtractor\DataExtractorBuilderInterface;
use UCS\Component\DataExtractor\DataExtractorTypeInterface;
use UCS\Component\DataExtractor\Form\Filter\LikeFilterType;
use UCS\Component\DataExtractor\Form\Filter\StringEqualsFilterType;

/**
 * Configure Filters and Extraction process for MyEntity
 */
class MyEntityExtractorType implements DataExtractorTypeInterface
{
    /**
     * {@inheritdoc}
     */
    public function getName()
    {
        return 'my_entity_extractor';
    }

    /**
     * {@inheritdoc}
     */
    public function getDataClass()
    {
        return 'Acme\DemoBundle\Entity\MyEntity';
    }

    /**
     * {@inheritdoc}
     */
    public function getExtractor()
    {
        // The return value here can be either a string or an instance
        // of a data extractor
        return 'doctrine_entity';
    }

    /**
     * {@inheritdoc}
     */
    public function buildExtractor(DataExtractorBuilderInterface $builder, array $options = array())
    {
        $builder->setFirstResult(0) // Retrieve from the first result
            ->setMaxResults(30) // Retrieve 30 results max
            ->addFilterType('foo', new LikeFilterType())
            ->addFilterType('bar', new StringEqualsFilterType());
    }

    /**
     * {@inheritdoc}
     */
    public function setExtractorOptions(OptionsResolver $resolver)
    {
        // Add options you want to give to DataExtractorTypeInterface::buildExtractor here
    }
}

Filter form types can be registered in the factory and can be used by their names similarly to Symfony2 forms.

Registering your types

UCS DataExtractor comes with a built-in system that lets you register your types and build them directly by name:


$registry = new \UCS\Component\DataExtractor\DataExtractorRegistry();
$factory = new \UCS\Component\DataExtractor\DataExtractorFactory($registry);

$factory->register(new \Acme\DemoBundle\DataExtractor\Type\MyEntityExtractorType);

Once registered, you can get the DataExtractorHandler from the factory and extract your entities in just a few steps:


$handler = $factory->create('my_entity_extractor');

// Bind the request data
$form = $handler->getForm();

// or ...

// Finally retrieve the entities
$entities = $handler->extract();

The DataExtractorHandler also comes with a submit method which directly calls the form's submit method.