jity/tag-generator

v0.2.1 2012-11-24 10:38 UTC

About

This bundle is part of the Jity project. With this generator you can transform any text into a useful collection of tags.

Installation

Add JityTagGenerator to your composer.json:

{
    "require": {
        "jity/tag-generator": "dev-master"
    }
}

Download the bundle:

php composer.phar update

Register the JityTagGenerator bundle in your AppKernel.php:

public function registerBundles()
{
    $bundles = array(
        ...
        new Jity\TagGeneratorBundle\JityTagGeneratorBundle(),
        ...
    );
    ...
}

Usage

Here is a simple example of how to use the TagGenerator.

use Jity\Tag\TagGenerator,
    Jity\Tag\Filter\Score,
    Jity\Tag\Filter\ScoreGroup,
    Jity\Tag\Filter\Length,
    Jity\Tag\Filter\Occurrence,
    Jity\Tag\Filter\Dictionary,
    Jity\Tag\Filter\Capitalized,
    Jity\Tag\Filter\Uppercase,
    Jity\Tag\Filter\Camelcase,
    Jity\Tag\Filter\Regex;

/* ------------------------------------------------------ */
/* - Configuration */
/* ------------------------------------------------------ */

// Instantiate a new Generator
$generator = new TagGenerator();

// Configure all Filters
$generator

    /* Remove short words (length threshold set by 'min') */
    ->addFilter(
        new Length(1, true, array(
            'min' => 2
        ))
    )

    /* Remove common low-value words (stop words) from the collection */
    ->addFilter(
        new Dictionary(1, true, array(
            'match'         => true,
            'casesensitive' => false,
            'dictionaries'  => array(
                'german'    => array(
                    'adjektive',
                    'verben',
                    'klein',
                    'fixwords'
                )
            )
        ))
    )  

    /* Score occurrence of remaining words */
    ->addFilter(
        new Occurrence(5)
    ) 

    /* Score uppercased words */
    ->addFilter(new Uppercase(15))

    /* Score camelcased words */
    ->addFilter(new Camelcase(15))

    /* Score capitalized words */
    ->addFilter(new Capitalized(5));

// Retrieve the collection of tags
$tags = $generator->getTags('Lorem ipsum etc');
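
The configuration above combines filters that remove words (Length, Dictionary) with filters that add to a word's score (Occurrence, Uppercase, Camelcase, Capitalized). The standalone sketch below illustrates that general idea with plain closures. It does not use this library: sketchTags, minLengthStep, stopWordStep, occurrenceStep and uppercaseStep are hypothetical names, and the scoring rules are assumptions, not a description of how TagGenerator actually computes its scores.

```php
<?php
// Hypothetical sketch only: these functions are NOT part of the
// jity/tag-generator API. They illustrate a pipeline in which some
// steps remove words and others add to a word's score.

// Split text into letter-only words, run every step, sort by score.
function sketchTags($text, array $steps)
{
    $words  = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $scores = array_fill_keys($words, 0);   // word => score, duplicates merged
    foreach ($steps as $step) {
        $scores = $step($scores, $words);   // extra args are ignored by closures
    }
    arsort($scores);                        // highest score first
    return $scores;
}

// Removing step: drop words with fewer than $min characters (not bytes).
function minLengthStep($min)
{
    return function (array $scores) use ($min) {
        return array_filter($scores, function ($word) use ($min) {
            return preg_match_all('/./us', $word) >= $min;
        }, ARRAY_FILTER_USE_KEY);
    };
}

// Removing step: drop stop words (ASCII case-insensitive comparison).
function stopWordStep(array $stopWords)
{
    $lookup = array_flip(array_map('strtolower', $stopWords));
    return function (array $scores) use ($lookup) {
        return array_filter($scores, function ($word) use ($lookup) {
            return !isset($lookup[strtolower($word)]);
        }, ARRAY_FILTER_USE_KEY);
    };
}

// Scoring step: add $weight for every occurrence of a word in the text.
function occurrenceStep($weight)
{
    return function (array $scores, array $words) use ($weight) {
        $counts = array_count_values($words);
        foreach ($scores as $word => $score) {
            $scores[$word] = $score + $weight * $counts[$word];
        }
        return $scores;
    };
}

// Scoring step: add $weight to words made only of uppercase letters.
function uppercaseStep($weight)
{
    return function (array $scores) use ($weight) {
        foreach ($scores as $word => $score) {
            if (preg_match('/^\p{Lu}{2,}$/u', $word)) {
                $scores[$word] = $score + $weight;
            }
        }
        return $scores;
    };
}

// Usage: removing steps first, then scoring steps.
$tags = sketchTags('The NASA team and the Nasa team met the press', array(
    minLengthStep(3),
    stopWordStep(array('the', 'and')),
    occurrenceStep(5),
    uppercaseStep(15),
));
```

With this input the stop words are dropped, "team" scores 10 from its two occurrences, and "NASA" ranks first because the uppercase bonus is added on top of its occurrence score. Returning one closure per step keeps each rule independent, loosely mirroring the chained addFilter() calls.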

Development

Write your own filters

To write your own filter, implement Jity\Tag\Filter\FilterInterface or extend Jity\Tag\Filter\AbstractFilter. The Jity\Tag\Filter\Uppercase filter is a good, simple example to start from.
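
As an illustration of the general shape such a filter might take, here is a self-contained sketch of the interface plus abstract-base pattern. SketchFilterInterface, SketchAbstractFilter, their apply() and matches() methods, and LongWordFilter are all hypothetical; the real FilterInterface may declare different methods, so check it and the Uppercase filter before writing your own.

```php
<?php
// Hypothetical sketch: these types only mimic the idea of
// Jity\Tag\Filter\FilterInterface / AbstractFilter. The real
// interface's method names and signatures may differ.

interface SketchFilterInterface
{
    // Receives a word => score map and returns the updated map.
    public function apply(array $scores);
}

abstract class SketchAbstractFilter implements SketchFilterInterface
{
    protected $score;

    public function __construct($score)
    {
        $this->score = $score;   // points added to each matching word
    }

    // Subclasses decide which words match.
    abstract protected function matches($word);

    public function apply(array $scores)
    {
        foreach ($scores as $word => $current) {
            if ($this->matches((string) $word)) {
                $scores[$word] = $current + $this->score;
            }
        }
        return $scores;
    }
}

// Example custom filter: score long words (e.g. German compound nouns).
class LongWordFilter extends SketchAbstractFilter
{
    private $minLength;

    public function __construct($score, $minLength = 12)
    {
        parent::__construct($score);
        $this->minLength = $minLength;
    }

    protected function matches($word)
    {
        // Count characters, not bytes, so umlauts count once.
        return preg_match_all('/./us', $word) >= $this->minLength;
    }
}

$filter = new LongWordFilter(10);
$result = $filter->apply(array('Donaudampfschiff' => 0, 'Haus' => 5));
```

Here "Donaudampfschiff" (16 characters) gains 10 points while "Haus" keeps its score of 5. Putting the loop in the abstract base means a concrete filter only has to describe which words it matches.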

Recompile a dictionary

Go to resources/dictionaries/LANG (the directory that contains source/) and run:

for i in stopwords fixwords adjektive verben compound klein verben worte; do
    cat source/"${i}"*.txt | ../compiler.sh "$i"
done