A data crawler for Google's code review tool 'Gerrit'

v0.2.0 2014-12-04 20:22 UTC

README

Gerrie is a data and information crawler for Gerrit, a code review system developed by Google.

Gerrie uses the SSH and REST APIs offered by Gerrit to transfer data from Gerrit into an RDBMS. Currently only MySQL is supported. Once imported, the data can be used for simple queries or complex analysis. One use case is to analyze communities that use Gerrit, such as TYPO3, Wikimedia, Android, Qt, Eclipse and many more.
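To give a feel for the kind of data involved, here is a minimal Python sketch of reading the output of Gerrit's SSH command `gerrit query --format=JSON`, which prints one JSON object per line and ends with a row of type `stats`. This is not Gerrie's own code; the function name `parse_gerrit_query_output` and the sample records are hypothetical and only serve as illustration.

```python
import json


def parse_gerrit_query_output(raw):
    """Split line-delimited JSON from `gerrit query --format=JSON`
    into change records and the trailing stats row (if present)."""
    changes = []
    stats = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines between records
        record = json.loads(line)
        if record.get("type") == "stats":
            stats = record  # summary row, e.g. number of returned changes
        else:
            changes.append(record)
    return changes, stats


# Hand-written sample for illustration, not real crawl output.
sample = "\n".join([
    '{"project": "demo/project", "number": 1, "status": "NEW"}',
    '{"project": "demo/project", "number": 2, "status": "MERGED"}',
    '{"type": "stats", "rowCount": 2}',
])

changes, stats = parse_gerrit_query_output(sample)
print(len(changes), stats["rowCount"])
```

Records parsed this way could then be written to database tables; how Gerrie maps them to its schema is described in the Database chapter of the documentation.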

Gerrie is deprecated: watson will replace Gerrie. Watson benefits from what we learned while developing and maintaining Gerrie at a larger (crawling) scale. Check out #17 for more information. Nevertheless, we still merge and support contributions to Gerrie.

Features

  • Full imports
  • Incremental imports
  • Full support of SSH API
  • Command line interface
  • MySQL as storage backend
  • Debugging functionality
  • Logging functionality
  • Fully documented

Getting started

Download application and install dependencies:

$ git clone https://github.com/andygrunwald/Gerrie.git .
$ composer install

Copy config file and adjust configuration (Database, SSH, Gerrit):

$ cp Config.yml.dist Config.yml
$ vim Config.yml

A minimal configuration for the TYPO3 Gerrit instance with the user max.mustermann could look like this:

Database:
  Host: 127.0.0.1
  Username: root
  Password:
  Port: 3306
  Name: gerrie

SSH:
  KeyFile: /Users/max/.ssh/id_rsa_gerrie

Gerrit:
  TYPO3:
    - ssh://max.mustermann@review.typo3.org:29418/
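Each entry under `Gerrit` is an SSH URL that bundles the user name, host and port. As a rough sketch (not Gerrie's own parsing code), the parts can be pulled apart in Python like this. The helper name `parse_gerrit_endpoint` is hypothetical, and falling back to port 29418 (Gerrit's usual SSH port) when none is given is an assumption of this example.

```python
from urllib.parse import urlsplit


def parse_gerrit_endpoint(url):
    """Extract user, host and port from an ssh:// Gerrit endpoint URL."""
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError("expected an ssh:// URL, got: " + url)
    return {
        "user": parts.username,
        "host": parts.hostname,
        # Assumption: use Gerrit's usual SSH port if the URL omits it.
        "port": parts.port or 29418,
    }


endpoint = parse_gerrit_endpoint("ssh://max.mustermann@review.typo3.org:29418/")
print(endpoint)
```

This makes it easy to double-check that the account name and port in Config.yml match the Gerrit instance you intend to crawl.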

Create a new database named gerrie on your MySQL server and set up the database schema:

$ mysql -u root -e "CREATE DATABASE gerrie;"
$ ./gerrie gerrie:setup-database --config-file="./Config.yml"

Create an account (e.g. max.mustermann) in the Gerrit instance you want to crawl (e.g. review.typo3.org:29418), add your SSH public key to the Gerrit instance and execute the gerrie:check command to check your environment:

$ ./gerrie gerrie:check --config-file="./Config.yml"

Important: If your SSH key is protected by a passphrase, this check will ask you to enter the passphrase so that the private key can be used for this connection. Gerrie does not save the passphrase or transfer it to any foreign server. The private key is only needed to authenticate against the Gerrit instance.
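If the check fails, it can help to test the SSH connection by hand. The following Python sketch only assembles the command line for Gerrit's `gerrit version` SSH command using the values from the example configuration; it does not claim to reproduce what gerrie:check does internally. The function name `build_ssh_check_command` is hypothetical.

```python
import shlex


def build_ssh_check_command(user, host, port, key_file):
    """Build the argv for a manual SSH connectivity test against Gerrit."""
    return [
        "ssh",
        "-p", str(port),      # Gerrit's SSH port
        "-i", key_file,       # private key registered with the Gerrit account
        user + "@" + host,
        "gerrit", "version",  # harmless command that prints the server version
    ]


command = build_ssh_check_command(
    "max.mustermann", "review.typo3.org", 29418, "/Users/max/.ssh/id_rsa_gerrie"
)
# Print a copy-and-paste friendly command line.
print(shlex.join(command))
```

Running the printed command in a terminal should report the Gerrit version if the account, key and port are set up correctly.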

If everything is fine, start crawling:

$ ./gerrie gerrie:crawl --config-file="./Config.yml"

Now the crawler starts and does its job 🍺

You can continue reading in the documentation chapters Installation, Configuration, Commands, Database or Contributing.

Documentation

The complete and detailed documentation can be found at Gerrie @ Read the Docs. It is written in reStructuredText and ships with the source code in the docs/ folder.

Source code

The source code can be found at andygrunwald/Gerrie @ GitHub.

Contributing

Contributions are welcome at any time.

Contributions are not limited to source code. Documentation, issues (bugs, new features, nice improvements), talks at user groups or conferences and so on are welcome as well. Our documentation has more detailed information about contributing.

See Gerrie: Contribution @ Read the Docs.

License

This project is released under the terms of the MIT license.

Support, contact or feedback

If you have questions or feedback, are going crazy setting up or using this project, or want to drink a 🍺 and talk about it, just contact me.

Write me an email (see Andy @ GitHub) or tweet me (@andygrunwald). And of course, you can just open an issue in the Gerrie tracker.