A data crawler for Google's code review tool 'Gerrit'
Gerrie uses the SSH and REST APIs offered by Gerrit to transfer the data from Gerrit into an RDBMS. Currently only MySQL is supported. After the transformation, the data can be used for simple queries or complex analysis. One use case is analyzing communities that use Gerrit, such as TYPO3, Wikimedia, Android, Qt, Eclipse and many more.
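One detail of Gerrit's REST API is worth knowing before inspecting responses by hand: JSON bodies start with a guard line `)]}'` that has to be dropped before the rest is passed to a JSON parser. The sketch below shows one way to do that in a shell pipeline. The function name `strip_gerrit_prefix` is made up for this illustration and says nothing about how Gerrie handles responses internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gerrie): remove the )]}' guard line
# that Gerrit's REST API puts in front of JSON responses.
strip_gerrit_prefix() {
    # \047 is the octal escape for a single quote inside the awk program.
    # Only the first line is dropped, and only if it is exactly the guard.
    awk 'NR == 1 && $0 == ")]}\047" { next } { print }'
}

# Example with a canned response body instead of a real HTTP call:
printf ")]}'\n{\"a\":1}\n" | strip_gerrit_prefix
# prints: {"a":1}
```

In practice the input would come from an HTTP client such as curl, piped into the function in the same way.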
- Website: andygrunwald.github.io/Gerrie
- Source code: Gerrie @ GitHub
- Documentation: Gerrie @ Read the Docs
Gerrie is deprecated: watson will replace Gerrie. Watson benefits from what we learned while developing and maintaining Gerrie at a larger (crawling) scale. See #17 for more information. Nevertheless, we still merge and support contributions to Gerrie.
- Full imports
- Incremental imports
- Full support of SSH API
- Command line interface
- MySQL as storage backend
- Debugging functionality
- Logging functionality
- Fully documented
Download the application and install its dependencies:
$ git clone https://github.com/andygrunwald/Gerrie.git .
$ composer install
Copy the config file and adjust the configuration (Database, SSH, Gerrit):
$ cp Config.yml.dist Config.yml
$ vim Config.yml
A minimalistic configuration for the TYPO3 Gerrit instance with the user max.mustermann can look like:
Database:
  Host: 127.0.0.1
  Username: root
  Password:
  Port: 3306
  Name: gerrie

SSH:
  KeyFile: /Users/max/.ssh/id_rsa_gerrie

Gerrit:
  TYPO3:
    - ssh://firstname.lastname@example.org:29418/
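Each entry under Gerrit is an SSH URL of the form ssh://user@host:port/. As a rough illustration of how such an entry breaks down into user, host and port, here is a small shell sketch. The helper `parse_gerrit_url` is hypothetical and not part of Gerrie. Falling back to port 29418 (Gerrit's default SSH port) when no port is given is an assumption of this sketch, not documented Gerrie behaviour.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gerrie): split ssh://user@host:port/
# into "user host port". Variables are global, as in any POSIX sh function.
parse_gerrit_url() {
    rest=${1#ssh://}       # drop the scheme
    rest=${rest%%/*}       # drop the trailing slash and any path
    user=${rest%%@*}       # text before the "@"
    hostport=${rest#*@}    # text after the "@"
    case $hostport in
        *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
        *)   host=$hostport; port=29418 ;;   # assumed default port
    esac
    printf '%s %s %s\n' "$user" "$host" "$port"
}

parse_gerrit_url 'ssh://firstname.lastname@example.org:29418/'
# prints: firstname.lastname example.org 29418
```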
Create a new database named gerrie in your MySQL server and set up the database schema:
$ mysql -u root -e "CREATE DATABASE gerrie;"
$ ./gerrie gerrie:setup-database --config-file="./Config.yml"
Create an account (e.g. max.mustermann) in the Gerrit instance you want to crawl (e.g. review.typo3.org:29418), add your SSH public key to the Gerrit instance and execute the gerrie:check command to check your environment:
$ ./gerrie gerrie:check --config-file="./Config.yml"
Important: If your SSH key is protected by a passphrase, this check will ask you to enter it so that the private key can be used for the connection. Gerrie does not save the passphrase or transfer it to any foreign server. The private key is only needed to authenticate against the Gerrit instance.
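If the check reports a connection problem, it can help to test SSH access by hand, independently of Gerrie. The sketch below only prints the command for such a test. It assumes the stock OpenSSH client and uses Gerrit's standard `gerrit version` SSH command. The helper name `gerrit_version_cmd` and the fallback key path are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gerrie): print the ssh command for a
# manual connectivity test instead of running it, so it can be inspected first.
gerrit_version_cmd() {
    user=$1
    host=$2
    port=${3:-29418}                  # Gerrit's default SSH port
    keyfile=${4:-$HOME/.ssh/id_rsa}   # assumed fallback key location
    printf 'ssh -p %s -i %s %s@%s gerrit version\n' \
        "$port" "$keyfile" "$user" "$host"
}

gerrit_version_cmd max.mustermann review.typo3.org 29418 /Users/max/.ssh/id_rsa_gerrie
# prints: ssh -p 29418 -i /Users/max/.ssh/id_rsa_gerrie max.mustermann@review.typo3.org gerrit version
```

Running the printed command should return the Gerrit version if the account and public key are set up correctly. A "Permission denied (publickey)" error usually points to a missing or mismatched key.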
If everything is fine, start crawling:
$ ./gerrie gerrie:crawl --config-file="./Config.yml"
Now the crawler starts doing its job 🍺
The source code can be found at andygrunwald/Gerrie @ GitHub.
Contributions are welcome at any time.
Contributions are not limited to source code: documentation, issues (bugs, new features, nice improvements), talks at user groups or conferences and so on are welcome too. Our documentation has more detailed information about contributing.
This project is released under the terms of the MIT license.
If you have questions or feedback, are going crazy setting up or using this project, or want to drink a 🍺 and talk about it, just contact me.