nlp-tools/lang-detect

Classify source code using a Naive Bayes text classifier

dev-master 2013-11-13 22:29 UTC

Last update: 2024-12-17 07:29:47 UTC


README

LanguageDetector is an implementation of sourceclassifier in PHP using NlpTools.

LanguageDetector detects the programming language of a piece of source code using a Naive Bayes model. The provided pre-trained model recognizes C, C#, C++, Clojure, Go, Haskell, Java, JavaScript, MATLAB, Pascal, Perl, PHP, Python, Ruby, Scala, and Visual Basic.

You can read a blog post about it.
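
To give a rough idea of how a Naive Bayes model can pick a language, here is a minimal sketch in plain PHP. It is not the code of this package and does not use NlpTools: the functions tokenize, train and classify, the two tiny training samples, and the add-one smoothing are all assumptions made for illustration.

```php
<?php
// Toy multinomial Naive Bayes over source-code tokens.
// Illustrative only: not this package's implementation or its trained model.

/** Split source text into identifiers, numbers and single punctuation marks. */
function tokenize(string $code): array
{
    preg_match_all('/[A-Za-z_]\w*|\d+|[^\s\w]/', $code, $m);
    return $m[0];
}

/** Count documents and tokens per label from [label, code] training pairs. */
function train(array $samples): array
{
    $model = ['docs' => [], 'counts' => [], 'totals' => [], 'vocab' => []];
    foreach ($samples as [$label, $code]) {
        $model['docs'][$label] = ($model['docs'][$label] ?? 0) + 1;
        foreach (tokenize($code) as $tok) {
            $model['counts'][$label][$tok] = ($model['counts'][$label][$tok] ?? 0) + 1;
            $model['totals'][$label] = ($model['totals'][$label] ?? 0) + 1;
            $model['vocab'][$tok] = true;
        }
    }
    return $model;
}

/** Return the label maximizing log P(label) + sum of log P(token | label). */
function classify(array $model, string $code): ?string
{
    $nDocs = array_sum($model['docs']);
    $vocabSize = count($model['vocab']);
    $tokens = tokenize($code);
    $best = null;
    $bestScore = -INF;
    foreach ($model['docs'] as $label => $docCount) {
        $score = log($docCount / $nDocs); // class prior
        $total = $model['totals'][$label] ?? 0;
        foreach ($tokens as $tok) {
            $count = $model['counts'][$label][$tok] ?? 0;
            // Add-one (Laplace) smoothing so an unseen token does not zero out a class.
            $score += log(($count + 1) / ($total + $vocabSize));
        }
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = (string) $label;
        }
    }
    return $best;
}

// Hypothetical two-sample training set, far too small for real use.
$model = train([
    ['C', "#include <stdio.h>\nint main() { printf(\"hi\"); return 0; }"],
    ['Python', "def main():\n    print(\"hi\")\n\nmain()"],
]);

echo classify($model, "int add(int a, int b) { return a + b; }"), "\n"; // C
echo classify($model, "def greet():\n    print(\"hello\")"), "\n";      // Python
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unknown token from ruling a language out entirely.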

Usage

<?php
// Load Composer's autoloader
require "vendor/autoload.php";

// Load a trained model from the file "model"
$detector = LanguageDetector::loadFromFile("model");

$lang = $detector->classify(<<<CODE
#include <stdio.h>

int main() {
	printf("Hello world");
}
CODE
);

echo $lang; // C

$lang = $detector->classify(<<<CODE
def hello():
	print "Hello world"
hello()
CODE
);

echo $lang; // Python