A Naive Bayesian Classifier in C#

28 May 2006

Introduction

I was looking for a way to classify short texts into several categories. Naive Bayesian classification seemed to be a simple and probably sufficient method. Searching for existing implementations, I found many in Perl and Java. The only CLR implementation I could find was NClassifier, but it did not support classification into multiple classes, so I decided to write my own.

Background

Plenty of information describing the theory of Bayesian classification is available on the Internet. Wikipedia has a good introduction.
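
For orientation, the usual textbook form of the naive Bayes decision rule can be sketched as follows. The symbols are notation introduced here, not taken from the code: c is a category, w_1 ... w_n are the words of the text to classify, and P(w_i | c) is the probability of word w_i estimated from the training text of category c. The classifier in this article may compute its scores differently in detail.

```latex
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)
\qquad\text{or, equivalently,}\qquad
\hat{c} \;=\; \arg\max_{c} \; \Bigl[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Bigr]
```

The product form is only valid if the words are assumed to be conditionally independent given the category, which is the "naive" part of the method. The logarithmic form is commonly preferred in practice because multiplying many small probabilities can underflow.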

Using the Code

First, create an instance of BayesClassifier.Classifier.

BayesClassifier.Classifier m_Classifier = new BayesClassifier.Classifier();

Tip: You may experiment with BayesClassifier.ExcludedWords to define the words that you consider irrelevant for your classification. Excluding them can lead to smaller dictionaries and therefore speed up the classification.
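
To illustrate why excluding words shrinks the dictionaries, here is a minimal, self-contained sketch of word filtering. It does not use the BayesClassifier API: StopWordDemo, Tokenize, and the sample word list are hypothetical names and values made up for this example, and the library's own tokenization may work differently.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordDemo
{
    // Hypothetical helper, not part of BayesClassifier: splits a text into
    // lowercase words and drops every word found in the excluded set.
    public static List<string> Tokenize(string text, ISet<string> excluded)
    {
        var words = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

        foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = raw.ToLowerInvariant();
            if (!excluded.Contains(word))
            {
                words.Add(word);
            }
        }
        return words;
    }

    public static void Main()
    {
        // Sample excluded words, chosen only for illustration.
        var excluded = new HashSet<string> { "the", "a", "is" };

        List<string> words = Tokenize("The weather is a bit cold.", excluded);
        Console.WriteLine(string.Join(" ", words)); // prints "weather bit cold"
    }
}
```

Only the words that survive the filter would need an entry in a category's dictionary, so a well-chosen exclusion list reduces both memory use and the number of lookups per classification.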

Then define the categories and teach each category:

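// Teach category "Cat1" from the contents of a text file.
m_Classifier.TeachCategory("Cat1", new System.IO.StreamReader(file));
// Teach category "Cat2" from individual phrases.
m_Classifier.TeachPhrases("Cat2", new string[] { "Hi", "HoHo" });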

Finally, the BayesClassifier.Classifier.Classify method returns the classification result:

Dictionary<string, double> score = 
    m_Classifier.Classify(new System.IO.StreamReader(file));
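
The returned dictionary maps each category name to a score. A minimal sketch for picking a single winning category could look like the following. It assumes that a higher score means a better match, which should be checked against the classifier's source; ScoreDemo and BestCategory are hypothetical names, and the scores in Main are placeholder values standing in for a real Classify result.

```csharp
using System;
using System.Collections.Generic;

public static class ScoreDemo
{
    // Hypothetical helper: returns the category with the highest score,
    // or null if the dictionary is empty.
    // Assumption: a higher score means a better match.
    public static string BestCategory(IDictionary<string, double> scores)
    {
        string best = null;
        double bestScore = double.NegativeInfinity;

        foreach (KeyValuePair<string, double> entry in scores)
        {
            if (best == null || entry.Value > bestScore)
            {
                best = entry.Key;
                bestScore = entry.Value;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Placeholder scores; in real use this would be the dictionary
        // returned by m_Classifier.Classify(...).
        var score = new Dictionary<string, double>
        {
            { "Cat1", -12.4 },
            { "Cat2", -9.7 }
        };

        Console.WriteLine(BestCategory(score)); // prints "Cat2"
    }
}
```

Keeping the full dictionary rather than only the winner can also be useful, for example to reject a classification when the two best scores are very close.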

Let me know if you have any questions or suggestions. I would also like to hear about your experience with the applicability of the naive Bayesian approach, since its (incorrect) assumption of word independence might influence the results.

History

  • 28th May, 2006: Version 1.0

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.