Introduction
I have found many word stemming algorithms implemented for the English language, some very good and others less so. I once had a project where my task was to implement a stemming algorithm for German on the .NET Framework, and I could not find any existing .NET implementation for German. So this is my implementation of word stemming for the German language, written in C# for the .NET Framework.
There is source code for stemingLib.dll and example source code that shows how to use stemingLib.dll. The demo project includes a sample of German vocabulary in the file rjecnik.txt and the same vocabulary, correctly stemmed, in output.txt. The demo application writes a newly stemmed vocabulary to rezultat.txt and compares it with output.txt to check for errors. This is actually my test application for this implementation. The other example is in the next section of this article.
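The check that the demo application performs can be sketched roughly as follows. This is a minimal sketch, not the code from the demo project: it assumes one word per line in each file, and the StemCheck class and its Run method are hypothetical names introduced only for illustration. The stemmer is passed in as a delegate so that the sketch compiles without stemingLib.dll.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StemCheck
{
    // Stems every word in vocabularyPath, writes the stems to resultPath,
    // and returns a description of each line whose stem differs from the
    // corresponding line in expectedPath.
    // Assumption: all three files hold one word per line.
    public static List<string> Run(string vocabularyPath, string expectedPath,
                                   string resultPath, Func<string, string> stem)
    {
        string[] words = File.ReadAllLines(vocabularyPath);
        string[] expected = File.ReadAllLines(expectedPath);
        var results = new string[words.Length];
        var mismatches = new List<string>();

        for (int i = 0; i < words.Length; i++)
        {
            // The stemmer expects lowercase input.
            results[i] = stem(words[i].Trim().ToLowerInvariant());

            // A missing expected line is reported as a mismatch.
            string want = i < expected.Length ? expected[i].Trim() : "(missing)";
            if (results[i] != want)
            {
                mismatches.Add(words[i] + ": got '" + results[i]
                               + "', expected '" + want + "'");
            }
        }

        File.WriteAllLines(resultPath, results);
        return mismatches;
    }
}
```

With the library referenced, a call might look like StemCheck.Run("rjecnik.txt", "output.txt", "rezultat.txt", new destemmer().Stem), assuming that Stem accepts a string and returns a string as in the example in the next section. An empty result list means that every stem matched.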
Using the code
This is a simple library with only one class, called destemmer. This class has only one property, Word, and one method, Stem. You can use this class like this:
destemmer Stemmer = new destemmer();
Console.WriteLine("\n Input some German word: ");
string g_word = Console.ReadLine();

// First way: pass the lowercase word to Stem() and use the return value.
string stemmed_word = Stemmer.Stem(g_word.ToLower());

// Second way: set the Word property, call Stem(), then read Word back.
Stemmer.Word = g_word.ToLower();
Stemmer.Stem();
string stemmed_word2 = Stemmer.Word;

Console.WriteLine("\n Stemming result, first way: {0} ", stemmed_word);
Console.WriteLine("\n Stemming result, second way: {0} ", stemmed_word2);
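One detail about the lowercasing in the example: ToLower() uses the current culture of the machine, so on a system with Turkish regional settings an uppercase "I" becomes a dotless "ı" instead of "i". A small helper like the one below avoids this by using ToLowerInvariant(). The StemInput class is a hypothetical name introduced here for illustration and is not part of stemingLib.dll.

```csharp
using System;

static class StemInput
{
    // Trims the word and lowercases it with culture-independent rules,
    // so the result does not depend on the machine's regional settings.
    public static string Normalize(string word)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));
        return word.Trim().ToLowerInvariant();
    }
}
```

The calls in the example would then become Stemmer.Stem(StemInput.Normalize(g_word)) and Stemmer.Word = StemInput.Normalize(g_word).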
Points of interest
This library works only with Unicode strings, and every word passed to the Stem() function or assigned to the Word property has to be lowercase.
Algorithm: There is no need for me to go into the details of the algorithm I used to implement German word stemming. All the information you might be interested in can be found at http://snowball.tartarus.org/, where I found the original algorithm, which I then implemented for the German language on the .NET Framework.