
Word stemming for German on .NET Framework

22 Feb 2003
An article on a word stemming algorithm implemented for the German language on the .NET Framework.

Introduction

I found many word stemming algorithms implemented for the English language, some very good, others less so. I once had a project where my task was to implement a stemming algorithm for German on the .NET Framework, and I could not find any existing .NET implementation for German. So this is my implementation of word stemming for the German language, written in C# on the .NET Framework.

The download contains the source code for stemingLib.dll and example source code showing how to use it. The demo project includes a sample German vocabulary in the file rjecnik.txt and the same vocabulary, correctly stemmed, in output.txt. The demo application stems the vocabulary again, writes the result to rezultat.txt, and compares it with output.txt to check for errors. This is actually my test application for this implementation. Another example is given in the next section of this article.
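
The comparison step of the demo can be pictured roughly as follows. This is only a sketch, not the code shipped with the demo project: the class StemRegressionCheck and the method FindMismatches are names made up for illustration, and the sketch assumes one word per line in both files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of stemingLib.dll: compares freshly stemmed
// words against a list of known-good stems, line by line.
public static class StemRegressionCheck
{
    // Returns the zero-based line indexes at which the two lists disagree.
    // If one list is shorter, every line missing from it counts as a mismatch.
    public static List<int> FindMismatches(IList<string> expected, IList<string> actual)
    {
        var mismatches = new List<int>();
        int count = Math.Max(expected.Count, actual.Count);
        for (int i = 0; i < count; i++)
        {
            bool bothPresent = i < expected.Count && i < actual.Count;
            if (!bothPresent ||
                !string.Equals(expected[i].Trim(), actual[i].Trim(), StringComparison.Ordinal))
            {
                mismatches.Add(i);
            }
        }
        return mismatches;
    }
}
```

Assuming both files are in the working directory, the two lists could be loaded with File.ReadAllLines("output.txt") and File.ReadAllLines("rezultat.txt"). An empty result means the new output matches the reference.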

Using the code

This is a simple library with only one class, called destemmer. The class has a single property, Word, and a single method, Stem. You can use the class as follows:

// Create the object that will perform the stemming
destemmer Stemmer = new destemmer();
Console.WriteLine("\n Input some German word: ");
string g_word = Console.ReadLine();

// First way: call Stem(string word) to get the stemmed word
string stemmed_word = Stemmer.Stem(g_word.ToLower());

// Second way: set the 'Word' property, then call Stem() with no arguments
Stemmer.Word = g_word.ToLower();
Stemmer.Stem();

/* After Stem() has been called, the Word property contains
   the stemmed word, not the original word */
string stemmed_word2 = Stemmer.Word;

Console.WriteLine("\n Stemming result, first way: {0} ", stemmed_word);
Console.WriteLine("\n Stemming result, second way: {0} ", stemmed_word2);
/* And it's that simple */
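
The example above lowercases the input with ToLower(), which follows the current culture of the machine. On a system set to a Turkish culture, for instance, an uppercase "I" lowercases to a dotless "ı", which a German stemmer would not expect. A possible way to avoid this is a small normalization step before stemming. The sketch below is an assumption about how one might do it, not part of the library; StemInput and Normalize are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of stemingLib.dll.
public static class StemInput
{
    // Trims surrounding whitespace and lowercases with culture-independent
    // rules, so the result does not depend on the machine's current culture.
    public static string Normalize(string word)
    {
        if (word == null) throw new ArgumentNullException(nameof(word));
        return word.Trim().ToLowerInvariant();
    }
}
```

With this helper, the first call in the example would become Stemmer.Stem(StemInput.Normalize(g_word)).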

Points of interest

This library works only with Unicode strings, and every word passed to the Stem() function or assigned to the Word property has to be lowercase.

Algorithm: I will not go into the details of the algorithm used for this implementation of German word stemming. All the information you might be interested in can be found at http://snowball.tartarus.org/, where I found the original algorithm, which I then implemented for the German language on the .NET Framework.
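
To give a feel for the kind of rule the Snowball description of the German stemmer contains, here is a sketch of one preparatory step: locating the regions R1 and R2, inside which suffixes are allowed to be removed. It follows the published definition as I understand it (vowels are a, e, i, o, u, y, ä, ö, ü; R1 starts after the first non-vowel that follows a vowel and is then moved so that at least three letters precede it; R2 is found the same way inside the unadjusted R1). It is not taken from stemingLib.dll, the class and method names are hypothetical, and it skips the earlier step in which ß is replaced by ss and u and y between vowels are marked as non-vowels.

```csharp
using System;

// Illustrative sketch only, not the code of stemingLib.dll.
public static class GermanRegions
{
    // a e i o u y ä ö ü (escapes used to stay independent of file encoding)
    private const string Vowels = "aeiouy\u00E4\u00F6\u00FC";

    private static bool IsVowel(char c)
    {
        return Vowels.IndexOf(c) >= 0;
    }

    // Offset just past the first non-vowel that follows a vowel, looking only
    // at characters from 'start' onward; word.Length if there is none.
    private static int RegionStart(string word, int start)
    {
        for (int i = start + 1; i < word.Length; i++)
        {
            if (!IsVowel(word[i]) && IsVowel(word[i - 1]))
            {
                return i + 1;
            }
        }
        return word.Length;
    }

    // Start offset of R1, adjusted so that at least 3 letters precede it.
    public static int R1(string word)
    {
        int unadjusted = RegionStart(word, 0);
        return Math.Max(unadjusted, Math.Min(3, word.Length));
    }

    // Start offset of R2, computed inside the unadjusted R1.
    public static int R2(string word)
    {
        return RegionStart(word, RegionStart(word, 0));
    }
}
```

For the lowercase word "häuser", R1 starts at offset 4 (the region "er") and R2 starts at offset 6, which means R2 is empty. For "aber", the three-letter adjustment moves R1 from offset 2 to offset 3.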

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
