Spell Check, Hyphenation, and Thesaurus for .NET with C# and VB Samples - Part 1: Single Threading

24 Jul 2014 · LGPL3
New NHunspell (Open Office spell checker for .NET) functions at a glance.

Introduction

Spell checking, hyphenation, and synonym lookup in a thesaurus are the language functions of Open Office, provided by the Hunspell spell checker and its companion libraries Hyphen and MyThes. The NHunspell project makes these functions available to .NET applications. Because Hunspell is used in a vast number of open source applications, it is also a natural first choice for .NET applications. Beyond Open Office, Hunspell is currently used in the Mozilla applications Firefox and Thunderbird, in the Google Chrome and Opera browsers, and, last but not least, in Apple's new Mac OS X 10.6 "Snow Leopard" operating system.

Since its first steps (NHunspell - Hunspell for the .NET platform), NHunspell has improved a lot and is heading straight for its first release candidate. The current release, 0.9.2, is a milestone because support for the Hunspell feature set is now nearly complete.

Using NHunspell Spell Check, Hyphenation, and Thesaurus in Single Threaded Applications

NHunspell is designed to serve two different use cases: single-threaded applications, such as word processors and other tools with a user interface, and multi-threaded applications, such as servers and web servers (ASP.NET).

This article covers single-threaded applications, which use the basic NHunspell classes Hunspell, Hyphen, and MyThes. The members of these classes are not thread-safe: if an instance is shared by multiple threads, every call must be synchronized, for example with the C# lock statement. For multi-threaded scenarios, however, NHunspell provides dedicated classes, which are covered in the second part of this series: Spell checking, hyphenation, and thesaurus in multi-threading applications.
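If a single Hunspell instance has to be shared between threads anyway, the lock-based approach could look like the minimal sketch below. The class name LockedSpellChecker is hypothetical and not part of NHunspell; the sketch only wraps the Hunspell constructor and the Spell(), Suggest(), and Dispose() members used in this article, and it assumes a project reference to NHunspell plus the en_us dictionary files.

```csharp
using System;
using System.Collections.Generic;
using NHunspell;

// Hypothetical helper (not part of NHunspell): shares one Hunspell instance
// between threads by serializing every call with a private lock object.
public sealed class LockedSpellChecker : IDisposable
{
    private readonly Hunspell hunspell;
    private readonly object sync = new object();
    private bool disposed;

    public LockedSpellChecker(string affFile, string dicFile)
    {
        hunspell = new Hunspell(affFile, dicFile);
    }

    public bool Spell(string word)
    {
        lock (sync)
        {
            ThrowIfDisposed();
            return hunspell.Spell(word);
        }
    }

    public List<string> Suggest(string word)
    {
        lock (sync)
        {
            ThrowIfDisposed();
            return hunspell.Suggest(word);
        }
    }

    public void Dispose()
    {
        lock (sync)
        {
            if (disposed) return; // make Dispose() safe to call twice
            hunspell.Dispose();
            disposed = true;
        }
    }

    // Called only while the lock is held.
    private void ThrowIfDisposed()
    {
        if (disposed) throw new ObjectDisposedException(nameof(LockedSpellChecker));
    }
}
```

Any number of threads can call Spell() or Suggest() on one LockedSpellChecker, but the lock lets only one call run at a time, so heavily loaded servers are better served by the multi-threading classes from the second part.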

Spell Checking: Hunspell

Hunspell objects offer several ways to work with text:

  • Spell checking with Spell(), and suggestions for a misspelled word with Suggest()
  • Morphological analysis and word stemming: with Analyze() and Stem()
  • Generation by sample (deriving a word form from its stem, like girl => girls): with Generate()

A C# sample of spell checking, suggestion, analysis, stemming, and generation with Hunspell:

C#
using System;
using System.Collections.Generic;
using NHunspell;

// Top-level statements (C# 9+); in older C# versions, place this block in Main().
using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Hunspell - Spell Checking Functions");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");

    // Spell(): true if the word is correct according to the dictionary
    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly");
    bool correct = hunspell.Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " +
       (correct ? "correctly" : "incorrectly"));

    // Suggest(): candidate corrections for a misspelled word
    Console.WriteLine();
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = hunspell.Suggest("Recommendatio");
    Console.WriteLine("There are " +
       suggestions.Count.ToString() + " suggestions");
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion);
    }

    // Analyze(): morphological description of the word
    Console.WriteLine();
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = hunspell.Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }

    // Stem(): base form(s) of the word
    Console.WriteLine();
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = hunspell.Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }

    // Generate(): derive a form of 'girl' that follows the sample 'boys'
    Console.WriteLine();
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = hunspell.Generate("girl", "boys");
    foreach (string word in generated)
    {
        Console.WriteLine("Generated word is: " + word);
    }
}

A Visual Basic sample of spell checking, suggestion, analysis, stemming, and generation with Hunspell:

VB
Imports NHunspell

Module HunspellSample
    Sub Main()
        Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
            Console.WriteLine("Hunspell - Spell Checking Functions")
            Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")

            ' Spell(): True if the word is correct according to the dictionary
            Console.WriteLine("Check if the word 'Recommendation' is spelled correctly")
            Dim correct As Boolean = hunspell.Spell("Recommendation")
            Console.WriteLine("Recommendation is spelled " & If(correct, "correctly", "incorrectly"))

            ' Suggest(): candidate corrections for a misspelled word
            Console.WriteLine()
            Console.WriteLine("Make suggestions for the word 'Recommendatio'")
            Dim suggestions As List(Of String) = hunspell.Suggest("Recommendatio")
            Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
            For Each suggestion As String In suggestions
                Console.WriteLine("Suggestion is: " & suggestion)
            Next

            ' Analyze(): morphological description of the word
            Console.WriteLine()
            Console.WriteLine("Analyze the word 'decompressed'")
            Dim morphs As List(Of String) = hunspell.Analyze("decompressed")
            For Each morph As String In morphs
                Console.WriteLine("Morph is: " & morph)
            Next

            ' Stem(): base form(s) of the word
            Console.WriteLine()
            Console.WriteLine("Find the word stem of the word 'decompressed'")
            Dim stems As List(Of String) = hunspell.Stem("decompressed")
            For Each stem As String In stems
                Console.WriteLine("Word Stem is: " & stem)
            Next

            ' Generate(): derive a form of 'girl' that follows the sample 'boys'
            Console.WriteLine()
            Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
            Dim generated As List(Of String) = hunspell.Generate("girl", "boys")
            For Each word As String In generated
                Console.WriteLine("Generated word is: " & word)
            Next
        End Using
    End Sub
End Module

Hyphenation: Hyphen

Hyphenating with Hyphen is straightforward: create a Hyphen object and call Hyphenate(). The returned HyphenResult supports both simple hyphenation and complex hyphenation with text replacements, such as the old German spelling rule that hyphenates 'ck' as 'k-k' (for example, 'Zucker' becomes 'Zuk-ker'). For further details, refer to the NHunspell documentation.

A C# sample of hyphenation with Hyphen:

C#
using System;
using NHunspell;

using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
{
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'");
    HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
    // HyphenatedWord: the word with its hyphenation points marked
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord);
}

A Visual Basic sample of hyphenation with Hyphen:

VB
Imports NHunspell
Module HyphenSample
    Sub Main()
        Using hyphen As New Hyphen("hyph_en_us.dic")
            Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
            Dim hyphenated As HyphenResult = hyphen.Hyphenate("Recommendation")
            Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)
        End Using
    End Sub
End Module

Finding Synonyms With Thesaurus: MyThes

With the MyThes thesaurus, finding synonyms for a given word or phrase is easy: create a MyThes object and call Lookup().

Often, only the stem forms of words are part of the thesaurus dictionary. If you pass a Hunspell object to Lookup(), a derived word like 'Girls' is stemmed to 'girl', and the synonyms are generated in the same grammatical form as the original word: 'misses', 'women', 'females' rather than 'miss', 'woman', 'female'. Combined with the stemming and generation functions of Hunspell, MyThes is a real Swiss Army knife for finding synonyms. The sample below shows this feature; you can also try it in the ASP.NET demonstration project Spell Check, Hyphenation, and Thesaurus Online.

A C# sample of a synonym lookup in the thesaurus with MyThes:

C#
using System;
using NHunspell;

using (MyThes thes = new MyThes("th_en_us_new.idx", "th_en_us_new.dat"))
{
    using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
    {
        Console.WriteLine("Get the synonyms of the plural word 'cars'");
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
        Console.WriteLine("hunspell generates the plural forms " +
                          "of the synonyms via Generate()");
        // Passing the Hunspell object enables stemming and generation
        ThesResult tr = thes.Lookup("cars", hunspell);

        // Guard against a missing result (neither the word nor its stems were found)
        if (tr == null)
        {
            Console.WriteLine("No synonyms found");
        }
        else
        {
            if (tr.IsGenerated)
                Console.WriteLine("Generated via the word stem " +
                  "(the original word form wasn't in the thesaurus)");
            foreach (ThesMeaning meaning in tr.Meanings)
            {
                Console.WriteLine();
                Console.WriteLine("  Meaning: " + meaning.Description);

                foreach (string synonym in meaning.Synonyms)
                {
                    Console.WriteLine("    Synonym: " + synonym);
                }
            }
        }
    }
}

A Visual Basic sample of a synonym lookup in the thesaurus with MyThes:

VB
Imports NHunspell

Module ThesaurusSample
    Sub Main()
        Using thes As New MyThes("th_en_us_new.idx", "th_en_us_new.dat")
            Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
                Console.WriteLine("Get the synonyms of the plural word 'cars'")
                Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
                Console.WriteLine("hunspell generates the plural forms " & _
                                  "of the synonyms via Generate()")
                ' Passing the Hunspell object enables stemming and generation
                Dim tr As ThesResult = thes.Lookup("cars", hunspell)

                ' Guard against a missing result (neither the word nor its stems were found)
                If tr Is Nothing Then
                    Console.WriteLine("No synonyms found")
                Else
                    If tr.IsGenerated Then
                        Console.WriteLine("Generated via the word stem " & _
                            "(the original word form wasn't in the thesaurus)")
                    End If
                    For Each meaning As ThesMeaning In tr.Meanings
                        Console.WriteLine()
                        Console.WriteLine("  Meaning: " & meaning.Description)
                        For Each synonym As String In meaning.Synonyms
                            Console.WriteLine("    Synonym: " & synonym)
                        Next
                    Next
                End If
            End Using
        End Using
    End Sub
End Module
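The Lookup(word, hunspell) overload hides the stemming and generation steps. To make them explicit, the sketch below reproduces the idea with separate calls: Stem() the word, look up each stem, and Generate() every synonym using the original word as the sample, just like Generate("girl", "boys") in the Hunspell section. It is an illustration under stated assumptions, not NHunspell's internal implementation: the helper names SynonymSketch and SynonymsInSameForm are hypothetical, and the sketch assumes that the single-argument MyThes.Lookup(string) overload may return null for words that are not in the thesaurus.

```csharp
using System.Collections.Generic;
using NHunspell;

public static class SynonymSketch
{
    // Hypothetical helper illustrating stem -> lookup -> generate.
    // Not NHunspell's actual implementation of Lookup(word, hunspell).
    public static List<string> SynonymsInSameForm(MyThes thes, Hunspell hunspell, string word)
    {
        var result = new List<string>();
        foreach (string stem in hunspell.Stem(word))
        {
            ThesResult tr = thes.Lookup(stem);
            if (tr == null) continue; // assumption: null means "stem not in thesaurus"

            foreach (ThesMeaning meaning in tr.Meanings)
            {
                foreach (string synonym in meaning.Synonyms)
                {
                    // The original word is the sample, e.g. "cars" asks for plural forms.
                    List<string> forms = hunspell.Generate(synonym, word);

                    // Fall back to the base form if Hunspell cannot generate one
                    // (e.g. for multi-word synonyms).
                    if (forms == null || forms.Count == 0)
                        forms = new List<string> { synonym };

                    foreach (string form in forms)
                        if (!result.Contains(form)) result.Add(form); // skip duplicates
                }
            }
        }
        return result;
    }
}
```

Calling SynonymsInSameForm(thes, hunspell, "cars") with the objects from the sample above should yield plural synonyms comparable to those printed by Lookup("cars", hunspell), although grouping by meaning is lost and the order may differ.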

Use in Commercial Applications and Available Dictionaries

Because NHunspell is licensed under the LGPL and MPL, it can be used in commercial applications: linking against the NHunspell.dll assembly from closed source projects is permitted. NHunspell uses the Open Office dictionaries, most of which are available for free.

Resources

Open Office '.oxt' extensions are in fact ZIP files. To use one with NHunspell, extract the dictionary files it contains.
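Extracting the files can also be automated. The following sketch assumes .NET's System.IO.Compression classes (ZipFile, ZipArchive; on .NET Framework 4.5+ this needs a reference to System.IO.Compression.FileSystem) and copies every file with a Hunspell, Hyphen, or MyThes extension (.aff, .dic, .idx, .dat) out of an .oxt archive into one folder. OxtExtractor and ExtractDictionaries are hypothetical names, not part of NHunspell.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class OxtExtractor
{
    // File extensions used by Hunspell (.aff/.dic), Hyphen (.dic) and MyThes (.idx/.dat).
    private static readonly string[] DictionaryExtensions = { ".aff", ".dic", ".idx", ".dat" };

    // Hypothetical helper: copies the dictionary files found anywhere inside an
    // .oxt (ZIP) archive into targetDirectory and returns the extracted paths.
    public static List<string> ExtractDictionaries(string oxtPath, string targetDirectory)
    {
        Directory.CreateDirectory(targetDirectory);
        var extracted = new List<string>();

        using (ZipArchive archive = ZipFile.OpenRead(oxtPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Folder entries have an empty Name; skip them and non-dictionary files
                // such as manifests, descriptions, and icons.
                if (entry.Name.Length == 0)
                    continue;
                string extension = Path.GetExtension(entry.Name).ToLowerInvariant();
                if (Array.IndexOf(DictionaryExtensions, extension) < 0)
                    continue;

                // entry.Name drops the folder part of the entry path, which also
                // helps keep the output inside targetDirectory.
                string destination = Path.Combine(targetDirectory, entry.Name);
                entry.ExtractToFile(destination, overwrite: true);
                extracted.Add(destination);
            }
        }
        return extracted;
    }
}
```

Because entries are flattened into one folder by file name, two dictionaries with the same file name in different archive folders would overwrite each other; check the returned list if an extension bundles several languages.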

Important: Check the dictionary license before you use it!


History

  • 24th July, 2014: Initial version

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)