Introduction
Hunspell, the Open Office spell checker, offers spell checking, hyphenation, and synonym lookup via a thesaurus. The NHunspell project makes these functions available to .NET applications. Because Hunspell is used in a large number of open source applications, it is a natural first choice for .NET applications too. Beyond Open Office, Hunspell is currently used in the Mozilla applications Firefox and Thunderbird, in the browsers Google Chrome and Opera, and, last but not least, in Apple's Mac OS X 10.6 "Snow Leopard" operating system.
Since the first steps (described in "NHunspell - Hunspell for the .NET platform"), NHunspell has improved a lot and is heading straight for its first release candidate. The current release, 0.9.2, is a milestone because its support of Hunspell is nearly complete.
Using NHunspell Spell Check, Hyphenation, and Thesaurus in Single Threaded Applications
NHunspell is designed to serve two different use cases: single-threaded applications, like word processors and any other tool with a UI/GUI, and multi-threaded applications, like servers and web servers (ASP.NET).
This article covers single-threaded applications. They use the basic NHunspell classes Hunspell, Hyphen, and MyThes. The members of these classes aren't thread safe: if an instance is shared by multiple threads, a synchronization mechanism such as lock must be used. NHunspell also provides special multi-threading classes, which are covered in the second part of this article: Spell checking, hyphenation, and thesaurus in multi-threading applications.
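If a basic Hunspell object does have to be shared between threads, the lock-based synchronization mentioned above could look roughly like the following sketch. The class LockedSpellChecker and its members are hypothetical names chosen for illustration and are not part of NHunspell; only the Hunspell constructor, Spell(), and Suggest() are taken from the samples in this article.

```csharp
using System;
using System.Collections.Generic;
using NHunspell;

// Hypothetical wrapper (not part of NHunspell): serializes all access to one
// shared Hunspell instance so that several threads can use it safely.
public sealed class LockedSpellChecker : IDisposable
{
    private readonly object _sync = new object();
    private readonly Hunspell _hunspell;

    public LockedSpellChecker(string affFile, string dicFile)
    {
        _hunspell = new Hunspell(affFile, dicFile);
    }

    public bool Spell(string word)
    {
        // Only one thread at a time may enter the non-thread-safe member.
        lock (_sync)
        {
            return _hunspell.Spell(word);
        }
    }

    public List<string> Suggest(string word)
    {
        lock (_sync)
        {
            return _hunspell.Suggest(word);
        }
    }

    public void Dispose()
    {
        // Take the lock so that no call is still running during disposal.
        lock (_sync)
        {
            _hunspell.Dispose();
        }
    }
}
```

Every call waits for the same lock, so this approach is only reasonable when contention is low. For server workloads, the dedicated multi-threading classes are likely the better fit.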
Spell Checking: Hunspell
Hunspell objects offer several ways to work with text:
- Spell check and suggestions for a misspelled word: with Spell() and Suggest()
- Morphological analysis and word stemming: with Analyze() and Stem()
- Generation by sample (deriving a word from its stem, like girl => girls): with Generate()
A C# sample of spell checking, suggestion, analysis, stemming, and generation with Hunspell:
// Requires: using System; using System.Collections.Generic; using NHunspell;
using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Hunspell - Spell Checking Functions");
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯");

    // Spell check a single word
    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly");
    bool correct = hunspell.Spell("Recommendation");
    Console.WriteLine("Recommendation is spelled " +
        (correct ? "correctly" : "incorrectly"));

    // Suggestions for a misspelled word
    Console.WriteLine("");
    Console.WriteLine("Make suggestions for the word 'Recommendatio'");
    List<string> suggestions = hunspell.Suggest("Recommendatio");
    Console.WriteLine("There are " + suggestions.Count + " suggestions");
    foreach (string suggestion in suggestions)
    {
        Console.WriteLine("Suggestion is: " + suggestion);
    }

    // Morphological analysis
    Console.WriteLine("");
    Console.WriteLine("Analyze the word 'decompressed'");
    List<string> morphs = hunspell.Analyze("decompressed");
    foreach (string morph in morphs)
    {
        Console.WriteLine("Morph is: " + morph);
    }

    // Stemming
    Console.WriteLine("");
    Console.WriteLine("Find the word stem of the word 'decompressed'");
    List<string> stems = hunspell.Stem("decompressed");
    foreach (string stem in stems)
    {
        Console.WriteLine("Word Stem is: " + stem);
    }

    // Generation by sample: 'boys' tells Hunspell which form of 'girl' is wanted
    Console.WriteLine("");
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'");
    List<string> generated = hunspell.Generate("girl", "boys");
    foreach (string word in generated)
    {
        Console.WriteLine("Generated word is: " + word);
    }
}
A Visual Basic sample of spell checking, suggestion, analysis, stemming, and generation with Hunspell:
' Requires: Imports System, Imports System.Collections.Generic, Imports NHunspell
Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
    Console.WriteLine("Hunspell - Spell Checking Functions")
    Console.WriteLine("¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯")

    ' Spell check a single word
    Console.WriteLine("Check if the word 'Recommendation' is spelled correctly")
    Dim correct As Boolean = hunspell.Spell("Recommendation")
    Console.WriteLine("Recommendation is spelled " & If(correct, "correctly", "incorrectly"))

    ' Suggestions for a misspelled word
    Console.WriteLine("")
    Console.WriteLine("Make suggestions for the word 'Recommendatio'")
    Dim suggestions As List(Of String) = hunspell.Suggest("Recommendatio")
    Console.WriteLine("There are " & suggestions.Count.ToString() & " suggestions")
    For Each suggestion As String In suggestions
        Console.WriteLine("Suggestion is: " & suggestion)
    Next

    ' Morphological analysis
    Console.WriteLine("")
    Console.WriteLine("Analyze the word 'decompressed'")
    Dim morphs As List(Of String) = hunspell.Analyze("decompressed")
    For Each morph As String In morphs
        Console.WriteLine("Morph is: " & morph)
    Next

    ' Stemming
    Console.WriteLine("")
    Console.WriteLine("Find the word stem of the word 'decompressed'")
    Dim stems As List(Of String) = hunspell.Stem("decompressed")
    For Each stem As String In stems
        Console.WriteLine("Word Stem is: " & stem)
    Next

    ' Generation by sample: 'boys' tells Hunspell which form of 'girl' is wanted
    Console.WriteLine("")
    Console.WriteLine("Generate the plural of 'girl' by providing sample 'boys'")
    Dim generated As List(Of String) = hunspell.Generate("girl", "boys")
    For Each word As String In generated
        Console.WriteLine("Generated word is: " & word)
    Next
End Using
Hyphenation: Hyphen
Using Hyphen to hyphenate is straightforward: just create a Hyphen object and call Hyphenate(). The returned HyphenResult supports both simple hyphenation and complex hyphenation with text replacements, such as the old German spelling rule that hyphenates 'ck' as 'k-k'. For further details, refer to the documentation.
A C# sample of hyphenation with Hyphen:
using (Hyphen hyphen = new Hyphen("hyph_en_us.dic"))
{
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'");
    HyphenResult hyphenated = hyphen.Hyphenate("Recommendation");
    Console.WriteLine("'Recommendation' is hyphenated as: " + hyphenated.HyphenatedWord);
}
A Visual Basic sample of hyphenation with Hyphen:
Using hyphen As New Hyphen("hyph_en_us.dic")
    Console.WriteLine("Get the hyphenation of the word 'Recommendation'")
    Dim hyphenated As HyphenResult = hyphen.Hyphenate("Recommendation")
    Console.WriteLine("'Recommendation' is hyphenated as: " & hyphenated.HyphenatedWord)
End Using
Finding Synonyms With Thesaurus: MyThes
With the thesaurus MyThes, it is quite easy to find synonyms for a given word or phrase. Just create a MyThes object and call Lookup().
Often, only the stem forms of words are part of the thesaurus dictionary. If you provide a Hunspell object, a derived word like 'Girls' is stemmed to 'girl', and the synonyms are generated in the form of the original word: 'misses', 'women', 'females' rather than 'miss', 'woman', 'female'. In combination with the stemming and generation functions of Hunspell, MyThes is a real Swiss Army knife for finding synonyms. The sample shows this feature, but you can also try it on the ASP.NET demonstration project: Spell Check, Hyphenation, and Thesaurus Online.
A C# sample of a synonym lookup in the thesaurus with MyThes:
// Requires: using System; using NHunspell;
using (MyThes thes = new MyThes("th_en_us_new.idx", "th_en_us_new.dat"))
using (Hunspell hunspell = new Hunspell("en_us.aff", "en_us.dic"))
{
    Console.WriteLine("Get the synonyms of the plural word 'cars'");
    Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().");
    Console.WriteLine("hunspell generates the plural forms " +
        "of the synonyms via Generate()");

    ThesResult tr = thes.Lookup("cars", hunspell);

    // Defensive guard: assumes Lookup() can return null when nothing is found
    if (tr == null)
    {
        Console.WriteLine("No thesaurus entry found");
    }
    else
    {
        if (tr.IsGenerated)
        {
            Console.WriteLine("Generated over stem " +
                "(The original word form wasn't in the thesaurus)");
        }

        foreach (ThesMeaning meaning in tr.Meanings)
        {
            Console.WriteLine();
            Console.WriteLine(" Meaning: " + meaning.Description);
            foreach (string synonym in meaning.Synonyms)
            {
                Console.WriteLine(" Synonym: " + synonym);
            }
        }
    }
}
A Visual Basic sample of a synonym lookup in the thesaurus with MyThes:
' Requires: Imports System, Imports NHunspell
Using thes As New MyThes("th_en_us_new.idx", "th_en_us_new.dat")
    Using hunspell As New Hunspell("en_us.aff", "en_us.dic")
        Console.WriteLine("Get the synonyms of the plural word 'cars'")
        Console.WriteLine("hunspell must be used to get the word stem 'car' via Stem().")
        Console.WriteLine("hunspell generates the plural forms " & _
            "of the synonyms via Generate()")

        Dim tr As ThesResult = thes.Lookup("cars", hunspell)

        ' Defensive guard: assumes Lookup() can return Nothing when nothing is found
        If tr Is Nothing Then
            Console.WriteLine("No thesaurus entry found")
        Else
            If tr.IsGenerated Then
                Console.WriteLine("Generated over stem " & _
                    "(The original word form wasn't in the thesaurus)")
            End If

            For Each meaning As ThesMeaning In tr.Meanings
                Console.WriteLine()
                Console.WriteLine(" Meaning: " & meaning.Description)
                For Each synonym As String In meaning.Synonyms
                    Console.WriteLine(" Synonym: " & synonym)
                Next
            Next
        End If
    End Using
End Using
Use in Commercial Applications and Available Dictionaries
NHunspell is licensed under the LGPL and MPL, so it can be used in commercial applications, and closed source projects are allowed to link against the NHunspell.dll assembly. NHunspell uses the Open Office dictionaries; most of these dictionaries are available for free.
Resources
The Open Office '.oxt' extensions are in fact ZIP files. To use them with NHunspell, unzip the dictionaries they contain.
Important: Check the dictionary license before you use it!
Works with NHunspell too.
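Because an '.oxt' extension is an ordinary ZIP archive, unpacking it can be automated with the standard .NET ZIP support. The following is a minimal sketch, assuming .NET 4.5 or later; the class OxtExtractor, its method, and the file and folder names in the usage comment are hypothetical and only serve as an illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression; // on .NET Framework, reference System.IO.Compression.FileSystem

// Hypothetical helper (not part of NHunspell).
public static class OxtExtractor
{
    // Unpacks an .oxt extension (a ZIP archive) into targetDirectory and
    // returns the paths of all unpacked files with the given extension,
    // for example ".aff" or ".dic".
    public static string[] Extract(string oxtPath, string targetDirectory, string extension)
    {
        // Throws if a file from the archive already exists in the target.
        ZipFile.ExtractToDirectory(oxtPath, targetDirectory);

        // Dictionaries may sit in subfolders of the archive, so search recursively.
        return Directory.GetFiles(targetDirectory, "*" + extension, SearchOption.AllDirectories);
    }
}

// Usage (file and folder names are placeholders):
//   string[] affFiles = OxtExtractor.Extract("dictionary.oxt", "dictionaries", ".aff");
```

An extension may bundle spelling, hyphenation, and thesaurus files together, so the caller still has to pick the right file for each NHunspell class and check the license that ships with the dictionary.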
History
- 24th July, 2014: Initial version