Introduction
SpellCheck.NET is a free online spell-checking site. I visit it whenever I need to check my spelling, so I decided to write a parser for it. I wrote the parser in C#, wrapped it in a DLL, and called it Word.dll. In this article I will show you how to parse an HTML page using regular expressions, which is the main purpose of this project. I will not explain all of the source code, since it is available for download.
Before this project I had never worked seriously with regular expressions, so I decided to use them here. Along the way I learned a lot about C# regular expressions and the .NET Framework. The difficult part of the project was writing the regular expression patterns, so I referred to various sites and books to get the patterns right.
Here are some useful sites to check out.
About Word.dll
Word.dll has one public class and two public methods:
- Public Class SpellCheck
Add "using Word;" at the top of your file and reference Word.dll when compiling, then create the object.
SpellCheck word = new SpellCheck();
- Public Method CheckSpelling
This method checks the word and returns true if it is spelled correctly, otherwise false.
bool status = false;
status = word.CheckSpelling("a word");
- Public Method GetSpellingSuggestions
This method will return the collection of suggested words.
foreach (string suggestion in word.GetSpellingSuggestions("a word"))
{
    System.Console.WriteLine(suggestion);
}
Parser Technique
- Connect to the spellcheck.net site and pass it the word.
- Look for the word "correctly." in the returned HTML; if it is found, return true.
- Otherwise look for the word "misspelled."; if it is found, return false.
regular expression pattern @"(correctly.)|(misspelled.)"
- If "misspelled." was found in the HTML, look for the word "suggestions:".
regular expression pattern @"(suggestions:)"
- Parse the text between the <blockquote> tags.
regular expression pattern @"<blockquote>(?:\s*([^<]+) \s*)+ </blockquote>"
- Finally, return the collection of suggested words.
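The steps above can be sketched in a few lines of C#. This is only an illustration, not the code shipped in Word.dll: the names ParserSketch, IsCorrect and Suggestions are hypothetical, the page is assumed to be already downloaded into a string, and the layout of that page (suggested words separated by whitespace inside a single <blockquote>) is an assumption. The patterns are the ones listed above; because the blockquote pattern contains literal spaces, the sketch assumes it is meant to be used with RegexOptions.IgnorePatternWhitespace. Note also that the unescaped "." in the first pattern matches any character, not only a period.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the parsing steps; not the Word.dll source.
static class ParserSketch
{
    // True only when the page contains "correctly." (group 1).
    // "misspelled." (group 2) or no match at all yields false.
    public static bool IsCorrect(string html)
    {
        Match m = Regex.Match(html, @"(correctly.)|(misspelled.)");
        return m.Success && m.Groups[1].Success;
    }

    // Collects the words found inside the <blockquote> that follows "suggestions:".
    public static List<string> Suggestions(string html)
    {
        var words = new List<string>();

        // Step 1: locate the "suggestions:" marker.
        Match marker = Regex.Match(html, @"(suggestions:)");
        if (!marker.Success)
        {
            return words;
        }

        // Step 2: match the blockquote after the marker.
        // Assumption: the spaces in the pattern are layout only, hence IgnorePatternWhitespace.
        Match block = Regex.Match(
            html.Substring(marker.Index),
            @"<blockquote>(?:\s*([^<]+) \s*)+ </blockquote>",
            RegexOptions.IgnorePatternWhitespace);
        if (!block.Success)
        {
            return words;
        }

        // Step 3: a capture may hold several words, so split each one on whitespace.
        char[] separators = { ' ', '\t', '\r', '\n' };
        foreach (Capture c in block.Groups[1].Captures)
        {
            words.AddRange(c.Value.Split(separators, StringSplitOptions.RemoveEmptyEntries));
        }
        return words;
    }
}
```

Under these assumptions, calling ParserSketch.IsCorrect on a page containing "misspelled." returns false, and ParserSketch.Suggestions on the same page returns the words found inside the blockquote. If the real page separates suggestions with tags such as <br>, the blockquote pattern would need to be adjusted accordingly.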
C# code:
The source file is included in the download in zip format.
Calling Word.dll wrapper class:
This is how you would call this wrapper class in your application.
using System;
using Word;

class TestHarness
{
    [STAThread]
    static void Main(string[] args)
    {
        SpellCheck word = new SpellCheck();
        string s = "youes";

        Console.WriteLine("Checking for word : " + s);
        bool status = word.CheckSpelling(s);

        if (!status)
        {
            // Misspelled: print the suggestions returned for this word.
            Console.WriteLine("This word is misspelled : " + s);
            Console.WriteLine("Here are some suggestions");
            Console.WriteLine("-------------------------");
            foreach (string suggestion in word.GetSpellingSuggestions(s))
            {
                Console.WriteLine(suggestion);
            }
        }
        else
        {
            Console.WriteLine("This word is correct : " + s);
        }
    }
}
Compiling:
Run the "compile.bat" file at the DOS prompt; it will create the necessary files.
Output:
This is what your screen will look like after you run TestHarness.exe.