Introduction
I am convinced that the quality of an application depends on the use of proper spelling and grammar. It might be argued that this is nitpicking and that productivity is far more important. While productivity is important, one must bear in mind that any line of code is read more often than it is written.
Thanks to the diligence of their programmers, good programs should not contain language errors. Written code needs to be correct at all levels: from the design, through the correctness and consistency of the names used, down to the periods in the comments.
Not everyone has English as their first language, myself included, and there are a number of people with certain disabilities, e.g. dyslexia, who might also have trouble reading or writing proper English. This is not to say that these people cannot write a good application. The point I'm trying to make is that the written code should convey the intention of the writer.
To facilitate this, I thought it would be interesting to write a spelling checker extension for Visual Studio 2010 with the following features:
- Check the spelling of the words in (XML) comments
- Check the spelling of names on declaration using the Microsoft .NET naming conventions
- Check the spelling of the words in string literals
- Mark misspelled words with red squiggly lines
- Provide a list of suggestions for a misspelled word
- Add a misspelled word to the custom words dictionary
- Maintain the custom words dictionary inside the source tree, so that it is under source control and stays synchronized across a team
First Step: Getting a Red Squiggle Under a Bit of Text
This turned out to be surprisingly easy. Start off with a project template in order to become familiar with some of the basic attributes in MEF (the Managed Extensibility Framework).
Visual Studio 2010 has a mechanism called tagging in which subscribers can add metadata to bits of the text file. The important concepts are:
- Span, which in essence is a set of coordinates (starting point + length) within the text
- ITagger, which tells the IDE where to place the tags
- ITag, which tells the IDE what to render (the IErrorTag is rendered as the red squiggle)
- TagSpan, the combination of a Tag and a Span, a.k.a. where and what
The interface ITagger<IErrorTag> has one method called GetTags, which accepts a span collection and must return an enumeration of TagSpan objects. The provided span collection marks the areas of interest to the IDE. These are typically areas that need retagging because the user has changed the text or otherwise triggered changes in the view.
The following example code shows an implementation of this method which puts a squiggle under each “A”. These squiggles also display “C” as the accompanying tooltip.
public IEnumerable<ITagSpan<IErrorTag>> GetTags(NormalizedSnapshotSpanCollection spans)
{
    // Each span is an area the IDE wants to have (re)tagged.
    foreach (var span in spans)
    {
        var text = span.Snapshot.GetText(span);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == 'A')
            {
                // Where: a single character at offset i within the span.
                // What: an error tag with error type "B" and tooltip "C".
                yield return new TagSpan<ErrorTag>(
                    new SnapshotSpan(span.Snapshot, span.Start + i, 1),
                    new ErrorTag("B", "C"));
            }
        }
    }
}
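A tagger only comes into play once the editor can find it, which is where the MEF attributes from the project template come in. The following is a minimal sketch of a tagger provider, not the extension's actual code. The class names SquiggleTaggerProvider and SquiggleTagger are hypothetical, and the content type "text" is an assumption chosen so that the tagger applies to any text buffer.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using Microsoft.VisualStudio.Text;
using Microsoft.VisualStudio.Text.Tagging;
using Microsoft.VisualStudio.Utilities;

// Hypothetical provider; MEF discovers it through the Export attribute.
[Export(typeof(ITaggerProvider))]
[ContentType("text")]          // assumption: apply to every text buffer
[TagType(typeof(IErrorTag))]   // the kind of tag the tagger produces
internal sealed class SquiggleTaggerProvider : ITaggerProvider
{
    public ITagger<T> CreateTagger<T>(ITextBuffer buffer) where T : ITag
    {
        // Yields null when the editor asks for a tag type we do not produce.
        return new SquiggleTagger() as ITagger<T>;
    }
}

// Hypothetical tagger; the GetTags body shown earlier would go here.
internal sealed class SquiggleTagger : ITagger<IErrorTag>
{
    // This sketch never raises the event, so the accessors are left empty.
    public event EventHandler<SnapshotSpanEventArgs> TagsChanged { add { } remove { } }

    public IEnumerable<ITagSpan<IErrorTag>> GetTags(NormalizedSnapshotSpanCollection spans)
    {
        yield break;
    }
}
```

A real tagger would raise TagsChanged whenever its results change, for example after a word has been added to the custom dictionary, so that the IDE asks for the tags again.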
Second Step: Getting a Context-Like Menu (a.k.a. Smart Tag) Attached to a Bit of Text
Again, this turned out to be surprisingly easy. It is achieved by using the same ITagger interface, but this time with the SmartTag class as the type argument. SmartTag is the placeholder for actions (grouped in sets). A small blue rectangle hints at its presence. When the user places the mouse over the rectangle, it shows a dropdown list with the provided actions.
The ISmartTagAction interface has 2 important members:
- The DisplayText property, which is shown to the user in the dropdown list
- The Invoke method, which provides the actual implementation
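To make the two members concrete, here is a minimal sketch of an action that replaces a misspelled word with one suggestion. The class name ReplaceWordAction and its constructor parameters are hypothetical and not taken from the extension. The remaining interface members are given trivial implementations.

```csharp
using System.Collections.ObjectModel;
using System.Windows.Media;
using Microsoft.VisualStudio.Language.Intellisense;
using Microsoft.VisualStudio.Text;

// Hypothetical action: one instance per suggestion in the dropdown list.
internal sealed class ReplaceWordAction : ISmartTagAction
{
    private readonly ITrackingSpan span;   // follows the word as the buffer changes
    private readonly string replacement;

    public ReplaceWordAction(ITrackingSpan span, string replacement)
    {
        this.span = span;
        this.replacement = replacement;
    }

    // The text shown to the user in the dropdown list.
    public string DisplayText { get { return replacement; } }

    public ImageSource Icon { get { return null; } }
    public bool IsEnabled { get { return true; } }
    public ReadOnlyCollection<SmartTagActionSet> ActionSets { get { return null; } }

    // Called when the user picks the action: replace the word in the buffer.
    public void Invoke()
    {
        ITextBuffer buffer = span.TextBuffer;
        buffer.Replace(span.GetSpan(buffer.CurrentSnapshot), replacement);
    }
}
```

A tracking span is used rather than a fixed span so that the action still points at the right word if the text has been edited between showing the smart tag and invoking it.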
Third Step: Check the Spelling
I remembered that one of the features of WPF is spellchecking and decided to investigate whether this would suffice. Although in WPF 4.0 support for custom words dictionaries has been added, it has been limited to English, French, German and Spanish. This wasn’t a problem as the Microsoft .NET naming guidelines state that identifiers should be declared in English.
However, the actual implementation is hidden (marked as internal), so I could either hack into it using reflection or use an actual WPF TextBox under the covers. Tip: When you use the WPF TextBox, the spellchecker disables itself when the user's input locale is not one of the previously mentioned languages (or their dialects). You can override this user setting with a specific language by setting the “xml:lang” attribute to a language tag (an ISO 639 language code plus an ISO 3166 country code), e.g. “en-us” or “en-gb”.
I struggled with both strategies for a while, and eventually came to the conclusion that both were lacking what I was looking for. So I decided to look for alternatives and came across “Hunspell”, an open source spellchecking library currently used by OpenOffice, Firefox, and others. It comes with a very large number of dictionaries and has thesaurus, hyphenation and grammar functionality as well.
Unfortunately, it is not exposed as COM, so no standard P/Invoke. Using some old-school kernel32.dll trickery (LoadLibrary and FreeLibrary), we can load the library and unload it. But the real magic is that GetProcAddress combined with Marshal.GetDelegateForFunctionPointer allows you to create a delegate which calls into the library. (For more details, check out the Hunspell namespace.)
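The loading technique can be sketched as follows. This is an illustration under stated assumptions, not the code from the Hunspell namespace: the DynamicLibrary class is hypothetical, and the usage calls GetCurrentProcessId from kernel32.dll because the export names of a Hunspell DLL depend on how it was built.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper around LoadLibrary / GetProcAddress / FreeLibrary.
public sealed class DynamicLibrary : IDisposable
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FreeLibrary(IntPtr module);

    private IntPtr handle;

    public DynamicLibrary(string fileName)
    {
        handle = LoadLibrary(fileName);
        if (handle == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }

    // Looks up an exported function and wraps it in a managed delegate.
    public TDelegate GetFunction<TDelegate>(string exportName) where TDelegate : class
    {
        IntPtr address = GetProcAddress(handle, exportName);
        if (address == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return (TDelegate)(object)Marshal.GetDelegateForFunctionPointer(address, typeof(TDelegate));
    }

    // Unloads the library; delegates obtained from it must not be called afterwards.
    public void Dispose()
    {
        if (handle != IntPtr.Zero)
        {
            FreeLibrary(handle);
            handle = IntPtr.Zero;
        }
    }
}

// The delegate signature must match the native function, including its calling convention.
[UnmanagedFunctionPointer(CallingConvention.Winapi)]
public delegate uint GetCurrentProcessIdDelegate();

public static class DynamicLibraryDemo
{
    public static uint Run()
    {
        using (var library = new DynamicLibrary("kernel32.dll"))
        {
            var getPid = library.GetFunction<GetCurrentProcessIdDelegate>("GetCurrentProcessId");
            return getPid();
        }
    }
}
```

For a C library the delegates would normally be declared with CallingConvention.Cdecl, and string parameters need marshalling that matches the encoding the library expects.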
Fourth Step: Getting the Bits to Check the Spelling Of
An interesting technique is used by the spell checker created by Roman Golovin and Noah Richards. That spell checker is/will be part of the VS SDK and uses a service called the classifier. This mechanism is used by the language services to classify bits of text according to their contextual and generic meaning. My guess is that it is also used by the syntax coloring process.
Unfortunately, it did not provide any detail on whether a name was being declared as opposed to used. So I decided to build my own C# scanner and simple parser (as this is my language of choice). Though the scanner and the lexical tokens are complete, the intention of the parser is only to correctly identify all the different types of name declarations.
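Once a declared name has been identified, it has to be broken into words before each word can be looked up in the dictionary. The following is a minimal sketch of such a split, assuming names follow PascalCase or camelCase with optional underscores. The IdentifierSplitter class is hypothetical and is not the extension's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that splits an identifier into its constituent words.
public static class IdentifierSplitter
{
    public static IList<string> Split(string identifier)
    {
        var words = new List<string>();
        int start = 0;
        for (int i = 1; i <= identifier.Length; i++)
        {
            if (i == identifier.Length || IsWordStart(identifier, i))
            {
                // Underscores only separate words; they are never part of one.
                string word = identifier.Substring(start, i - start).Trim('_');
                if (word.Length > 0)
                    words.Add(word);
                start = i;
            }
        }
        return words;
    }

    private static bool IsWordStart(string s, int i)
    {
        char previous = s[i - 1];
        char current = s[i];
        // An underscore always ends the current word.
        if (current == '_' || previous == '_')
            return true;
        // "getName": an upper case letter after a non-upper case character.
        if (char.IsUpper(current) && !char.IsUpper(previous))
            return true;
        // "XMLReader": the last upper case letter of a run starts the next word.
        return char.IsUpper(current) && i + 1 < s.Length && char.IsLower(s[i + 1]);
    }
}
```

With this sketch, Split("XMLReader") yields "XML" and "Reader", and Split("_customWords") yields "custom" and "Words". Abbreviations such as "XML" would still have to be accepted by the dictionary or added to the custom words.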
Screenshots
You can tell from this screenshot that the word “misspellled” (having 3 L’s) and Namespace have been marked. It also shows namespace, alias and delegate (with parameter and defaults) declarations.
By hovering over the smart tag indicator (blue rectangle), the dropdown becomes visible.
Adding the first word to the custom words dictionary results in the file CustomWords.txt being added to the solution in the standard folder “Solution Items”.
An enumeration with attributes and comments:
An interface with various members:
A class with various members:
Known Bugs
- No handling of C# preprocessor directives (fixed)
Future Versions
- Including unit tests
- Adding an MSBuild task
- Adding a VS command to scan all files in the solution akin to the find/replace method
- Adding (T) SQL language support
- Adding VB.NET language support
- Adding F# language support
- Using a managed spelling checker, maybe writing my own for fun
- Suggestions or bugs?