Code Spelling Checker Extension for Visual Studio 2010 (VSX)

7 Apr 2010
Building a spelling checker for source code as an extension for Visual Studio 2010

Introduction

I am convinced that the quality of an application depends on the use of proper spelling and grammar. It might be argued that this is nitpicking and that productivity is far more important. While productivity is important, one must bear in mind that any line of code will be read far more often than it is written.

Thanks to the diligence of its programmers, a good program should not contain language errors. Written code needs to be correct at all levels: from the design, through the correctness and consistency of the names used, down to the periods in the comments.

Not everyone has English as a first language, myself included, and there are a number of people with certain disabilities, e.g. dyslexia, who might also have trouble reading or writing proper English. This is not to say that these people cannot write a good application. The point I'm trying to make is that the written code should convey the intention of the writer.

To facilitate this, I thought it would be interesting to write a spelling checker extension for Visual Studio 2010 with the following features:

  • Check the spelling of the words in (XML) comments
  • Check the spelling of names on declaration, following the Microsoft .NET naming conventions
  • Check the spelling of the words in string literals
  • Mark misspelled words with red squiggly lines
  • Provide a list of suggestions for a misspelled word
  • Allow a misspelled word to be added to the custom words dictionary
  • Maintain the custom words dictionary inside the source tree, so that it is under source control and stays synchronized across a team

First Step: Getting a Red Squiggle Under a Bit of Text

This turned out to be surprisingly easy. Start with one of the project templates in order to get familiar with some of the basic attributes in MEF (the Managed Extensibility Framework).

Visual Studio 2010 has a mechanism called tagging in which subscribers can add metadata to bits of the text file. The important concepts are:

  • Span, which in essence is a pair of coordinates (starting point + length) within the text
  • ITagger, which tells the IDE where to place the tags
  • ITag, which tells the IDE what to render (The IErrorTag is rendered as the red squiggle)
  • TagSpan, the combination of a Tag and a Span, a.k.a. where and what

The interface ITagger<IErrorTag> has one method called GetTags, which accepts a span collection and must return an enumeration of TagSpan objects. The provided span collection marks the areas that are of interest to the IDE. This might be because the user has changed the text or otherwise triggered changes in the view, so that these areas need retagging.

The following example code shows an implementation of this method which puts a squiggle under each “A”. These squiggles also display “C” as the accompanying tooltip.

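public IEnumerable<ITagSpan<IErrorTag>> GetTags (NormalizedSnapshotSpanCollection spans)
{
    // Each span is a region of the text that the IDE wants (re)tagged.
    foreach (var span in spans)
    {
        var text = span.Snapshot.GetText (span);
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == 'A')
            {
                // Tag the single character found at offset i within this span.
                yield return new TagSpan<ErrorTag>
                (
                    new SnapshotSpan
                    (
                        span.Snapshot,
                        span.Start + i,
                        1
                    ),
                    // First argument: error type name; second: tooltip content.
                    new ErrorTag ("B", "C")
                );
            }
        }
    }
}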
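For the IDE to discover a tagger like the one above, it is exported through MEF. The following is a minimal sketch of such an export, assuming the Visual Studio 2010 SDK editor assemblies are referenced. The class names SquiggleTaggerProvider and SquiggleTagger are hypothetical and not taken from the extension's source, and the content type "code" is an assumption chosen to cover any programming language.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;
using Microsoft.VisualStudio.Text;
using Microsoft.VisualStudio.Text.Tagging;
using Microsoft.VisualStudio.Utilities;

// Hypothetical provider: the MEF attributes tell the editor which kind of
// tags are produced and for which content type.
[Export(typeof(ITaggerProvider))]
[ContentType("code")]
[TagType(typeof(ErrorTag))]
internal sealed class SquiggleTaggerProvider : ITaggerProvider
{
    public ITagger<T> CreateTagger<T>(ITextBuffer buffer) where T : ITag
    {
        // Keep a single tagger instance per text buffer.
        return buffer.Properties.GetOrCreateSingletonProperty(
            () => new SquiggleTagger()) as ITagger<T>;
    }
}

// Hypothetical tagger: the GetTags body shown earlier would go here.
internal sealed class SquiggleTagger : ITagger<IErrorTag>
{
    // Raise this event to ask the editor to call GetTags again for a span.
    // This sketch never raises it, so the accessors are left empty.
    public event EventHandler<SnapshotSpanEventArgs> TagsChanged
    {
        add { }
        remove { }
    }

    public IEnumerable<ITagSpan<IErrorTag>> GetTags(NormalizedSnapshotSpanCollection spans)
    {
        yield break;
    }
}
```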

Second Step: Attaching a Context-Like Menu (a.k.a. Smart Tag) to a Bit of Text

Again, this turned out to be surprisingly easy. It is achieved by using the same ITagger interface, but this time with the SmartTag class as the type argument. SmartTag is the placeholder for actions (grouped in sets). A small blue rectangle hints at its presence to the user. When the user places the mouse over the rectangle, it shows a dropdown list with the provided actions.

The ISmartTagAction interface has two important members:

  • DisplayText property, which is shown to the user in the dropdown list
  • Invoke method, which provides the actual implementation
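As an illustration, here is a minimal sketch of an action that replaces a misspelled word with one suggestion, assuming the Visual Studio 2010 SDK editor assemblies are referenced. The class name ReplaceWordAction and its constructor parameters are hypothetical. The remaining members (Icon, IsEnabled and ActionSets) are also part of the interface as I understand it and are given trivial implementations here.

```csharp
using System.Collections.ObjectModel;
using System.Windows.Media;
using Microsoft.VisualStudio.Language.Intellisense;
using Microsoft.VisualStudio.Text;

// Hypothetical action: one instance per suggestion in the dropdown list.
internal sealed class ReplaceWordAction : ISmartTagAction
{
    private readonly ITrackingSpan _span;      // tracks the misspelled word across edits
    private readonly string _replacement;      // the suggested spelling

    public ReplaceWordAction(ITrackingSpan span, string replacement)
    {
        _span = span;
        _replacement = replacement;
    }

    // Shown to the user in the dropdown list.
    public string DisplayText { get { return _replacement; } }

    public ImageSource Icon { get { return null; } }
    public bool IsEnabled { get { return true; } }

    // No nested action sets for this simple action.
    public ReadOnlyCollection<SmartTagActionSet> ActionSets { get { return null; } }

    // Called when the user picks the action: swap the word for the suggestion.
    public void Invoke()
    {
        ITextBuffer buffer = _span.TextBuffer;
        buffer.Replace(_span.GetSpan(buffer.CurrentSnapshot), _replacement);
    }
}
```

A tracking span is used rather than a fixed span so that the action still points at the right word if the text has changed between tagging and invocation.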

Third Step: Check the Spelling

I remembered that one of the features of WPF is spell checking and decided to investigate whether this would suffice. WPF 4.0 added support for custom words dictionaries, but its spell checking is limited to English, French, German and Spanish. This wasn’t a problem, as the Microsoft .NET naming guidelines state that identifiers should be declared in English.

However, the actual implementation is hidden (marked as internal), so I could either hack my way in using reflection or use an actual WPF text box under the covers. Tip: when you use the WPF text box, the spell checker disables itself when the user's input locale is not one of the previously mentioned languages (or their dialects). You can override this user setting with a specific language by setting the “xml:lang” attribute to a language tag that combines an ISO 639 language code with an ISO 3166 region code, e.g. “en-us” or “en-gb”.

I struggled with both strategies for a while and eventually came to the conclusion that both lacked what I was looking for. So I decided to look for alternatives and came across “Hunspell”, an open source spell checking library currently used by OpenOffice, Firefox and others. It comes with a very large number of dictionaries and has thesaurus, hyphenation and grammar functionality as well.
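The tip above can also be applied from code. This is a minimal sketch, assuming the WPF assemblies (PresentationFramework and PresentationCore) are referenced and the method is called on an STA thread. The class name SpellCheckedTextBoxFactory is hypothetical. Setting the Language property is the code equivalent of the “xml:lang” attribute in XAML.

```csharp
using System.Windows.Controls;
using System.Windows.Markup;

// Hypothetical helper that creates a text box with spell checking forced
// to a specific language, regardless of the user's input locale.
internal static class SpellCheckedTextBoxFactory
{
    public static TextBox Create(string languageTag)
    {
        var textBox = new TextBox();

        // Same effect as xml:lang="en-us" on the element in XAML.
        textBox.Language = XmlLanguage.GetLanguage(languageTag);

        // Spell checking is off by default.
        textBox.SpellCheck.IsEnabled = true;
        return textBox;
    }
}
```

For example, SpellCheckedTextBoxFactory.Create("en-us") yields a text box that checks against US English.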

Unfortunately, it is a native library that is not exposed through COM, and I did not bind to it with static P/Invoke declarations. Using some old-school kernel32.dll trickery (LoadLibrary and FreeLibrary), we can load the library and unload it again. The real magic is that GetProcAddress, combined with Marshal.GetDelegateForFunctionPointer, allows you to create a delegate that calls into the library. (For more details, check out the Hunspell namespace.)
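The loading technique can be sketched as follows. This is not the code from the Hunspell namespace of the extension, only an illustration under several assumptions: the class name NativeSpeller is hypothetical, the DLL is assumed to export the functions Hunspell_create, Hunspell_destroy and Hunspell_spell from Hunspell's C API with the cdecl calling convention, and words are marshaled as ANSI strings, which is only right when that matches the dictionary's encoding.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a dynamically loaded native spell checker.
internal sealed class NativeSpeller : IDisposable
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    private static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FreeLibrary(IntPtr module);

    // Delegate shapes for the assumed native exports.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    private delegate IntPtr CreateFn(string affPath, string dicPath);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    private delegate void DestroyFn(IntPtr handle);

    [UnmanagedFunctionPointer(CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    private delegate int SpellFn(IntPtr handle, string word);

    private IntPtr _module;
    private IntPtr _handle;
    private readonly DestroyFn _destroy;
    private readonly SpellFn _spell;

    public NativeSpeller(string dllPath, string affPath, string dicPath)
    {
        _module = LoadLibrary(dllPath);
        if (_module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        _destroy = Bind<DestroyFn>("Hunspell_destroy");
        _spell = Bind<SpellFn>("Hunspell_spell");
        _handle = Bind<CreateFn>("Hunspell_create")(affPath, dicPath);
    }

    // Looks up an export by name and wraps it in a managed delegate.
    private T Bind<T>(string name) where T : class
    {
        IntPtr address = GetProcAddress(_module, name);
        if (address == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return (T)(object)Marshal.GetDelegateForFunctionPointer(address, typeof(T));
    }

    // The native function returns non-zero when the word is spelled correctly.
    public bool IsCorrect(string word)
    {
        return _spell(_handle, word) != 0;
    }

    public void Dispose()
    {
        if (_handle != IntPtr.Zero) { _destroy(_handle); _handle = IntPtr.Zero; }
        if (_module != IntPtr.Zero) { FreeLibrary(_module); _module = IntPtr.Zero; }
    }
}
```

One practical benefit of loading the library by path is that the path can be chosen at run time, for instance to pick a build that matches the bitness of the host process.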

Fourth Step: Getting the Bits to Check the Spelling Of

An interesting technique is used by the spell checker created by Roman Golovin and Noah Richards. This spell checker is, or will be, part of the VS SDK and uses a service called the classifier. This mechanism is used by the language services to classify bits of text according to their contextual and generic meaning. My guess is that it is also used by the syntax coloring process.

Unfortunately, it did not indicate whether a name was being declared as opposed to used. So I decided to build my own C# scanner and simple parser (as this is my language of choice). Though the scanner and the lexical tokens are complete, the intention of the parser is only to correctly identify all the different types of name declarations.
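Once a declared name has been found, it still has to be broken into words before it can be checked, because .NET names are typically written in PascalCase or camelCase. The following is a minimal sketch of such a split. The class name IdentifierSplitter and the exact rules (a break before an uppercase letter that follows a lowercase one, a break at the end of an acronym, and non-letters treated as separators) are my assumptions and not necessarily what the extension does.

```csharp
using System.Collections.Generic;

// Hypothetical helper that splits an identifier into spell-checkable words.
public static class IdentifierSplitter
{
    public static List<string> Split(string identifier)
    {
        var words = new List<string>();
        int start = 0; // index where the current word began

        for (int i = 0; i <= identifier.Length; i++)
        {
            bool atEnd = i == identifier.Length;
            bool isLetter = !atEnd && char.IsLetter(identifier[i]);
            bool boundary = atEnd || !isLetter || (i > start && IsWordStart(identifier, i));
            if (!boundary) continue;

            if (i > start) words.Add(identifier.Substring(start, i - start));

            // A letter starts the next word; a digit or underscore is skipped.
            start = isLetter ? i : i + 1;
        }
        return words;
    }

    // Only called with i > 0, so s[i - 1] is always valid.
    private static bool IsWordStart(string s, int i)
    {
        if (!char.IsUpper(s[i])) return false;

        // lower -> Upper, as in "spell|Checker"
        if (char.IsLower(s[i - 1])) return true;

        // End of an acronym, as in "XML|Reader"
        return i + 1 < s.Length && char.IsLower(s[i + 1]);
    }
}
```

With these rules, "spellChecker" yields "spell" and "Checker", "XMLReader" yields "XML" and "Reader", and "_customWords2" yields "custom" and "Words".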

Screenshots

You can tell from this screenshot that the words “misspellled” (with three L’s) and “Namespace” have been marked. It also shows namespace, alias and delegate (with parameters and defaults) declarations.

SpellSharp1.jpg

By hovering over the smart tag indicator (blue rectangle), the drop down becomes visible.

SpellSharp2.jpg

Adding the first word to the custom words dictionary results in the file CustomWords.txt being added to the solution in the standard folder “Solution Items”.
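A custom words file of this kind can be handled with very little code. The sketch below assumes a format of one word per line and case-insensitive matching. Neither is confirmed by the article, and the class name CustomWordsDictionary is hypothetical. Adding the file to the “Solution Items” folder requires the Visual Studio automation model and is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical in-memory view of a custom words file such as CustomWords.txt.
public sealed class CustomWordsDictionary
{
    private readonly string _path;
    private readonly HashSet<string> _words =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public CustomWordsDictionary(string path)
    {
        _path = path;
        if (!File.Exists(path)) return; // the file is created on the first Add

        foreach (string line in File.ReadAllLines(path))
        {
            string word = line.Trim();
            if (word.Length > 0) _words.Add(word);
        }
    }

    public bool Contains(string word)
    {
        return _words.Contains(word);
    }

    // Returns true if the word was new and has been appended to the file.
    public bool Add(string word)
    {
        word = word.Trim();
        if (word.Length == 0 || !_words.Add(word)) return false;

        File.AppendAllText(_path, word + Environment.NewLine);
        return true;
    }
}
```

Because the file is plain text with one entry per line, it merges cleanly under source control when several team members add words.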

SpellSharp3.jpg

An enumeration with attributes and comments:

SpellSharp4.jpg

An interface with various members:

SpellSharp5.jpg

A class with various members:

SpellSharp6.jpg

Known Bugs

  • No handling of C# preprocessor directives (fixed)

Future Versions

  • Including unit tests
  • Adding an MSBuild task
  • Adding a VS command to scan all files in the solution, akin to the find/replace method
  • Adding (T-)SQL language support
  • Adding VB.NET language support
  • Adding F# language support
  • Using a managed spelling checker, maybe writing my own for fun
  • Suggestions or bugs?

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
