RegexTester

20 Aug 2009
Presenting a test bed for .NET Regular Expressions
Image 1

Introduction

One of the best-known Regular Expression development and test systems is Expresso; if you need to write Regular Expressions and haven't tried it, you should. I have tried it, but I decided that it does more than I need, and I recently conceived of a handy feature that it doesn't offer.

This article is not an introduction to Regular Expressions and the application is not intended as a substitute for Expresso. You may find that you use Expresso for the initial development of your Regular Expressions, copy them to your program code, and thereafter use this application to test changes you make to the Regular Expressions.

Background

I don't consider myself a Regular Expressions expert, but I use them frequently enough that I spend more time than I like copying them from my source code, tweaking and testing them, and then copying them back and testing again. Because Regular Expressions often contain quotes and the backslash (\), part of the trouble is with escaping and unescaping such characters (this may be more of a problem for us C# developers than for VB developers).

Example:
 
A Regular Expression to match a sequence of alphanumeric characters within quotes:
"\w+"
 
When writing a string literal in C# to contain this value, you need to escape the quotes and backslash:

"\"\\w+\""

or, using a verbatim string literal:
@"""\w+"""
Notice that the backslash no longer has to be escaped, and the escaping of the quotes changes.

In VB:
"""\w+"""

(I don't use VB, so please alert me about any errors in the VB code I present so I can fix them.)
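
As a quick check of the two C# forms above, here is a minimal sketch (the class and member names are made up for illustration) showing that the regular and verbatim literals produce the same five-character pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralDemo
{
    // Both literals denote the same five-character pattern: "\w+"
    public const string Regular  = "\"\\w+\"";
    public const string Verbatim = @"""\w+""";

    public static void Main()
    {
        // The compiler removes the escapes, so the two strings are identical.
        Console.WriteLine(Regular == Verbatim);
        Console.WriteLine(Regular);

        // Either one can be handed to Regex unchanged.
        Match m = Regex.Match("say \"hello\" now", Regular);
        Console.WriteLine(m.Value);
    }
}
```

The point is that the escaping belongs to the source code, not to the Regular Expression, which is why it has to be removed before testing and restored afterwards.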

I began writing this little utility application with the idea of allowing it to handle the unescaping automatically so I could copy my code right from the editing window to a TextBox in the tester without concern for removing the escapes and adding them back in again. That version worked well, but because I often write Regular Expressions that are long enough that I break them onto multiple lines, copying these to and from the tester still required more effort than I liked.

Example:

"(?:/\\s*(?'Switch'\\w+)(?:\\s*=\\s*(?:(?:\"(?'Value'[^\"]*)\")" +
"|(?:\\((?'Value'(?:(?:\"[^\"]*\")|[^\\)]*?)*)\\))|(?'Value'\\w+)))?)" +
"|(?:\"(?'Value'[^\"]*)\")|(?'Value'\\w+)"

Copying such a beast would still have to be done in several steps, and copying it back, ensuring that it's split at valid spots, is even more tedious. As I was working with the above Regular Expression the thought struck me -- what if I could copy the lines, complete with the surrounding quotes and concatenation operators (+) and have the test application pass it through the compiler?

Modes of operation

RegexTester currently has four modes for passing the entered (or pasted) text from the Regex TextBox to the Regex constructor:

AsIs -- The text will be passed unchanged.
Unescape -- Escapes will be removed by the very simple, non-foolproof technique of replacing "" with ", \\ with \, and \" with ".
CSharp -- The text will be passed through the C# compiler.
VisualBasic -- The text will be passed through the Visual Basic compiler.
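
To illustrate why the Unescape mode is described as non-foolproof, here is a small sketch that applies the same three replacements in the same order. The class name and sample inputs are hypothetical; the inputs are the text between the outer quotes of a literal.

```csharp
using System;

public static class UnescapeDemo
{
    // The three replacements described above, in the same order.
    public static string Unescape(string text)
    {
        text = text.Replace("\"\"", "\"");   // ""  ->  "
        text = text.Replace("\\\\", "\\");   // \\  ->  \
        text = text.Replace("\\\"", "\"");   // \"  ->  "
        return text;
    }

    public static void Main()
    {
        // Inside of a regular literal:   \"\\w+\"
        Console.WriteLine(Unescape("\\\"\\\\w+\\\""));

        // Inside of a verbatim literal:  ""\w+""
        Console.WriteLine(Unescape("\"\"\\w+\"\""));

        // Ambiguous case: in a verbatim literal, \\d means "a backslash,
        // then the letter d", but Unescape cannot know which kind of
        // literal the text came from and reduces it to \d (a digit).
        Console.WriteLine(Unescape("\\\\d"));
    }
}
```

The first two calls both yield "\w+" (with the quotes); the third shows the kind of input for which the CSharp mode is the safer choice.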

Whitespace

I also realized that, when viewing the results of a Regex Match, I desired the ability to see any whitespace characters produced. So I added the Show Whitespace checkbox; when it is checked, RegexTester will replace whitespace characters in the results with an appropriate character. To accomplish this, I chose to use the Arial Unicode MS typeface. This typeface contains glyphs (graphemes?) that represent alphanumeric characters inside circles; I chose characters based on the characters they represent:

S -- Space (' ')
0 -- Null ('\0')
A -- Alert (Bell) ('\a')
B -- Backspace ('\b')
F -- Formfeed ('\f')
N -- Newline (Linefeed) ('\n')
R -- Return (Carriage Return) ('\r')
T -- Tab (Horizontal Tab) ('\t')
V -- Vertical Tab ('\v')

How it's done

There's really not much involved. Only three event handlers are implemented:
frmRegexTester_Load -- loads the last saved state from an XML file (if any).
frmRegexTester_FormClosing -- stores the current state as XML in "%USERPROFILE%\\Local Settings\\RegexTester.xml".
bGetResult_Click -- initiates the GetResult process.

Because GetResult can be a lengthy process, the method is executed on a different thread and the Get result button will be disabled until the method is complete. Consequently, there are a number of plumbing methods and delegates involved to avoid cross-thread exceptions; I won't document them here though they may be of interest to any who need to see how that can be done.

Actors in supporting roles

Here are a few things that are essential to the GetResult method:

private enum Mode
{
    AsIs
,
    Unescape
,
    CSharp
,
    VisualBasic
}
 
 
private static readonly System.Collections.Generic.Dictionary<Mode,string> templates ;
 
private static readonly System.Collections.Generic.Dictionary<char,char> whitespace ;
 
private sealed class GetResultControlBlock
{
    public readonly System.Windows.Forms.Control                Control        ;
    public readonly System.Text.RegularExpressions.RegexOptions Options        ;
    public readonly Mode                                        Mode           ;
    public readonly bool                                        ShowWhitespace ;
    public readonly string                                      Regex          ;
    public readonly string                                      Input          ;
    public readonly EnableDelegate                              Enable         ;
    public readonly AppendResultDelegate                        Result         ;
 
    public GetResultControlBlock
    (
        System.Windows.Forms.Control                Control
    ,
        System.Text.RegularExpressions.RegexOptions Options
    ,
        Mode                                        Mode
    ,
        bool                                        ShowWhitespace
    ,
        string                                      Regex
    ,
        string                                      Input
    ,
        EnableDelegate                              Enable
    ,
        AppendResultDelegate                        Result
    )
    {
        this.Control        = Control        ;
        this.Options        = Options        ;
        this.Mode           = Mode           ;
        this.ShowWhitespace = ShowWhitespace ;
        this.Regex          = Regex          ;
        this.Input          = Input          ;
        this.Enable         = Enable         ;
        this.Result         = Result         ;
 
        return ;
    }
}
  
static frmRegexTester
(
)
{
    templates = new System.Collections.Generic.Dictionary<Mode,string>() ;
 
    templates [ Mode.CSharp ] =
        @"
        namespace TextWrapper
        {{
            public static class TextWrapper
            {{
                public const string Text = {0} ;
            }}
        }}
        " ;
 
    templates [ Mode.VisualBasic ] =
        @"
        Namespace TextWrapper
            Public Class TextWrapper
                Public Shared Text As String = {0}
            End Class
        End Namespace
        " ;
 
    whitespace = new System.Collections.Generic.Dictionary<char,char>() ;
 
    whitespace [ ' '  ] = 'Ⓢ' ;
    whitespace [ '\0' ] = '⓪' ;
    whitespace [ '\a' ] = 'Ⓐ' ;
    whitespace [ '\b' ] = 'Ⓑ' ;
    whitespace [ '\f' ] = 'Ⓕ' ;
    whitespace [ '\n' ] = 'Ⓝ' ;
    whitespace [ '\r' ] = 'Ⓡ' ;
    whitespace [ '\t' ] = 'Ⓣ' ;
    whitespace [ '\v' ] = 'Ⓥ' ;
 
    return ;
}

No, those characters don't display in my browser either, but they do (or should) in the application when using the Arial Unicode MS typeface (see the screenshot above).

To add a language, add an entry to the Mode enumeration and an entry to the templates Dictionary.

The whitespace Dictionary may also be modified to suit.
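
The doubled braces in the C# template are there because the template is used as a String.Format format string: {{ and }} come out as literal braces, and {0} is replaced with the pasted text. A minimal sketch of just that step (the class name, the Wrap helper, and the sample text are illustrative, not part of RegexTester):

```csharp
using System;

public static class TemplateDemo
{
    // Same shape as the C# template above. To String.Format, {{ and }}
    // are literal braces and {0} marks where the pasted text goes.
    public const string Template =
@"namespace TextWrapper
{{
    public static class TextWrapper
    {{
        public const string Text = {0} ;
    }}
}}";

    public static string Wrap(string pastedText)
    {
        return String.Format(Template, pastedText);
    }

    public static void Main()
    {
        // Text as it might be copied from source code, with the quotes
        // and the concatenation operator included.
        string pasted = "\"abc\" + \"def\"";

        // Prints compilable C# whose Text constant is "abc" + "def".
        Console.WriteLine(Wrap(pasted));
    }
}
```

Only the template is scanned for format items, so braces inside the pasted text (for example a quantifier such as {2,3}) pass through untouched.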

WrapText

This method:

  1. Uses the Language parameter (the selected Mode) to select a template from the templates Dictionary.
  2. Wraps the provided Text in the template.
  3. Compiles the wrapped Text with the requested compiler. I have a separate article on how that's done.
  4. Returns the value of the Text after it has been processed by the compiler.
private static string
WrapText
(
    string Text
,
    Mode   Language
)
{
    string code = System.String.Format
    (
        templates [ Language ]
    ,
        Text
    ) ;

    System.Reflection.Assembly assm = PIEBALD.Lib.LibSys.Compile
    (
        code
    ,
        Language.ToString()
    ) ;

    System.Type type = assm.GetType ( "TextWrapper.TextWrapper" ) ;

    System.Reflection.FieldInfo field = type.GetField
    (
        "Text"
    ,
        System.Reflection.BindingFlags.Public
        |
        System.Reflection.BindingFlags.Static
    ) ;

    return ( (System.String) field.GetValue ( null ) ) ;
}
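
The source of PIEBALD.Lib.LibSys.Compile is not shown in this article. As a rough idea of what such a helper could look like, here is a hypothetical stand-in built on System.CodeDom.Compiler. It assumes the .NET Framework (the CodeDOM compilers are not available on every runtime), and it assumes that compile errors are attached to the exception under an "Errors" key, as GetResult expects.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;

public static class CompileSketch
{
    // Hypothetical stand-in for PIEBALD.Lib.LibSys.Compile; not the
    // author's implementation. "language" is a CodeDOM language name
    // such as "CSharp" or "VisualBasic".
    public static Assembly Compile(string code, string language)
    {
        using (CodeDomProvider provider = CodeDomProvider.CreateProvider(language))
        {
            CompilerParameters parameters = new CompilerParameters();
            parameters.GenerateInMemory = true;    // load the result, keep no file
            parameters.GenerateExecutable = false; // build a library, not an EXE

            CompilerResults results =
                provider.CompileAssemblyFromSource(parameters, code);

            if (results.Errors.HasErrors)
            {
                // Assumption: mirrors the "Errors" entry that GetResult reads.
                InvalidOperationException err =
                    new InvalidOperationException("Compilation failed");
                err.Data["Errors"] = results.Errors;
                throw err;
            }

            return results.CompiledAssembly;
        }
    }

    public static void Main()
    {
        string code =
            "namespace TextWrapper { public static class TextWrapper { " +
            "public const string Text = \"abc\" + \"def\" ; } }";

        Assembly assm = Compile(code, "CSharp");

        // Read the constant back, the same way WrapText does.
        FieldInfo field = assm.GetType("TextWrapper.TextWrapper").GetField(
            "Text", BindingFlags.Public | BindingFlags.Static);

        Console.WriteLine((string) field.GetValue(null));
    }
}
```

Because Mode.CSharp.ToString() and Mode.VisualBasic.ToString() happen to be valid CodeDOM language names, a helper of this shape can take Language.ToString() directly, as WrapText does.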

ReplaceWhitespace

This method simply replaces characters according to the contents of the whitespace Dictionary.

private static string
ReplaceWhitespace
(
    string Text
)
{
    System.Text.StringBuilder result =
        new System.Text.StringBuilder ( Text.Length ) ;

    foreach  (char ch in Text )
    {
        if ( whitespace.ContainsKey ( ch ) )
        {
            result.Append ( whitespace [ ch ] ) ;
        }
        else
        {
            result.Append ( ch ) ;
        }
    }

    return ( result.ToString() ) ;
}
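
For a concrete feel of the result, here is a self-contained sketch of the same replacement using only a subset of the mapping, written with \u escapes so the source survives any encoding. It uses TryGetValue rather than ContainsKey plus the indexer, which is a small variation and not the code in the ZIP file.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class WhitespaceDemo
{
    // A subset of the article's mapping.
    private static readonly Dictionary<char, char> whitespace =
        new Dictionary<char, char>
        {
            { ' ',  '\u24C8' },  // circled S
            { '\t', '\u24C9' },  // circled T
            { '\r', '\u24C7' },  // circled R
            { '\n', '\u24C3' }   // circled N
        };

    public static string ReplaceWhitespace(string text)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach (char ch in text)
        {
            char replacement;

            // One lookup per character: mapped characters are swapped,
            // everything else is copied through unchanged.
            result.Append(
                whitespace.TryGetValue(ch, out replacement) ? replacement : ch);
        }

        return result.ToString();
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        // "a b<TAB>c<CR><LF>" becomes a, circled S, b, circled T, c,
        // circled R, circled N.
        Console.WriteLine(ReplaceWhitespace("a b\tc\r\n"));
    }
}
```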

GetResult

The GetResult method is fairly straightforward:

  1. Disable the Control that was passed in (the Get result button).
  2. Modify the Regex text according to the selected Mode.
  3. Instantiate a Regex using the Regex text and selected RegexOptions.
  4. Ask the Regex for all Matches in the provided Input text.
  5. If there are no Matches, say so.
  6. If there are Matches, then enumerate them and their Groups, displaying whitespace characters if requested.
  7. If an Exception is thrown, then display it instead.
  8. If the Exception contains a System.CodeDom.Compiler.CompilerErrorCollection then enumerate that as well.
  9. Enable the Control that was passed in (the Get result button) even if an error occurs.
private static void
GetResult
(
    object ControlBlock
)
{
    GetResultControlBlock controlblock =
        ControlBlock as GetResultControlBlock ;
 
    try
    {
        controlblock.Enable ( controlblock.Control , false ) ;
 
        string regtext = controlblock.Regex ;
 
        switch ( controlblock.Mode )
        {
            case Mode.AsIs :
            {
                break ;
            }
 
            case Mode.Unescape :
            {
                /* replace "" with ", \\ with \, and \" with " */
                regtext = regtext.Replace ( "\"\"" , "\"" ) ;
                regtext = regtext.Replace ( "\\\\" , "\\" ) ;
                regtext = regtext.Replace ( "\\\"" , "\"" ) ;
 
                break ;
            }
 
            default :
            {
                regtext = WrapText ( regtext , controlblock.Mode ) ;
 
                break ;
            }
        }
 
        System.Text.RegularExpressions.Regex reg =
            new System.Text.RegularExpressions.Regex
            (
                regtext
            ,
                controlblock.Options
            ) ;
 
        System.Text.RegularExpressions.MatchCollection matches =
            reg.Matches ( controlblock.Input ) ;
 
        if ( matches.Count == 0 )
        {
            controlblock.Result ( "No matches" ) ;
        }
        else
        {
            foreach
            (
                System.Text.RegularExpressions.Match mat
            in
                matches
            )
            {
                controlblock.Result ( "Match: " ) ;
 
                controlblock.Result
                (
                    controlblock.ShowWhitespace ?
                        ReplaceWhitespace ( mat.Value ) :
                        mat.Value
                ) ;
 
                foreach ( string grp in reg.GetGroupNames() )
                {
                    controlblock.Result ( System.String.Format
                    (
                        "\r\n    Group {0}: "
                    ,
                        grp
                    ) ) ;
 
                    controlblock.Result
                    (
                        controlblock.ShowWhitespace ?
                            ReplaceWhitespace ( mat.Groups [ grp ].Value ) :
                            mat.Groups [ grp ].Value
                    ) ;
                }
 
                controlblock.Result ( "\r\n\r\n" ) ;
            }
        }
    }
    catch ( System.Exception err )
    {
        controlblock.Result ( err.ToString() ) ;
 
        if ( err.Data.Contains ( "Errors" ) )
        {
            System.CodeDom.Compiler.CompilerErrorCollection errors =
                err.Data [ "Errors" ] as
                System.CodeDom.Compiler.CompilerErrorCollection ;
 
            if ( errors != null )
            {
                foreach
                (
                    System.CodeDom.Compiler.CompilerError error
                in
                    errors
                )
                {
                    controlblock.Result ( "\r\n" + error.ToString() ) ;
                }
            }
        }
    }
    finally
    {
        controlblock.Enable ( controlblock.Control , true ) ;
    }
 
    return ;
}
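
The match-and-group enumeration at the heart of GetResult can be tried on its own. The sketch below leaves out the threading, modes, and whitespace display and builds a string instead of calling the Result delegate; the Describe helper and the sample pattern are illustrative only.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Produces the same layout GetResult writes to the results box.
    public static string Describe(string pattern, string input)
    {
        Regex reg = new Regex(pattern, RegexOptions.None);
        MatchCollection matches = reg.Matches(input);

        if (matches.Count == 0)
        {
            return "No matches";
        }

        StringBuilder sb = new StringBuilder();

        foreach (Match mat in matches)
        {
            sb.Append("Match: ").Append(mat.Value);

            // GetGroupNames includes group "0" (the whole match),
            // followed by any named groups.
            foreach (string grp in reg.GetGroupNames())
            {
                sb.Append("\r\n    Group ").Append(grp).Append(": ")
                  .Append(mat.Groups[grp].Value);
            }

            sb.Append("\r\n\r\n");
        }

        return sb.ToString();
    }

    public static void Main()
    {
        // The quoted-word pattern from the Background section, with a
        // named group added around \w+.
        Console.Write(Describe(@"""(?'Word'\w+)""", "say \"hello\" and \"bye\""));
    }
}
```

With this input there are two matches, and each one lists group 0 (the quoted text) and group Word (the text without the quotes).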

Using the code

The ZIP file contains the source code, along with a BAT file to compile it into a .NET 2.0 WinExe. You may need to adjust the paths in the BAT file for your system; I'm running WinXP SP3 with .NET 3.5.

You are welcome to use the code in a development environment of your own choosing, but as I don't know what that is, and it may not even exist at the time of this writing, I can't help you with that.

The ZIP also contains an installer (MSI) in case you just wish to use the application; the executable in the MSI targets .NET 2.0.

Conclusion

This application neatly solves my problem with copying Regular Expressions between a code editing window and a Regex testing utility. It offers various modes for passing the text to the Regex constructor. It has the ability to display whitespace in the results. It persists its state.

I consider this to be version 0.0 of this application; I've only been working on it for a week. I'd like some feedback on usability and bugs; glowing testimonials are also welcome. Additional features don't come readily to mind (other than to expand Help (F1)) and I may not add any, but if you have suggestions, please post them.

History

2009-08-19 First submitted

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)