Introduction
Regular expressions are an immensely powerful tool in a developer's arsenal. The flavour of regexes in the .NET Framework, along with the classes surrounding them, is, in my humble opinion, excellent. One of the most useful methods on any of the types in the System.Text.RegularExpressions namespace is the static CompileToAssembly method of the Regex class. The CompileToAssembly method allows you to compile a regex to a standalone assembly. Not much good, you might think. Well, ponder the problem I was left with:
The Problem
I was developing, and in fact still am developing, an application that does quite a bit of text processing and makes heavy use of regular expressions. Now, regular expressions are complex, mysterious creatures. You may think you know lots about them, but trust me, you never know as much as you can until you have read Mastering Regular Expressions, 2nd Edition a few hundred times :-).
I don't believe for one second that a regular expression I write will be efficient the first time around. Sure, it will do exactly what I want it to do, but will it do it as efficiently as possible? Probably not. I needed to make extensive use of regular expressions, and I didn't want to spend the time measuring the performance of every regex straight away and then tuning each one. So I was left with a problem: how do I easily replace the regexes within my application with new, more efficient ones when I need to? I didn't want to move them to a config file: they would be too easy for the end user to modify, and loading and parsing XML is, relatively speaking, a slow operation.
The Solution
A "Regular Expression Library": CompileToAssembly came to my rescue. I wrote a tool that lets me quickly add multiple regular expressions to a "Regular Expression Library." I can create new libraries, load and modify existing ones, quickly add, remove and modify regular expressions, and then redistribute the "Regular Expression Library" assembly to the application via an auto-update. The "parent" application can get new über-efficient regular expressions and continue working without any user intervention if need be.
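To make the idea concrete, here is a minimal sketch of the kind of call such a tool wraps. The pattern, the namespace MyCompany.Regexes and the assembly name are illustrative assumptions, and the type name DutchPostCode is borrowed from the usage example further down; none of this is necessarily what the tool itself emits. Note that Regex.CompileToAssembly is only implemented on the .NET Framework; on .NET Core and .NET 5+ it throws PlatformNotSupportedException.

```csharp
using System.Reflection;
using System.Text.RegularExpressions;

public static class RegexLibrarySketch
{
    // Describes the single regex to compile. The pattern and the names are
    // illustrative assumptions, not taken from the RegexLibrary Builder.
    public static RegexCompilationInfo[] DescribeRegexes()
    {
        return new RegexCompilationInfo[]
        {
            new RegexCompilationInfo(
                @"^[1-9][0-9]{3}\s?[A-Za-z]{2}$", // pattern (a simplified Dutch postcode)
                RegexOptions.None,                // regex options
                "DutchPostCode",                  // name of the generated type
                "MyCompany.Regexes",              // namespace of the generated type
                true)                             // make the generated type public
        };
    }

    // Compiles the regexes into MyCompany.Regexes.dll.
    // Works on the .NET Framework only.
    public static void Build()
    {
        AssemblyName assemblyName = new AssemblyName();
        assemblyName.Name = "MyCompany.Regexes";
        Regex.CompileToAssembly(DescribeRegexes(), assemblyName);
    }
}
```

Calling RegexLibrarySketch.Build() should leave a DLL that the parent application can reference like any other assembly.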
Figure 1: The RegexLibrary Builder Main Screen
The RegexLibrary Builder
The RegexLibrary builder allows you to:
- Create CLS-compliant Regular Expression Libraries: .NET assemblies that contain only regular expressions.
- Add multiple regular expressions to a single assembly.
- Define individual names, namespaces, regex modifiers and accessibility levels on a per-regex basis.
- Reload existing Regular Expression Libraries and add, remove or modify regular expressions contained within.
- Manually set the version number of the assembly to help ensure compatibility with existing versions.
- Much, much, more... ;-)
Seriously though, I wrote this tool because I really needed it. I'm not sure if other people will, but if you do, you can download it from above. Remember that this tool was put together quickly and for my own use, so don't expect any fancy exception handling, nice coding patterns, comments, unit tests or anything like that. ;-) It will have bugs... I'd appreciate it if you could leave a comment here or contact me to let me know if/when you find one/many. :-)
If people are interested in the tool, I will develop it further and add things such as the ability to sign the Regular Expression Libraries that have been created, the ability to add custom attributes on a per regex basis, add some exception handling, code comments and so on. Also, if you use the tool, let me know! It's always good to know if something you've created for your own use is useful for others too.
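For readers curious how the per-regex settings and the version number listed earlier in this section map onto the framework API, the following is a rough sketch rather than the tool's actual code. The two patterns, the type names, the Example.Regexes namespaces and the version 1.2.0.0 are all made up for illustration. Each RegexCompilationInfo carries its own name, namespace, options and accessibility, while the version travels on the AssemblyName.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class LibrarySettingsSketch
{
    // Two regexes with different namespaces, options and accessibility.
    // All values here are hypothetical.
    public static RegexCompilationInfo[] DescribeRegexes()
    {
        return new RegexCompilationInfo[]
        {
            // Public type Example.Regexes.Numbers.FiveDigits, no options.
            new RegexCompilationInfo(@"\b\d{5}\b", RegexOptions.None,
                "FiveDigits", "Example.Regexes.Numbers", true),

            // Non-public type Example.Regexes.Words.Colour, case-insensitive.
            new RegexCompilationInfo(@"\bcolou?r\b",
                RegexOptions.IgnoreCase | RegexOptions.Multiline,
                "Colour", "Example.Regexes.Words", false)
        };
    }

    // Builds the assembly identity, including a manually chosen version.
    public static AssemblyName DescribeAssembly(string name, Version version)
    {
        AssemblyName assemblyName = new AssemblyName();
        assemblyName.Name = name;
        assemblyName.Version = version;
        return assemblyName;
    }

    // Compiles both regexes into Example.Regexes.dll (.NET Framework only).
    public static void Build()
    {
        Regex.CompileToAssembly(
            DescribeRegexes(),
            DescribeAssembly("Example.Regexes", new Version(1, 2, 0, 0)));
    }
}
```

There is also a CompileToAssembly overload that accepts an array of CustomAttributeBuilder objects, which is one possible route to the custom attributes mentioned above.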
Creating and Modifying a Regular Expression Library
The RegexLibrary Builder is an easy-to-use tool and most people should be able to figure it out without much difficulty. However, just in case, I'll write a quick (very quick) overview of how to create a Regular Expression Library and then how to open it again and modify it.
Creating a Regular Expression Library
1. Open up the RegexLibrary Builder application... obviously. ;-)
Steps 2 and 3 are interchangeable, i.e. the order doesn't matter.
2. Fill in the assembly details (Figure 2). You can use the "..." button to select the location where the Regular Expression Library will be saved.
Figure 2: Fill in the Assembly Details
3. Create a new regular expression to add to the Regular Expression Library by filling in the details in the regex group box and then clicking on "Add" (Figure 3). Check out the Regulator and RegexDesigner.NET, two very cool regular expression testing and learning tools.
Figure 3: Add a Regular Expression to the Regular Expression Library
The regex will now appear in the list box at the bottom of the application.
4. Finally, click on the "Save" button on the toolbar or select "Save Regex Library" from the "File" menu.
Modifying a Regular Expression Library
- Load the Regular Expression Library by clicking on the "Open" button on the toolbar or by selecting "Load Regex Library" from the "File" menu.
- When the library loads, all the regular expressions contained within will be displayed in the list box at the bottom of the application (Figure 4).
Figure 4: The Regular Expressions Loaded from a Regular Expression Library
- When you click on a regular expression in the list, its details are shown in the regex group box. To modify the regex, simply change the details and click on "Add" again. You can delete a regex by highlighting it and then clicking on the "Delete" button, clicking on "Delete" on the toolbar or selecting "Delete Regex" on the "Regex" menu.
- When you are happy with the changes you've made, you can save the changes to the Regular Expression Library by clicking on the "Save" button on the toolbar or selecting "Save Regex Library" from the "File" menu.
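Reloading works because every compiled regex is an ordinary type deriving from Regex, so its pattern and options can be read back through reflection. The sketch below shows one way this could be done; it is an assumption about the general approach, not a copy of the tool's loader. SamplePostCode is a hand-written stand-in for a compiled regex type so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.RegularExpressions;

public static class RegexLibraryReaderSketch
{
    // Returns one instance of every concrete Regex-derived type in the
    // assembly that has a public parameterless constructor.
    public static List<Regex> ReadRegexes(Assembly library)
    {
        List<Regex> found = new List<Regex>();
        foreach (Type type in library.GetTypes())
        {
            if (type.IsAbstract || !type.IsSubclassOf(typeof(Regex)))
                continue;

            ConstructorInfo constructor = type.GetConstructor(Type.EmptyTypes);
            if (constructor == null)
                continue;

            found.Add((Regex)constructor.Invoke(new object[0]));
        }
        return found;
    }
}

// Hypothetical stand-in for a type produced by CompileToAssembly.
public class SamplePostCode : Regex
{
    public SamplePostCode() : base(@"^\d{4}$", RegexOptions.IgnoreCase) { }
}
```

For each returned instance, GetType().Name and GetType().Namespace give the name and namespace, ToString() gives the pattern, and the Options property gives the modifiers. A real library would be passed in with something like Assembly.LoadFrom and the path to the DLL.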
Using a Regular Expression Library
Once you have created a Regular Expression Library, using it in one of your applications is straightforward:
- Add a reference to the assembly, as you would any other assembly.
- Add a using directive for the namespace of the regex you want. Remember that you can set the namespace on a per-regex basis if you like.
- Use the regex as you would the Regex class:
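// The compiled type derives from Regex, so create an instance first.
DutchPostCode postCode = new DutchPostCode();
bool result = postCode.IsMatch("some text");
string text = new System.IO.StreamReader(@"c:\DATA.txt").ReadToEnd();
MatchCollection matches = postCode.Matches(text);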
The Source
The source is straightforward. Nothing complicated. As I mentioned above, I wrote this tool quickly and for myself. I didn't expect to release the source, so you won't find any fancy exception handling, nice coding patterns, comments, unit tests or anything like that. ;-) It will have bugs... I'd appreciate it if you could leave a comment here or contact me to let me know if/when you find one/many. :-)
The interesting source code file is RegexLibraryBuilder.cs, which contains the RegexLibraryBuilder class and the nested RegexLibraryLoader class. There is also a strongly typed RegexCompilationInfo collection class (cleverly named RegexCompilationInfoCollection ;-) with your boilerplate strongly typed collection code. The RegexLibraryBuilderForm class is the implementation of the UI and, before someone points it out, I know that it isn't very modular or well-designed. I should have moved certain aspects to a custom control and provided property access to certain things, as well as moved certain things out to separate classes. However, the app was thrown together quickly.
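For anyone who hasn't written one, a boilerplate strongly typed collection looks roughly like the sketch below. This is not the class from the download: the name RegexCompilationInfoCollectionSketch and the choice of members are assumptions, and it follows the usual pattern of deriving from CollectionBase.

```csharp
using System.Collections;
using System.Text.RegularExpressions;

// A typed wrapper over CollectionBase so callers never have to cast.
public class RegexCompilationInfoCollectionSketch : CollectionBase
{
    public RegexCompilationInfo this[int index]
    {
        get { return (RegexCompilationInfo)List[index]; }
        set { List[index] = value; }
    }

    // Returns the index at which the item was added.
    public int Add(RegexCompilationInfo info)
    {
        return List.Add(info);
    }

    public bool Contains(RegexCompilationInfo info)
    {
        return List.Contains(info);
    }

    // Copies the items into the array type that CompileToAssembly expects.
    public RegexCompilationInfo[] ToArray()
    {
        RegexCompilationInfo[] array = new RegexCompilationInfo[Count];
        List.CopyTo(array, 0);
        return array;
    }
}
```

The ToArray method is the useful part here, since Regex.CompileToAssembly takes a RegexCompilationInfo[] rather than a collection.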
Links
Two very cool regular expression testing and learning tools:
- The Regulator
- RegexDesigner.NET
Other useful links:
- RegexLib, an online library of regular expressions: I grabbed the three regexes in the screenshot in Figure 1 from there.
- Regular Expressions on MSDN
- Original posting of the RegexLibrary Builder on my blog
History
- June 14th, 2005 - Added support for validating regexes before saving them: now you cannot save invalid regexes.
- June 8th, 2005 - Updated formatting.
- June 6th, 2005 - First posted.
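As a footnote to the June 14th entry, one simple way to validate a regex before saving is to construct it and catch the exception the Regex constructor throws for a malformed pattern. This is a sketch of that general technique, not necessarily how the tool does it; the class and method names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexValidationSketch
{
    // Returns true if the pattern parses with the given options.
    // On failure, error holds the parser's message.
    public static bool IsValidPattern(string pattern, RegexOptions options, out string error)
    {
        try
        {
            // The constructor parses the pattern and throws an
            // ArgumentException (or a subclass) if it is malformed or null.
            new Regex(pattern, options);
            error = null;
            return true;
        }
        catch (ArgumentException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

A save routine could run this check over every entry and refuse to call CompileToAssembly if any pattern fails.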