Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

Regular Expressions in MS SQL Server 2005/2008

4.84/5 (22 votes)
11 Oct 2009CPOL11 min read 340.7K   9.1K  
describes how to create SQL Wrapper for Regular Expressions using SQL Server CLR Integration

Using of RegExpLike function to filter rows

Introduction

The Regular Expressions feature is available in MS SQL Server 2005/2008. You can use all .NET Framework Regular Expression stuff via MS SQL Server CLR integration.

This article describes how to create and use extensions for the LIKE (Transact-SQL) clause that supports Regular Expressions. Just for the demo, we also have created a text parser that extracts tokens from a text by a given Regular Expression pattern. Also, there is an overview of the namespaces and libraries required to compile database objects using Microsoft SQL Server integration with the aid of the .NET Framework Common Language Runtime (CLR).

Background

The stated material could be helpful if you know T-SQL and C#. In that case, you can make use of the extensive library functionality. If you have mastered T-SQL only (without C#), you may just use the RegExpLike function instead of the standard LIKE clause in places where Regular Expressions functionality is needed.

Performance

Refer to the MSDN site articles to be clear on the performance side:

Developers should view the CLR as an efficient alternative for logic that cannot be expressed declaratively in the query language. Regular Expressions, for instance.

Microsoft recommends to use CLR Integration in scenarios where CLR-based programming can complement the expressive power of the T-SQL query language. Such as a need for embedding procedural logic inside a query that can be called as a function (LIKE-function). This includes situations such as:

  • Performing complex calculations (that have to be expressed using procedural logic) on a per-row basis over values stored in database tables. This can involve sending the results of these calculations to the client, or using the calculations to filter the set of rows that are sent to the client.
  • Using procedural logic to evaluate tabular results that are then queried in the FROM clause of a SELECT or DML statement.

SQL Server 2000 introduced T-SQL functions (both scalar and table-valued) that enable these scenarios. With SQL Server 2005, these functions can be more easily written using CLR languages, since developers can take advantage of the much more extensive libraries in the .NET Framework. In addition, CLR programming languages provide rich data structures (such as arrays, lists, etc.) that are lacking in T-SQL, and can perform significantly better due to the different execution models of the CLR and T-SQL.

Functions are, in general, good candidates to be written using the CLR, since there is seldom a need to access the database from within a function: values from the database are usually passed as arguments. This then plays to the strength of the CLR, which is better at computational tasks than T-SQL.

Using the Code

Part 1. Extension of LIKE clause

At the beginning, you have to allow MS SQL Server to use CLR Integration, i.e., to make possible the usage of .NET assemblies and methods from them (by default, this possibility is disabled). To do this, use the following script:

SQL
sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO

If you want to revert to the default state, run this script:

SQL
sp_configure 'clr enabled', 0
GO
RECONFIGURE
GO

From here, we create an assembly that is a wrapper for the Regular Expression .NET classes. To create the user defined function for MS SQL Server in C#/.NET, you just create a library project, you create a class, and you add public static methods that will be SQL functions in future. And, SqlFunctionAttribute must forestall each from these methods. It is used to mark the method definition of a user-defined aggregate as a function in SQL Server. For our RegularExpressionLike method, we have got a method as shown:

C#
/// <summary>
/// Class that allows to support regular expressions
/// in MS SQL Server 2005/2008
/// </summary>
public partial class SqlRegularExpressions
{
    /// <summary>
    /// Checks string on match to regular expression
    /// </summary>
    /// <param name="text">string to check</param>
    /// <param name="pattern">regular expression</param>
    /// <returns>true - text consists match one at least,
    ///           false - no matches</returns>
    [SqlFunction]
    public static bool Like(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        return (match.Value != String.Empty);
    }
    
    //...
}

The next step is assembly building. From now, you have to deploy a given assembly to MS SQL Server. To do this, run the next script (but you have to indicate the path to the assembly, for your machine):

SQL
CREATE ASSEMBLY 
--assembly name for references from SQL script
SqlRegularExpressions 
-- assembly name and full path to assembly dll,
-- SqlRegularExpressions in this case
from 'd:\Projects\SqlRegularExpressions\
        SqlRegularExpressions\bin\Release\SqlRegularExpressions.dll' 
WITH PERMISSION_SET = SAFE

Bingo! Your assembly is registered, and from now on, we may use its functionality. That is exactly what we plan do.

By the way, to revert this action, you can run the script as follows:

SQL
drop assembly 
--assembly name for references from SQL script
SqlRegularExpressions

To bind the assembly method with a SQL function, you have to run the script as shown:

SQL
--function signature
CREATE FUNCTION RegExpLike(@Text nvarchar(max), @Pattern nvarchar(255)) RETURNS BIT
--function external name
AS EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.[Like]

And that is all. Now, you may use the RegExpLike function to check for the string matching pattern with the Regular Expression:

SQL
-- get all titles where title consists word that starts by 'A'
select * from titles
where 1 = dbo.RegExpLike(title, '\b(A\S+)')

The result of the above script running is the following:

Using of UDF RegExpLike to filter rows by regular expressions pattern

This is the function removing the script:

SQL
DROP FUNCTION RegExpLike

Part 2. Text Parsing

The next point is extraction of strings from a given text by using a Regular Expression pattern.

This is the task of getting a table, i.e., the result of function execution is a table with some data. Let it be the start index in the text, the length, and the value of the result string. Altogether, there are three columns in the result table. The CLR represents a streaming model for table-valued functions which ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It gives a lot of advantages for the end user and for performance, and it, of course, causes additional complexity in the implementation. User-defined table-valued functions require the implementation of two public static methods: one is the master method that is called by MS SQL Server and returns an objects enumeration (IEnumerable), and the other is the secondary method that is called by the first to fill the table rows. All will be clear if we look at the code below:

C#
/// <summary>
/// Class that allows to support regular expressions
/// in MS SQL Server 2005/2008
/// </summary>
public partial class SqlRegularExpressions
{
    // this is place of Like() method
    
    /// <summary>
    /// Gets matches from text using pattern
    /// </summary>
    /// <param name="text">text to parse</param>
    /// <param name="pattern">regular expression pattern</param>
    /// <returns>MatchCollection</returns>
    [SqlFunction(FillRowMethodName="FillMatch")]
    public static IEnumerable GetMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern);
    }

    /// <summary>
    /// Parses match-object and returns its parameters 
    /// </summary>
    /// <param name="obj">Match-object</param>
    /// <param name="index">TThe zero-based starting
    /// position in the original string where the captured
    ///     substring was found</param>
    /// <param name="length">The length of the captured substring.</param>
    /// <param name="value">The actual substring
    ///          that was captured by the match.</param>
    public static void FillMatch(object obj, out int index, 
                                 out int length, out SqlChars value)
    {
        Match match = (Match)obj;
        index = match.Index;
        length = match.Length;
        value = new SqlChars(match.Value);
    }
}

GetMatches returns all the matched elements as a MatchCollection object, and FillMatch is called for each object (Match) obtained before to determine the table row fields based on the data from it. We can see it from the list of parameters: first is the object reference and the rest is the variables marked by the out attribute. This 'rest' determines the possible columns' nature.

Be accurate! You have to indicate the Fill method name in the SqlFunction.FillRowMethodName property; on the other hand, it allows some flexibility.

Now, we build the assembly, register it in MS SQL Server again because its strong name was changed by the building, and create a target table-valued function:

SQL
CREATE FUNCTION 
--function signature
RegExpMatches(@text nvarchar(max), @pattern nvarchar(255))
RETURNS TABLE 
([Index] int, [Length] int, [Value] nvarchar(255))
AS 
--external name
EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.GetMatches
GO

And now, we can extract the strings from some text by a certain pattern. For instance, let's get all the words from the text that starts with a lower-case 'a':

SQL
-- RegExpMatches sample
DECLARE @Text nvarchar(max);
DECLARE @Pattern nvarchar(255);
 
SET @Text = 
'This is comprehensive compendium provides a broad and thorough investigation of all '
+ 'aspects of programming with ASP.Net. Entirely revised and updated for the 2.0 '
+ 'Release of .Net, this book will give you the information you need to master ASP.Net '
+ 'and build a dynamic, successful, enterprise Web application.';
SET @Pattern = '\b(a\S+)';   --get all words that start from 'a'

select * from dbo.RegExpMatches(@Text, @Pattern)
GO

and you get the result below:

Using UDF RegExpMatches to parse text using given regular expressions pattern

Please give attention to that fact that the script does not return any 'ASP.NET' token because they start with upper-case 'A'. If you want to ignore case when matching occurs, you have to have a new function, or just add one more parameter to the Regex.Matches() method as shown:

C#
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
    return Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
}

And now, RegExpMatches returns all 'a'-starting words including 'ASP.NET'.

This removes the script for the RegExpMatches function:

SQL
DROP FUNCTION RegExpMatches

SQL Regular Expressions, Version 2

The SqlServerProject download consists of a 'SQL Server Project' type solution instead of a previously created 'Class Library' type solution. It is a more natural type for developing objects for a database. There are a couple of files in the project. We use them because it was boring to do monotonous actions to drop/create assembly/function during the development in the 'Class Library' type project. The 'SQL Server Project' type assumes all these actions without any human participation. Also, its IDE consists of special menu items to perform database objects tasks. But, you can enjoy it if you only have VS Pro/Team.

Easy deploying. We have to do three steps to deploy Regular Expressions to our SQL Server:

  • Setup the project to reference the target database.
  • Build the project.
  • Deploy the assembly that supports Regular Expressions in SQL Server to our database engine.

And now, step by step:

To setup a project to reference a database where it would be good to use Regular Expressions power, we do this (see image below):

Steps to change referenced SQL Server project database

  1. We right click on the project in the Solution Explorer window.
  2. We choose the 'Properties' menu item. The assembly properties window will be expanded.
  3. On the left side, we choose a bookmark with a 'Database' title.
  4. We click on the 'Browse...' button. The 'Add database reference' window will appear.
  5. Next, we choose an existing database reference from the list or add new. To do this, we click on the 'Add New Reference...' button. The 'New Database Reference' dialog will be shown.
  6. We choose a server name and a database name from the respective dropdown lists.
  7. Then, we click 'OK' - and the 'Add New Reference' dialog will be closed.
  8. We click 'OK' again, the 'Add Database Reference' dialog will be closed and a new target for the assembly deploying is set.

Now it is time for the final actions:

  1. Build the project by selecting 'Build SqlRegularExpressions' from the 'Build' menu (we may skip this step if changes are absent, because the assembly is already compiled).
  2. Select 'Deploy SqlRegularExpressions' from the same 'Build' menu.

This is the result we get:

========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========

In case you have not got Visual Studio IDE, you may install the Regular Expressions functionality as described in the 'Use of Code' section for the RegExpLike function. The assembly DLL is located in the 'Bin' folder of the SqlServerProject package. One more thing, this is the script for the RegexMatch function:

SQL
--function signature
CREATE FUNCTION RegexMatch(@Text nvarchar(max), 
      @Pattern nvarchar(255), @Options INT) RETURNS BIT
--function external name
AS EXTERNAL NAME SqlRegularExpressions.UserDefinedFunctions.RegexMatch

Documentation for Assembly Content

RegexMatch, RegexStrMatch

Searches the specified input string for the first occurrence of the Regular Expression specified in the Regex constructor, and returns true on success (RegexMatch), or the matched string (RegexStrMatch).

Syntax:

SQL
RegexMatch(<input>, <pattern>, <options>) RETURNS BIT

RegexStrMatch(<input>, <pattern>, <options>) 
RETURNS nvarchar(max)

Parameters

  • input The string to be tested for a match.
  • pattern The Regular Expression pattern to match.
  • options A bitwise OR combination of the enumeration values. See certain values for this parameter in the table below.

Return value

  • RegexMatch: True if inputted string matches the pattern, false otherwise.
  • RegexStrMatch: A Regular Expression matched string, or null if one does not exist.

Remarks

Values for the options parameter:

OptionsC# RegexOptions enum member analogueDescription
0 -or- nullNoneSpecifies that no options are set.
1IgnoreCaseSpecifies case-insensitive matching.
2MultilineMultiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
4ExplicitCaptureSpecifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as non-capturing groups without the syntactic clumsiness of the expression (?:...).
8CompiledSpecifies that the Regular Expression is compiled to an assembly. This yields faster execution, but increases the startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method.
16SinglelineSpecifies the single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
32IgnorePatternWhitespaceEliminates unescaped white space from the pattern, and enables comments marked with #. However, the IgnorePatternWhitespace value does not affect or eliminate white space in character classes.
64RightToLeftSpecifies that the search will be from right to left instead of from left to right.
256ECMAScriptEnables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase, Multiline, and Compiled values. The use of this value with any other value results in an exception.
512CultureInvariantSpecifies that cultural differences in language are ignored.

For more information, see the MSDN documentation for the Regex.Match method.

Examples

A. Following code gets all titles where the title consists of a word that starts with 'A' or 'a'.

SQL
select * from titles
where dbo.RegexMatch(title, '\b(A\S+)', 1) = 1

B. Sometimes it happens that you only need a certain part from a string. You can do this extraction using Transact-SQL, but you can use Regular Expressions also. The following code demonstrates how to order author names ('pubs' database) by last word in the surname, in case the surname consists of more than one word, for instance, 'del Castillo', 'de Balzak', 'al Mahdi', etc.

SQL
select au_id, (au_fname + ' ' + au_lname) au_name
from authors
order by
    dbo.RegexStrMatch(au_lname, '(\w+)', 64)

The result is a sorted list of authors where 'Innes del Castillo' precedes 'Michel DeFrance'.

Source code

C#
using System;                         //String
using System.Data.SqlTypes;           //SqlString, SqlInt32, SqlBoolean
using System.Text.RegularExpressions; //Match, Regex
using Microsoft.SqlServer.Server;     //SqlFunctionAttribute


public partial class UserDefinedFunctions
{
    /// <summary>
    /// Searches the input string for an occurrence
    /// of the regular expression supplied
    /// in a pattern parameter with matching options
    /// supplied in an options parameter.
    /// </summary>
    /// <param name="input">The string to be tested for a match.</param>
    /// <param name="pattern">The regular expression pattern to match.</param>
    /// <param name="options">A bitwise OR combination
    ///    of RegexOption enumeration values.</param>
    /// <returns>true - if inputted string matches
    /// to pattern, else - false</returns>
    /// <exception cref="System.ArgumentException">
    ///       Regular expression parsing error.</exception>
    [SqlFunction(Name="RegexMatch", IsDeterministic=true, IsPrecise=true)]
    public static SqlBoolean RegexMatch(SqlString input, 
                  SqlString pattern, SqlInt32 options)
    {
        if (input.IsNull)
        {
            return SqlBoolean.Null;
            //if input is NULL, return NULL
        }
        if (pattern.IsNull)
        {
            pattern = String.Empty;
            //""
        }
        if (options.IsNull)
        {
            options = 0;  //RegexOptions.None
        }

        try
        {
            Match match = Regex.Match((string)input, 
                          (string)pattern, (RegexOptions)(int)options);
            if (match.Value != String.Empty)
            {
                return SqlBoolean.True;
            }
        }
        catch 
        {
            throw;
        }

        return SqlBoolean.False;
    }
};

Could you please notify me if I have missed some functionality which you wish? Thank you!

Points of Interest

This is just a few of the methods that can be wrapped. I am ready to expand this 'SQL Regular Expression library'. I invite all to participate as specification writers. If you do not find what you expect to find here, you may always leave a message to me and I will do my best.

All comments, suggestions, and remarks are very welcome!

History

  • 30 September, 2009: Article was created.
  • 7 October, 2009: The 'Performance' section was added.
  • 8 October, 2009: SQL Regular Expressions, ver.2 (SQL Server project) was added.
  • 11 October, 2009: The 'RegexStrMatch' function was added.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)