Introduction
The Regular Expressions feature is available in MS SQL Server 2005/2008. You can use all .NET Framework Regular Expression stuff via MS SQL Server CLR integration.
This article describes how to create and use extensions for the LIKE
(Transact-SQL) clause that supports Regular Expressions. Just for the demo, we also have created a text parser that extracts tokens from a text by a given Regular Expression pattern. Also, there is an overview of the namespaces and libraries required to compile database objects using Microsoft SQL Server integration with the aid of the .NET Framework Common Language Runtime (CLR).
Background
The stated material could be helpful if you know T-SQL and C#. In that case, you can make use of the extensive library functionality. If you have mastered T-SQL only (without C#), you may just use the RegExpLike
function instead of the standard LIKE
clause in places where Regular Expressions functionality is needed.
Performance
Refer to the MSDN site articles to be clear on the performance side:
Developers should view the CLR as an efficient alternative for logic that cannot be expressed declaratively in the query language. Regular Expressions, for instance.
Microsoft recommends to use CLR Integration in scenarios where CLR-based programming can complement the expressive power of the T-SQL query language. Such as a need for embedding procedural logic inside a query that can be called as a function (LIKE-function). This includes situations such as:
- Performing complex calculations (that have to be expressed using procedural logic) on a per-row basis over values stored in database tables. This can involve sending the results of these calculations to the client, or using the calculations to filter the set of rows that are sent to the client.
- Using procedural logic to evaluate tabular results that are then queried in the
FROM
clause of a SELECT
or DML statement.
SQL Server 2000 introduced T-SQL functions (both scalar and table-valued) that enable these scenarios. With SQL Server 2005, these functions can be more easily written using CLR languages, since developers can take advantage of the much more extensive libraries in the .NET Framework. In addition, CLR programming languages provide rich data structures (such as arrays, lists, etc.) that are lacking in T-SQL, and can perform significantly better due to the different execution models of the CLR and T-SQL.
Functions are, in general, good candidates to be written using the CLR, since there is seldom a need to access the database from within a function: values from the database are usually passed as arguments. This then plays to the strength of the CLR, which is better at computational tasks than T-SQL.
Using the Code
Part 1. Extension of LIKE clause
At the beginning, you have to allow MS SQL Server to use CLR Integration, i.e., to make possible the usage of .NET assemblies and methods from them (by default, this possibility is disabled). To do this, use the following script:
sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO
If you want to revert to the default state, run this script:
sp_configure 'clr enabled', 0
GO
RECONFIGURE
GO
From here, we create an assembly that is a wrapper for the Regular Expression .NET classes. To create the user defined function for MS SQL Server in C#/.NET, you just create a library project, you create a class, and you add public static
methods that will be SQL functions in future. And, SqlFunctionAttribute
must forestall each from these methods. It is used to mark the method definition of a user-defined aggregate as a function in SQL Server. For our RegularExpressionLike
method, we have got a method as shown:
public partial class SqlRegularExpressions
{
[SqlFunction]
public static bool Like(string text, string pattern)
{
Match match = Regex.Match(text, pattern);
return (match.Value != String.Empty);
}
}
The next step is assembly building. From now, you have to deploy a given assembly to MS SQL Server. To do this, run the next script (but you have to indicate the path to the assembly, for your machine):
CREATE ASSEMBLY
SqlRegularExpressions
from 'd:\Projects\SqlRegularExpressions\
SqlRegularExpressions\bin\Release\SqlRegularExpressions.dll'
WITH PERMISSION_SET = SAFE
Bingo! Your assembly is registered, and from now on, we may use its functionality. That is exactly what we plan do.
By the way, to revert this action, you can run the script as follows:
drop assembly
SqlRegularExpressions
To bind the assembly method with a SQL function, you have to run the script as shown:
CREATE FUNCTION RegExpLike(@Text nvarchar(max), @Pattern nvarchar(255)) RETURNS BIT
AS EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.[Like]
And that is all. Now, you may use the RegExpLike
function to check for the string matching pattern with the Regular Expression:
select * from titles
where 1 = dbo.RegExpLike(title, '\b(A\S+)')
The result of the above script running is the following:
This is the function removing the script:
DROP FUNCTION RegExpLike
Part 2. Text Parsing
The next point is extraction of strings from a given text by using a Regular Expression pattern.
This is the task of getting a table, i.e., the result of function execution is a table with some data. Let it be the start index in the text, the length, and the value of the result string. Altogether, there are three columns in the result table. The CLR represents a streaming model for table-valued functions which ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It gives a lot of advantages for the end user and for performance, and it, of course, causes additional complexity in the implementation. User-defined table-valued functions require the implementation of two public static
methods: one is the master method that is called by MS SQL Server and returns an objects enumeration (IEnumerable
), and the other is the secondary method that is called by the first to fill the table rows. All will be clear if we look at the code below:
public partial class SqlRegularExpressions
{
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern);
}
public static void FillMatch(object obj, out int index,
out int length, out SqlChars value)
{
Match match = (Match)obj;
index = match.Index;
length = match.Length;
value = new SqlChars(match.Value);
}
}
GetMatches
returns all the matched elements as a MatchCollection
object, and FillMatch
is called for each object (Match
) obtained before to determine the table row fields based on the data from it. We can see it from the list of parameters: first is the object reference and the rest is the variables marked by the out
attribute. This 'rest' determines the possible columns' nature.
Be accurate! You have to indicate the Fill method name in the SqlFunction.FillRowMethodName
property; on the other hand, it allows some flexibility.
Now, we build the assembly, register it in MS SQL Server again because its strong name was changed by the building, and create a target table-valued function:
CREATE FUNCTION
RegExpMatches(@text nvarchar(max), @pattern nvarchar(255))
RETURNS TABLE
([Index] int, [Length] int, [Value] nvarchar(255))
AS
EXTERNAL NAME SqlRegularExpressions.SqlRegularExpressions.GetMatches
GO
And now, we can extract the strings from some text by a certain pattern. For instance, let's get all the words from the text that starts with a lower-case 'a':
DECLARE @Text nvarchar(max);
DECLARE @Pattern nvarchar(255);
SET @Text =
'This is comprehensive compendium provides a broad and thorough investigation of all '
+ 'aspects of programming with ASP.Net. Entirely revised and updated for the 2.0 '
+ 'Release of .Net, this book will give you the information you need to master ASP.Net '
+ 'and build a dynamic, successful, enterprise Web application.';
SET @Pattern = '\b(a\S+)';
select * from dbo.RegExpMatches(@Text, @Pattern)
GO
and you get the result below:
Please give attention to that fact that the script does not return any 'ASP.NET' token because they start with upper-case 'A'. If you want to ignore case when matching occurs, you have to have a new function, or just add one more parameter to the Regex.Matches()
method as shown:
[SqlFunction(FillRowMethodName="FillMatch")]
public static IEnumerable GetMatches(string text, string pattern)
{
return Regex.Matches(text, pattern, RegexOptions.IgnoreCase);
}
And now, RegExpMatches
returns all 'a'-starting words including 'ASP.NET'.
This removes the script for the RegExpMatches
function:
DROP FUNCTION RegExpMatches
SQL Regular Expressions, Version 2
The SqlServerProject download consists of a 'SQL Server Project' type solution instead of a previously created 'Class Library' type solution. It is a more natural type for developing objects for a database. There are a couple of files in the project. We use them because it was boring to do monotonous actions to drop/create assembly/function during the development in the 'Class Library' type project. The 'SQL Server Project' type assumes all these actions without any human participation. Also, its IDE consists of special menu items to perform database objects tasks. But, you can enjoy it if you only have VS Pro/Team.
Easy deploying. We have to do three steps to deploy Regular Expressions to our SQL Server:
- Setup the project to reference the target database.
- Build the project.
- Deploy the assembly that supports Regular Expressions in SQL Server to our database engine.
And now, step by step:
To setup a project to reference a database where it would be good to use Regular Expressions power, we do this (see image below):
- We right click on the project in the Solution Explorer window.
- We choose the 'Properties' menu item. The assembly properties window will be expanded.
- On the left side, we choose a bookmark with a 'Database' title.
- We click on the 'Browse...' button. The 'Add database reference' window will appear.
- Next, we choose an existing database reference from the list or add new. To do this, we click on the 'Add New Reference...' button. The 'New Database Reference' dialog will be shown.
- We choose a server name and a database name from the respective dropdown lists.
- Then, we click 'OK' - and the 'Add New Reference' dialog will be closed.
- We click 'OK' again, the 'Add Database Reference' dialog will be closed and a new target for the assembly deploying is set.
Now it is time for the final actions:
- Build the project by selecting 'Build SqlRegularExpressions' from the 'Build' menu (we may skip this step if changes are absent, because the assembly is already compiled).
- Select 'Deploy SqlRegularExpressions' from the same 'Build' menu.
This is the result we get:
========== Deploy: 1 succeeded, 0 failed, 0 skipped ==========
In case you have not got Visual Studio IDE, you may install the Regular Expressions functionality as described in the 'Use of Code' section for the RegExpLike
function. The assembly DLL is located in the 'Bin' folder of the SqlServerProject package. One more thing, this is the script for the RegexMatch
function:
CREATE FUNCTION RegexMatch(@Text nvarchar(max),
@Pattern nvarchar(255), @Options INT) RETURNS BIT
AS EXTERNAL NAME SqlRegularExpressions.UserDefinedFunctions.RegexMatch
Documentation for Assembly Content
RegexMatch, RegexStrMatch
Searches the specified input string for the first occurrence of the Regular Expression specified in the Regex constructor, and returns true on success (RegexMatch
), or the matched string (RegexStrMatch
).
Syntax:
RegexMatch(<input>, <pattern>, <options>) RETURNS BIT
RegexStrMatch(<input>, <pattern>, <options>)
RETURNS nvarchar(max)
Parameters
- input The string to be tested for a match.
- pattern The Regular Expression pattern to match.
- options A bitwise OR combination of the enumeration values. See certain values for this parameter in the table below.
Return value
RegexMatch
: True if inputted string matches the pattern, false otherwise.RegexStrMatch
: A Regular Expression matched string, or null
if one does not exist.
Remarks
Values for the options
parameter:
Options | C# RegexOptions enum member analogue | Description |
---|
0 -or- null | None | Specifies that no options are set. |
1 | IgnoreCase | Specifies case-insensitive matching. |
2 | Multiline | Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string. |
4 | ExplicitCapture | Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows unnamed parentheses to act as non-capturing groups without the syntactic clumsiness of the expression (?:...). |
8 | Compiled | Specifies that the Regular Expression is compiled to an assembly. This yields faster execution, but increases the startup time. This value should not be assigned to the Options property when calling the CompileToAssembly method. |
16 | Singleline | Specifies the single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). |
32 | IgnorePatternWhitespace | Eliminates unescaped white space from the pattern, and enables comments marked with #. However, the IgnorePatternWhitespace value does not affect or eliminate white space in character classes. |
64 | RightToLeft | Specifies that the search will be from right to left instead of from left to right. |
256 | ECMAScript | Enables ECMAScript-compliant behavior for the expression. This value can be used only in conjunction with the IgnoreCase , Multiline , and Compiled values. The use of this value with any other value results in an exception. |
512 | CultureInvariant | Specifies that cultural differences in language are ignored. |
For more information, see the MSDN documentation for the Regex.Match
method.
Examples
A. Following code gets all titles where the title consists of a word that starts with 'A' or 'a'.
select * from titles
where dbo.RegexMatch(title, '\b(A\S+)', 1) = 1
B. Sometimes it happens that you only need a certain part from a string. You can do this extraction using Transact-SQL, but you can use Regular Expressions also. The following code demonstrates how to order author names ('pubs' database) by last word in the surname, in case the surname consists of more than one word, for instance, 'del Castillo', 'de Balzak', 'al Mahdi', etc.
select au_id, (au_fname + ' ' + au_lname) au_name
from authors
order by
dbo.RegexStrMatch(au_lname, '(\w+)', 64)
The result is a sorted list of authors where 'Innes del Castillo' precedes 'Michel DeFrance'.
Source code
using System;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
[SqlFunction(Name="RegexMatch", IsDeterministic=true, IsPrecise=true)]
public static SqlBoolean RegexMatch(SqlString input,
SqlString pattern, SqlInt32 options)
{
if (input.IsNull)
{
return SqlBoolean.Null;
}
if (pattern.IsNull)
{
pattern = String.Empty;
}
if (options.IsNull)
{
options = 0;
}
try
{
Match match = Regex.Match((string)input,
(string)pattern, (RegexOptions)(int)options);
if (match.Value != String.Empty)
{
return SqlBoolean.True;
}
}
catch
{
throw;
}
return SqlBoolean.False;
}
};
Could you please notify me if I have missed some functionality which you wish? Thank you!
Points of Interest
This is just a few of the methods that can be wrapped. I am ready to expand this 'SQL Regular Expression library'. I invite all to participate as specification writers. If you do not find what you expect to find here, you may always leave a message to me and I will do my best.
All comments, suggestions, and remarks are very welcome!
History
- 30 September, 2009: Article was created.
- 7 October, 2009: The 'Performance' section was added.
- 8 October, 2009: SQL Regular Expressions, ver.2 (SQL Server project) was added.
- 11 October, 2009: The '
RegexStrMatch
' function was added.