Introduction
Quite often, SQL Server developers need to execute long running SQLs such as updates, inserts, bulk inserts, index creation, and other non-query types of T-SQL. At the end, the total execution time will be a sum of each SQL’s execution time. With such powerful multi-processors, SQL Server developers do not utilize that power. This is where multi-tasking comes in place. SQL Server 2005 and up provides database developers with one of the tools to execute SQL asynchronously- a Service Broker. It has its purpose, but sometimes, it does not solve the problem. A Service Broker is a messaging framework, and if a developer needs to wait until all parallel processes are complete in order to move to another task in the T-SQL application, some extra work has to be done to implement this: wait till all the messages are processed, place the next step in the Service Broker queue, and other techniques. I am not going to elaborate on multiple ways to accomplish this with a Service Broker, instead, I will present you with a different way to execute multiple T-SQLs at the same time.
Background
Let's say you are developing an ETL process which has to bulk load multiple tables, create a number of indexes, move data from one database/server to another, and your process has to move a lot of rows. And, at the end, your process runs hours and hours, even though you tuned up all the indexes and optimized the infrastructure as much as you can. But, what if, let's say, you have to copy 10 tables, and you will copy all of them at the same time. You will save a lot of time if you do this in parallel. And, here is a solution for that - a very simple one, and does not involve messing up with a Service Broker and synchronizing the sequence of your SQL. After you copy all the 10 tables, you have to create indexes on each table you copied. So, you have two major steps in this simple project:
- Step 1 - Copy tables
- Step 2 - Create indexes
I will show in this example below how it can be done using Stored Procedures from the deployment script attached.
Using the code
The attached deployment script has SQL CLR Stored Procedures and user defined functions which implement parallel SQL execution.
- Parallel_Declare - Declares a parallel block. This procedure expects a parameter
@name.
The parameter is for information only, in case you have to get an error message. - Parallel_UseTransaction - Indicates that all SQL in the parallel block will run under a single distributed transaction. This procedure expects a parameter
@mode
. This parameter can have varchar(20)
values: Serializable
, RepeatableRead
, ReadCommitted
, ReadUncommitted
, Snapshot
, Chaos
. - Parallel_SetOption_CommandTimeout - Sets a SQL execution timeout. The parameter is a
@commandTimeout
of type int
. The default value is 120 seconds. If set to 0, the execution of any SQL in the parallel block will not timeout. - Parallel_SetOption_MaxThreads - Sets the maximum number of threads to be executed at the same time. Even though you can add to the parallel block unlimited number of SQL to run in parallel, only the number of SQLs identified by this option will be executed at the same time. Others will wait in the threadpool queue. The parameter is
@maxThreads
of type int
. The default value is 10. The valid values are from 1 to 64. - Parallel_AddSql - This procedure adds a SQL to the parallel block to be executed in parallel. This procedure does not execute a SQL. It just adds it to the list. Parameters are:
nvarchar(128)
@key
, and nvarchar(MAX)
@sql
, where @key
is a unique key identifying the SQL in the array of SQLs to be executed, and @sql
is the actual SQL to be executed. The SQL can be any T-SQL, but it can not contain GO
statements. - Parallel_Execute - This procedure actually starts the execution of all SQLs added to a parallel block. The procedure has no parameters, but it returns these values: 0 - successful execution, or a non-negative value of the count of how many threads have failed. The procedure will not crash if any SQL failed to execute; instead, it will return a non-zero value. If you called
Parallel_UseTransaction
for the current block execution, the results from all the SQLs will be committed in the case of success, or all will be rolled-back in the case of any SQL failure. Here is a major point of all these procedures: the Parallel_Execute procedure will not end until all SQLs are complete, or any SQL has failed. - Parallel_GetExecutionResult() - A user-defined tabled function which will return information about all the thread executions. The table will have a SQL key, a success flag, a thread run time, an error message, and an error stack.
- Parallel_GetErrorMessage() - This function will return an error message of type
nvarchar(MAX)
if any SQL in the parallel block has failed; otherwise, it will return NULL
.
To use these Stored Procedures, you have to apply the provided script to your database. For example, I have setup a ClrLibDb database. Make sure the instance of the server is configured with CLR support turned on. Details of how to do this are omitted.
- Set this database option
TRUSTWORTHY
on. - Type USE [your database name];.
- Copy/paste Deployment.sql, and execute it.
Also, I created a [test] database, and a table in this database [Table_1] with two fields of type int
, and varchar(50)
.
Here are examples of how to use it:
exec ClrLibDb.dbo.Parallel_Declare 'test block'
exec ClrLibDb.dbo.Parallel_SetOption_MaxThreads 9
exec ClrLibDb.dbo.Parallel_SetOption_CommandTimeout 30
exec ClrLibDb.dbo.Parallel_UseTransaction 'ReadUncommitted'
exec ClrLibDb.dbo.Parallel_AddSql 'sql1',
'insert into test.dbo.table_1 values (1, ''test1'')'
exec ClrLibDb.dbo.Parallel_AddSql 'sql2',
'insert into test.dbo.table_1 values (2, ''test2'')'
exec ClrLibDb.dbo.Parallel_AddSql 'sql3',
'insert into test.dbo.table_1 values (3, ''test3'')'
exec ClrLibDb.dbo.Parallel_AddSql 'sql4',
'insert into test.dbo.table_1 values (4, ''test4'')'
DECLARE @RC int
exec @RC = ClrLibDb.dbo.Parallel_execute
IF @RC != 0
BEGIN
DECLARE @ErrorMessage varchar(MAX)
SET @ErrorMessage = ClrLibDb.dbo.Parallel_GetErrorMessage()
print @ErrorMessage
END
select * from ClrLibDb.dbo.parallel_GetExecutionResult()
And, here are 10 tables copying and indexing; a pseudo-example (for simplicity, just three tables :)):
exec ClrLibDb.dbo.Parallel_Declare 'Copy tables'
exec ClrLibDb.dbo.Parallel_UseTransaction 'ReadUncommitted'
exec ClrLibDb.dbo.Parallel_AddSql 'sql1',
'SELECT * INTO targetdb.dbo.Table1 FROM sourcedb.dbo.table1'
exec ClrLibDb.dbo.Parallel_AddSql 'sql2',
'SELECT * INTO targetdb.dbo.Table2 FROM sourcedb.dbo.table2'
exec ClrLibDb.dbo.Parallel_AddSql 'sql3',
'SELECT * INTO targetdb.dbo.Table3 FROM sourcedb.dbo.table3'
DECLARE @RC int
exec @RC = ClrLibDb.dbo.Parallel_execute
IF @RC != 0
BEGIN
DECLARE @ErrorMessage varchar(MAX)
SET @ErrorMessage = ClrLibDb.dbo.Parallel_GetErrorMessage()
RAISERROR(@ErrorMessage, 16, 1)
END
exec ClrLibDb.dbo.Parallel_Declare 'Create indexes #1'
exec ClrLibDb.dbo.Parallel_UseTransaction 'ReadUncommitted'
exec ClrLibDb.dbo.Parallel_AddSql 'index1',
'CREATE INDEX IX01_Table1 ON targetdb.Table1 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index2',
'CREATE INDEX IX02_Table1 ON targetdb.Table1 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index3',
'CREATE INDEX IX03_Table1 ON targetdb.Table1 (<fields>)'
DECLARE @RC int
exec @RC = ClrLibDb.dbo.Parallel_execute
IF @RC != 0
BEGIN
DECLARE @ErrorMessage varchar(MAX)
SET @ErrorMessage = ClrLibDb.dbo.Parallel_GetErrorMessage()
RAISERROR(@ErrorMessage, 16, 1)
END
exec ClrLibDb.dbo.Parallel_Declare 'Create indexes #2'
exec ClrLibDb.dbo.Parallel_UseTransaction 'ReadUncommitted'
exec ClrLibDb.dbo.Parallel_AddSql 'index1',
'CREATE INDEX IX01_Table2 ON targetdb.Table2 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index2',
'CREATE INDEX IX02_Table2 ON targetdb.Table2 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index3',
'CREATE INDEX IX03_Table2 ON targetdb.Table2 (<fields>)'
DECLARE @RC int
exec @RC = ClrLibDb.dbo.Parallel_execute
IF @RC != 0
BEGIN
DECLARE @ErrorMessage varchar(MAX)
SET @ErrorMessage = ClrLibDb.dbo.Parallel_GetErrorMessage()
RAISERROR(@ErrorMessage, 16, 1)
END
exec ClrLibDb.dbo.Parallel_Declare 'Create indexes #3'
exec ClrLibDb.dbo.Parallel_UseTransaction 'ReadUncommitted'
exec ClrLibDb.dbo.Parallel_AddSql 'index1',
'CREATE INDEX IX01_Table3 ON targetdb.Table3 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index2',
'CREATE INDEX IX02_Table3 ON targetdb.Table3 (<fields>)'
exec ClrLibDb.dbo.Parallel_AddSql 'index3',
'CREATE INDEX IX03_Table3 ON targetdb.Table3 (<fields>)'
DECLARE @RC int
exec @RC = ClrLibDb.dbo.Parallel_execute
IF @RC != 0
BEGIN
DECLARE @ErrorMessage varchar(MAX)
SET @ErrorMessage = ClrLibDb.dbo.Parallel_GetErrorMessage()
RAISERROR(@ErrorMessage, 16, 1)
END
You can use your creativity to run parallel blocks inside other parallel blocks. Yes. These parallel blocks can be nested by placing some parallel blocks in Sotred Procedures. But, a transaction started in one parallel block will not be passed to nested parallel blocks. This is a limitation of SQL Server CLR integration.
Please note
What you have to know about how this is working:
- Each SQL is executed in a separate thread and connects to a server using ADO.NET. It will use Windows authentication to connect to SQL Server, and it will be authenticated using the Windows account the SQL Server service is running under. You can add this account to any database as DBO, or setup privileges for this account to perform tasks programmed in your parallel SQLs.
- Since each SQL is running under its own connection, this connection default database is set to [
master
] and you have to provide a database name to the tables, Stored Procedures, and other objects you use in your SQL. You can see from the examples that I have used the targetdb and sourcedb databases.
History
- 9/16/2008 - Uploaded the SqlClrLib project.