Pitfalls of the SQL Server Service Broker

Herman<T>.Instance

5.00/5 (4 votes)

16 Oct 2022CPOL5 min read

7.3K

It should reflect the event

There are articles written about the service broker and issues, but none of them pinpoint the cause. In this article, a fix is given.

Introduction

It is very easy to misuse the Service Broker in a SQL Server when using a query in the C# SQLDependency object and set that query as event in your database. When your query requests data from more than 1 table, your database can end up in extreme locking of data. When you run into this issue, you will get exactly 0 answers from BING and Google to solve it.

In the official Microsoft documentation, the requirements are set for your query. As long as you keep the query simple, you create disaster easily in your database without being aware. Especially for lazy developers, this is a learning curve. Also, for Microsoft which gave response fast to our situation but seemed to lose interest after a fortnight by stating that what I found was not unthinkable. It is a matter of perception how to work with the Service Broker without getting into trouble.

There are more articles written about the service broker and issues, but none of them pinpoints the cause:

Good examples:

In this article, a fix is given.

Background

In our production environment, we sometimes had big performance degradation. Last year, it became a daily problem. A simple update query took maximum 2 minutes. The queryplan for that update showed an immense plan full of parallelism. The query looked like:

SQL

UPDATE table SET FIELD = VALUE Where ID = IdValue;

So the parallelism was completely unexpected behaviour. After a restart of the database, the normal queryplan showed up again. Within a day, the issues returned.

SQLDependency Requirements

Microsoft gives the next requirements for the query you use in a SQLDependency here.

Under the header of supported SELECT statements, the most important rule is not mentioned: The statement must not include JOIN. I will explain why I think you should not use more than 1 table in your query.

The Case

Our database (nearly 100 tables) have 6 important tables which represent the workflow. These tables can be joined over the primary key Id fields. These 6 tables have the 47 most needed columns to be able to understand what is involved and done in a project for our customers. After a measurement is done, the crew has to reject or accept the measurement. That checkbox is the event the service broker should respond to. Other processes would start in our workflow after that event.

What was Build

In the service that set the SQLDependency, the developer thought it would be wise to ask all these 47 fields in the query. Since it is not prohibited to use JOIN in these queries and the query was pretty straight forward, no errors were given. The application for doing measurements got all the problems to handle. Better: my phone was ringing off the wall.

We first thought that the measurement system had issues because we saw the issue there. Extra problem: we could never tell which of the 6 tables had to handle the parallelism issue in an update query. It was always the update. After a given time, we decided to stop the service with the dependency and everyone could work again.

That was the moment we investigated the query. What do we really need to know from the event?

Our query looked like:

SQL

SELECT tbl1.Field1, tbl1.Field2, tbl1.Field3, _
       tbl1.Field4, tbl1.Field5, tbl1.Field6, tbl1.Field7,
tbl2.Field1, tbl2.Field2, tbl2.Field3, tbl2.Field4, _
             tbl2.Field5, tbl2.Field6, tbl2.Field7,
tbl3.Field1, tbl3.Field2, tbl3.Field3, tbl3.Field4, _
             tbl3.Field5, tbl3.Field6, tbl3.Field7,
tbl4.Field1, tbl4.Field2, tbl4.Field3, tbl4.Field4, _
             tbl4.Field5, tbl4.Field6, tbl4.Field7,
tbl5.Field1, tbl5.Field2, tbl5.Field3, tbl5.Field4, _
             tbl5.Field5, tbl5.Field6, tbl5.Field7,
tbl6.Field1, tbl6.Field2, tbl6.Field3, tbl6.Field4, _
             tbl6.Field5, tbl6.Field6, tbl6.Field7, tbl6.lastModified
FROM dbo.Table1 as tbl1
Join dbo.Table2 as tbl2 on tbl2.table1Id = tbl1.Id
Join dbo.Table3 as tbl3 on tbl3.table2Id = tbl2.Id
Join dbo.Table4 as tbl4 on tbl4.table3Id = tbl3.Id
Join dbo.Table5 as tbl5 on tbl5.table4Id = tbl4.Id
Join dbo.Table6 as tbl6 on tbl6.table5Id = tbl5.Id
Where tbl6.lastModified > @lastModified
AND tbl6.IsAccepted IS NOT NULL

According to Microsoft, this is a valid query for a SqlDependency object. On CodeProject, I asked several questions about this issue without any valid response:

Further Investigation

The biggest issue with the above query was that no one set the field IsAccepted. When an organization with 400 people adds rows of data to all these 6 tables, the collection of tuples gets huge because you wait forever for the IsAccepted to be set to false or true. So after a restart of SQL server, it took about an hour to have the same issue again.

Reminders

First, you have to know that a SQLDependency is set for a maximum of 2 minutes in the Service Broker. If the event occurs earlier, the event informs your application that it has occurred. If not, then the Service Broker will set up the event again for 2 minutes.

The Service Broker sets your query as an Indexed View in your database, schema sys! That is what someone at Microsoft could tell me. And here, your problem starts when JOINING more tables in your SQLDependency. We used 47 fields from 6 tables while we only needed to know the involved Id of the measurement table after a measurement was accepted or declined.

So when 47 fields are created in indexed views (at lower priority- is still a priority-), it has all kinds of influence on your database. Our found parallelism in the UPDATE query showed some strange information.

XML

<OutputList>
    <ColumnReference Database="[TDMS-TEST]" Schema="[sys]" 
    Table="[query_notification_1626853258]" Column="id" />

You should read the number at the end as object_id in your database. With the next query, you can find these indexes:

SQL

SELECT IndexName = i.Name, ColName = c.Name,  *
FROM sys.indexes i
     INNER JOIN sys.index_columns ic ON ic.object_id = i.object_id
                                        AND ic.index_id = i.index_id
     INNER JOIN sys.columns c ON c.object_id = ic.object_id
                                 AND c.column_id = ic.column_id
WHERE 1 = 1
AND I.OBJECT_ID = 1626853258
ORDER BY i.Name;
You cannot drop these indexes. The service broker can!

But when asking 47 fields from 6 tables to be watched as an indexed view, your database has a lot of work with creating and dropping indexes each 2 minutes. Continues checking 6 tables for a change in exactly 1 of them.

Perceptions

What did we really need from the Service Broker? The event that in one table one bitfield changed from null to not null relative to a lastmodified value. So why did we need the other 45 fields? Only to have degradation of performance in a production database.

In our case, we wanted each one to deliver a lot of data. It is better to split your queries in two parts.

Part 1 is the dependency query that reflects the event you want to be informed of. Part 2 is the query you need to fulfill the data needs.

Example:

The DependencyQuery would be like:

SQL

SELECT tbl6.Id, tbl6.lastModified
FROM dbo.Table6 as tbl6
WHERE tbl6.LastModified > @LastModified
AND tbl.IsAccepted IS NOT NULL

The query for your workflow would be:

SQL

SELECT tbl1.Field1, tbl1.Field2, tbl1.Field3, tbl1.Field4, _
       tbl1.Field5, tbl1.Field6, tbl1.Field7,
tbl2.Field1, tbl2.Field2, tbl2.Field3, tbl2.Field4, _
             tbl2.Field5, tbl2.Field6, tbl2.Field7,
tbl3.Field1, tbl3.Field2, tbl3.Field3, tbl3.Field4, _
             tbl3.Field5, tbl3.Field6, tbl3.Field7,
tbl4.Field1, tbl4.Field2, tbl4.Field3, tbl4.Field4, _
             tbl4.Field5, tbl4.Field6, tbl4.Field7,
tbl5.Field1, tbl5.Field2, tbl5.Field3, tbl5.Field4, _
             tbl5.Field5, tbl5.Field6, tbl5.Field7,
tbl6.Field1, tbl6.Field2, tbl6.Field3, tbl6.Field4, _
             tbl6.Field5, tbl6.Field6, tbl6.Field7
FROM dbo.Table1 as tbl1
Join dbo.Table2 as tbl2 on tbl2.table1Id = tbl1.Id
Join dbo.Table3 as tbl3 on tbl3.table2Id = tbl2.Id
Join dbo.Table4 as tbl4 on tbl4.table3Id = tbl3.Id
Join dbo.Table5 as tbl5 on tbl5.table4Id = tbl4.Id
Join dbo.Table6 as tbl6 on tbl6.table5Id = tbl5.Id
Where tbl6.Id = @Id

So in the C# class, we maintain the lastmodified value of the table (to prevent having enormous amount of records in memory) that is set in the Service Broker as event.

After the result (might be more than 1 rows of records, depending on how widely used your database is), we set the Id value as parameter for the second query, which directly calls the database.

History

16^th October, 2022: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)