Introduction
In this article, I'll make a comparison of set based and procedural approaches using some simple examples.
For running the scripts, you'll need AdventureWorks database that can be downloaded here. Below is the diagram of the tables that will be used in these examples:
Contact
– 19972 rows
SalesOrderHeader
– 31465 rows
The Task
Just for testing purposes, let’s say we need to get the list of all contacts and for each contact the sum of SubTotal
column from SalesOrderHeader
table. The resulting table should contain the following columns:
ContactID
, LastName
, FirstName
, TotalSales
Testing
With getting ContactID
, FirstName
and LastName
all is simple, we just query Contact
table and get the results. Now how to get TotalSales
information for each contact? If we follow the “procedural” approach, then one solution could be to write a scalar UDF and use it in the query. Below is the UDF script:
CREATE FUNCTION dbo.GetContactTotalSales
(
@contactID int
)
RETURNS money
AS
BEGIN
DECLARE @contactTotalSales money;
SET @contactTotalSales = (SELECT Sum(SubTotal) _
FROM Sales.SalesOrderHeader WHERE ContactID = @contactID);
RETURN @contactTotalSales;
END
Now the final query will look like this:
SELECT
c.ContactID,
c.LastName,
c.FirstName,
dbo.GetContactTotalSales(c.ContactID) as TotalSales
FROM
Person.Contact c
To test each query, I used SQL Server Profiler. I took the standard template and just added to it the query execution plan. Also, before running each query, I was executing:
DBCC DROPCLEANBUFFERS
This cleans the buffers from the buffer pool. Otherwise second call will be much faster than the previous one because SQL Server would return the results from the cache. After running the query, I got the following results:
Below are the significations of these columns:
CPU
- Amount of CPU time (in milliseconds) used by the event Reads
- Number of logical disk reads performed by the server on behalf of the event Duration
- Amount of time taken by the event ( in microseconds)
To understand how bad these results are, we need to compare them with the results of the “set based” variant. Below is the query script:
SELECT
c.ContactID,
c.LastName,
c.FirstName,
ts.TotalSales as TotalSales
FROM
Person.Contact c
LEFT OUTER JOIN
(
SELECT
soh.ContactID, SUM(soh.SubTotal) AS TotalSales
FROM
Sales.SalesOrderHeader soh
GROUP BY
soh.ContactID
) as ts ON ts.ContactID = c.ContactID
After running the query, I got the following results:
- CPU: 203
- READS: 1273
- DURATION: 787
If we compare the durations of “set based” and “procedural” queries, then we get that procedural query is about 159 times slower. That’s a huge difference!
Things are not always so bad with “procedural” approach. For small number of rows, the performance of the queries could be similar or even better when comparing to “set based” queries. If there is no way to write the query in “set based” form, then write it in “procedural” form but just make sure you don't return too many rows. This number should be determined separately for each particular case through testing.
There is also another way of writing the query that I would like to discuss. The inline query could be placed in the select
part.
SELECT
c.ContactID,
c.LastName,
c.FirstName,
(
SELECT
SUM(soh.SubTotal)
FROM
Sales.SalesOrderHeader soh
WHERE
soh.ContactID = c.ContactID
) as TotalSales
FROM
Person.Contact c
Many developers will consider this as another way of writing the query using “procedural” approach. However the results are showing the opposite:
- CPU: 140
- READS: 1273
- DURATION: 794
As we can see, the results are almost the same as for “set based” query. If we compare the query plans, then we can see that the query plans are almost equal with an insignificant difference.
“Set based” query:
“inline query in the select” query:
As you can see, the only difference is the position of the “compute scalar” operation. The conclusion I can make is that SQL Server is smart enough to handle such queries as “set based” queries. Anyway I don't recommend adding inline queries to the select because I noticed that when queries become more complex, SQL Server starts to handle them worse than equivalent “classic set based” queries.
Improve “Set Based” Queries with Views
One of the benefits of using “procedural” approach is that it makes the query more readable. It’s easy to move a complex query to a UDF and reuse it in other queries. On the other hand, “set based” queries tend to be more complex to write and read. A simple way to improve “set based” queries is to use views. Views allow you to do the same as UDFs (encapsulating SQL script) but without any overhead. So, if we create the following view:
CREATE VIEW dbo.ContactTotalSales
AS
SELECT
ContactID, SUM(SubTotal) AS TotalSales
FROM
Sales.SalesOrderHeader
GROUP BY
ContactID
then we can make our query look like this:
SELECT
c.ContactID,
c.LastName,
c.FirstName,
ts.TotalSales
FROM
Person.Contact c
LEFT OUTER JOIN
ContactTotalSales ts ON ts.ContactID = c.ContactID
If we'll compare the performance of this query, we will see that it’s equal to original “set based” query.
Conclusion
Always test your queries for performance and check their execution plan. Improve complex query structures using views. Use “procedural” approach very carefully and only if there is no other choice.
Thank you for reading. I hope this was useful to you.