Introduction
SQL Server is a great system, but it has its quirks that can cause extreme performance issues.
Not knowing the system very well can lead you straight into those quirks. Another thing is that SQL Server is alive. By that, I mean that it changes over time. That can be a hassle because what you knew to be true may end up being false. It's also a great opportunity, because it opens up new avenues of coding. Jumping straight into something new without understanding the ramifications can put you into those quirks big-time.
This is an example of an extreme performance issue caused by this misunderstanding.
I can explain why my functions are faster, I can sort of explain why top 1000 is slower than looking at the entire set of information for one view. I can't explain why "top 100" is 11 times slower, "top 10" is 10.5 and "top 1" 3.3 times slower than "top 1000". I can't explain why the entire process of running two select
statements in SSMS query takes less than 6 seconds, but it takes 19 seconds to start to see results in the query window.
Background
First, let's get back to the basics. A performance issue is usually caused by large IO, CPU, or IO contention. About the only thing you can do about the first one is to limit the transfer of data to smaller chunks. CPU is caused by executing something. In SQL, executing a function that produces a constant value in a where
clause is usually the biggest offender. Executing once rather than many times is usually faster in any situation. Defining the constant value produced by the function in a variable and then using that in the where
clause will reduce the execution costs for the function. If the field is indexed, it can find the row(s) being looked for in a hurry, but if the where
uses a function, the entire table has to be checked executing the function for every row. The where
clause wasn't the problem here, but executing many times is one of the minor problems in this case. IO contention involves trying to access many resources at one time but the disk can only be in one place at a time, so you have to wait for the disk to move. I think that's the major problem in this case. (A lot of things happening here don't make sense to me.)
In SQL 2005, a lot of things changed, but two things in particular caused this problem. Common Table Expressions (cte) were introduced in 2005. It's a new way of creating a subquery that has recursive features that make the cte capable of doing things that are impossible in a subquery. However, a subquery should be used with care because it eliminates the indexing in the table that might have helped the query. This used several ctes all at once.
Enter innocent me, about to enter the lion's den. I've just started a new position, I'm in SSMS... hmmn, what kind of data is in this view? Use the pull-down to get the first 1000 rows. Huh, 30 seconds, no data, and still going. Well, what's IN this view? Open a “create view
” script in a new window. Gosh, a monster. Many(6) ctes joining with a view over and over again and the first cte query I see makes my skin crawl. How's the view doing? Still running. 19 minutes later, I get my results.
While it was running I created two queries, the skin crawling one and an alternate one that would do the same thing, but I think less expensively. I get a cost estimate. 100% for skin crawler, 0% for my version. I prepare an e-mail talking about the view, showing the cost estimates, how I don't like cost estimates because they can be inaccurate and this might not be causing the problem. (It wasn't causing the problem, and the cost was nearly 50/50 because the conditions it was testing for didn't exist in the data.)
Using the Code
The download has one SQL file. This is designed to create a new database, use it and set up tables with data, views, and tabular functions. In other words, a test environment on your server. Only execute it if you want to set up the test environment to see what is going on in SQL. Before you do, you can change the database name or just use one you have already defined. I set it up this way so you can drop everything by dropping the database if you wish.
The two views are similar to the ones I executed. Names have been changed to try and hide anything that might be proprietary. Conditions have been changed to make an easier test environment to set up.
The following is the script that will be generated if you use SSMS to select 1000 rows from the view vwVRec
(if you keep the database name and scripts intact). It uses vwA
13 times. vwA
executes much faster than vw<code>VRec
but is still slow. If you want to see the speeded up version of vwVRec
, you could open a query window, point it to the db, and execute "select * from dbo.<code>fnVRec
()"
SELECT
TOP 1000 [ClassID]
,[Class]
,[ClassDesc]
,[SubClassID]
,[SubClass]
,[SubClassDesc]
,[GroupID]
,[GroupCode]
,[GroupDesc]
,[LIID]
,[LI]
,[LIDesc]
,[LIDID]
,[LID]
,[LIDDesc]
,[CEID]
,[CE]
,[CECode]
,[CEDesc]
FROM [testdbx].[dbo].[vwVRec]
On my machine, this runs in about a minute. The first time I tried it, I had forgotten to add an index to one of the tables and it ran for 2 hours. This is really simplified from the original version. The version I thought was exactly like the one I ran in 19 minutes at work runs in 8 minutes on my machine. It isn't unusual for test environments to act differently. The data in the view is nonsense data, but it keeps the spirit of uniqueness that the original had. fnVRec
doesn't return data exactly like the view, because it is set up to do what the comments says the view should do. I originally matched the view's results and the performance of the function isn't impacted with either version.
I picked functions because you can do much more complex logic in a function than you can in the view and used that to improve the performance.
In vwA
, it only uses one CTE. (It doesn't HAVE to be CALLED CTE, it just seems to be a common practice. You could also set the field names before the query if you wanted.) This produces a query that has 6 hierarchyid
type fields. In a hierarchy, you usually look from top down, that is how the script loads the data, but this query goes from bottom up. So, CEHID
is the lowest level and it has 10 records defined for every LIDHID
. Every other level has 5 child levels including the last level ClassHID
.(A total of 5 records.)
WITH CTE
as
(
select g.MembHierarchyId CEHID
,g.MembHierarchyId.GetAncestor(1) LIDHID
,g.MembHierarchyId.GetAncestor(2) LIHID
,g.MembHierarchyId.GetAncestor(3) GroupHID
,g.MembHierarchyId.GetAncestor(4) SubClassHID
,g.MembHierarchyId.GetAncestor(5) ClassHID
from dbo.tstAMemb g
join dbo.tstANode n on n.NodeId = g.NodeID
where n.NodeName = 'FinalA'
)
5 to the 6'th power times 2 is 31250. That is the number of records in CEHID
(NodeName = 'FinalA'
.) That's the number of times 5 function calls (GetAncestor
) are made in vwA
's only cte
. This cte
then joins with tstAMemb
6 times trying to look up 6 different locations in one table simultaneously. 6 joins to one table IS IO contention. (Not much on my machine, this view runs in 3-4 seconds. fnA
runs in 2-3 seconds.)
In fnA
, it defines a variable table @CE
that retrieves all the data one of the 6 joins in vwA
would have picked up. The function writes to 6 variable tables and overall, calls GetAncestor
35,155 times (the number of rows in the table.) The script in fnA
is more complex and longer than vwA
but still slightly faster. Here is the first query that executes GetAncestor
31,250 times (Putting the same number of rows in @CE.)
DECLARE @CE TABLE(CEHID HierarchyId primary key, LIDHID HierarchyId
, CEid int, CE varchar(10), CEDesc varchar(200)
)
INSERT INTO @CE (CEHID, LIDHID, CEid, CE, CEDesc)
select g.MembHierarchyId CEHID
,g.MembHierarchyId.GetAncestor(1) LIDHID
,g.AMembId
,g.Code
,g.Description
FROM dbo.tstAMemb g
JOIN dbo.tstANode n ON n.NodeId = g.NodeID
WHERE n.NodeName = 'FinalA';
It defines a variable table @LID
that retrieves the distinct child values in @CE
. Then this table with 3,125 records is updated with another one of the six joins in vwA used here. It calls GetAncestor
3,125 times. This process is repeated four more times with four variable tables, so tstAMemb
is accessed six times to get six sets of information here, but one at a time. (No IO contention.) In vwA
, all six times the join
s have to match 31K times. In fnA
, the last join
matches five records.
DECLARE @LID TABLE(LIDHID HierarchyId primary key, LIHID HierarchyId
, LIDid int, LID varchar(10), LIDDesc varchar(200)
)
INSERT INTO @LID (LIDHID)
SELECT DISTINCT LIDHID FROM @CE;
update a set LIHID = a.LIDHID.GetAncestor(1), LIDid=g.AMembId
,LID=g.Code, LIDDesc=g.Description
FROM @LID a
JOIN dbo.tstAMemb g ON g.MembHierarchyId=LIDHID;
vwVRec
has six ctes like this:
WITH CLASSCODE (ClassDesc, Class, Pair)
as
(
SELECT distinct
f1.ClassDesc
,RTRIM(f1.Class)
,f1.ClassDesc + ' (' + RTRIM(f1.Class) + ')'
FROM dbo.vwA f1
join dbo.vwA f2 on f2.ClassDesc = f1.ClassDesc
AND f2.Class <> f1.Class
)
vwA
has six joins to the same table and is executing a function over 150K times. So the above cte
(This cte is called CLASSCODE
and is used in a join
later.) is twice that. This is the query that caused me to cringe. It's looking for duplicate descriptions where the class name isn't the same. The reason I cringed: If there were a thousand rows that used the same description, this would join a million rows and if all thousand used different class names the "distinct" would have to reduce the million rows back to a thousand. Note that the RTRIM
function is executed twice per record (before the distinct reduces the records)
In the following code, removing the datetime
logic and executing the estimated execution cost, the first query cost 99% and the second 1%. That's more usual in the estimation tool when one query isn't optimal. I first ran the comparison using the above fields which produced 0 results and saved both the select execution results in commented lines. (0 and 2 results in the queries) The wallclock time came from the SSMS query window. I wasn't watching the results window, so I ran it a third time. 19 seconds before any results showed up at all. Both times were faster. Exactly 7/100th and 1/10th of a second faster. I have no idea why there is a delay in the results shown, that is not normal. Commenting out the first two select
statements, all three times (2 datetime
differences in third select
, and the SSMS query window) recorded and took 0 seconds.
DECLARE @d1 DATETIME=GETDATE(), @d2 DATETIME, @d3 DATETIME
SELECT distinct
f1.LIDDesc
,RTRIM(f1.LID)
,f1.LIDDesc + ' (' + RTRIM(f1.LID) + ')'
FROM dbo.vwA f1
join dbo.vwA f2 on f2.LIDDesc = f1.LIDDesc
AND f2.LID <> f1.LID
SET @d2=GETDATE()
SELECT DISTINCT
LIDDesc
,RTRIM(LID)
,LIDDesc + ' (' + RTRIM(LID) + ')'
FROM dbo.vwA
WHERE LIDDesc IN
(SELECT LIDDesc FROM dbo.vwA GROUP BY LIDDesc HAVING COUNT(DISTINCT LID) > 1)
SET @d3=GETDATE()
select DATEDIFF(millisecond,@d1,@d2)/1000. [Secs first], _
DATEDIFF(millisecond,@d2,@d3)/1000. [Secs second]
Now, to the view join
that produces the output:
from dbo.vwA ce
left join CLASSCODE CL1 on CL1.Class = RTRIM(ce.Class)
left join SUBCLASSCODE SC2 on SC2.SubClass = RTRIM(ce.SubClass)
left join GROUPCODE GR3 on GR3.GroupCode = RTRIM(ce.GroupCode)
left join LICODE LI4 on LI4.LI = RTRIM(ce.LI)
left join LIDCODE LD5 on LD5.LID = RTRIM(ce.LID)
left join CECODE CE6 on CE6.CE = RTRIM(ce.CE)
There isn't any indexing, so there's no point complaining about functions in where
clauses. By my count, that's 13 joins of vwA, 78 synchronous joins to the same table, and (not counting RTRIM
function executions) 2,031,250 (31,250*5*13) function calls when creating all the records. I have no idea what is going on, under the hood for TOP processing.
For duplicate description processing in fnVRec
, I create a variable table that finds all the duplicate descriptions used in more than one Code record value and then remove the descriptions when it is a single description per level (Tested in original setup, not in this one.) This handles all 6 levels in one table and one query.
DECLARE @Code TABLE (Code varchar(10), level smallint, _
Descrpt varchar(200), Pair varchar(200))
INSERT INTO @Code
SELECT DISTINCT Code, MembHierarchyId.GetLevel(), _
Description, Description + ' (' + RTRIM(Code) + ')'
FROM dbo.tstAMemb
WHERE Description IN
(SELECT Description from dbo.tstAMemb
GROUP BY Description
HAVING COUNT(DISTINCT Code) > 1)
DELETE a
FROM @Code a
JOIN (SELECT level, Descrpt
FROM @Code group by level, Descrpt having count(*) = 1) b
ON b.level=a.level AND b.Descrpt=a.Descrpt
In the original database, this would have found zero records, in this db it will find 2 records (Unless you change the data in the table.) To match up a different level, you can manually update one record if you wish. The view creates a "pair
" field in the cte that is never used, I do use it (Because I changed the join
logic so it matches the comments.)
In the select
part of the view/function, I do use the pair value in ISNULL
logic and the view uses a case
statement:
", ISNULL(CL1.Pair,ce.ClassDesc)"
" , ce.ClassDesc + CASE WHEN CL1.ClassDesc IS NOT NULL
THEN + ' (' + RTRIM(CL1.Class) + ')'
ELSE ''
END"
I used ISNULL
because I've read that the inline function is faster than a CASE
statement. I think it reads better too.
This statement: ce.ClassDesc + ISNULL(' (' + CL1.Class + ')','')
would do the same thing the view's case
statement does. (Because in the view's cte, RTRIM
has already been executed on this field.) This could be done in the function as well if you want to mark up every unique description the view does with the class data. There is one "fifth1
" code and one "fifth2
" code (LID
field) that uses a duplicate description. The function includes the bracketed class name (fifth1
/fifth2
) in 20 records while the view includes it in 6K records. This is a result of my not using distinct naming conventions for the code value when I generated the data.
Points of Interest
The view vwVRec
runs faster if you don't use "TOP
" any#. This tells me SQL doesn't take the stream of data and interrupt it when the "top
" criteria are met. The results you first see with "top
" is DIFFERENT than the first results when everything is selected. This tells me "TOP
" AFFECTS the data processing order.
"OK, the top 1000
takes 8 minutes, so all the data is 31 times bigger, it's going to take about 4 hours to list everything. UGH, do I want to do this? Yea, I better. 1 minute later, it's done!?!"
These were (and still are) my original observations on my machine with a schema much closer to the original setup that returned results in 19 minutes. With the new version supplied here, top 1000
runs in about a minute and no TOP
runs in about 40 seconds. I didn't test this on the original setup, but on this version TOP 100
runs in 11 minutes. TOP 10
- 10:30, top 5
- 5:20, top 1
- 3:20. I have no explanation.
Not stopping the stream after it was calculated wasn't a surprise, I knew that beforehand. Affecting the processing order, and the performance being slower using TOP
was a surprise to me.
fnVRec
's performance is almost the same on the schema that takes 8 minutes as this one (in the download) that takes 1 minute to get the TOP 1000
rows in the view. When I forgot to include the index, it cost the function about 2 more seconds to execute.
For vwA
, there isn't any significant difference in the view or the function. Uncached, they both complete in 4-5 seconds and produce the output in zero to 2 seconds based on the output. (TOP
makes the view faster than the function. Expected difference.) Cached they complete in about 2 seconds. ORDER BY
doesn't affect the speed of the function and with vwA
has only minor impact. vwVRec
is delayed by HOURS!
TOP
can significantly speed up vwA
(Expected behavior using TOP
).
While investigating what was going on, I turned on SQL profiler and told it to tell me everything (on vwA
.) I got 155K records that told me when the command started but nothing else. No command, elapsed time, cost, etc. The whole table had 33K records. What the??? Oh, duh. The query has 31K records of interest and it is executing 5 functions. 5*31K=155K. With fnA
, it runs one function 35,155 (5+25+125+625+3,125+31,250) times. First 31K for the records of interest and then 1/10th of that for their parent values, then 1/5th of that each time, down to the base 5 records.
If you want to learn more about hierarchyid
, look at what the download script does to generate the data and how the views and the functions treat the data. Then you might want to look up help to get a more detailed explanation of how this relatively new element works.
If you want to see how ctes work, look in vwVRec
. Don't bother looking in the functions, they aren't used there. Look in help for more information because they do a lot more than what these ctes do in the views. They are a valuable tool in the SQL toolbox, I just think they are thrown in a little more often than they are really needed. I have to admit they make it much more readable than using subqueries.
History
Haven't recorded each step getting a finished article. Basically rereading, rewriting, and running new tests while writing the article. Last update 10/6.