This is a follow up to a post about subqueries with an estimation for how long it would take for the query to run.
In a recent post about subqueries, I compared 4 queries and how they handled 4 million rows of data. The queries involved:
- Joins with
AND
conditions - Subqueries with
AND
conditions - Joins with
OR
conditions - Subqueries with
OR
conditions
I compared the speed differential between all of them [on SQL Server], but the last of the queries never finished. As I indicated I might do, I left it running overnight – it still never finished!
Well, my curiosity about how long it would take for the query to finish wasn’t diminished – it was only enhanced. How could I find out and at least be able to estimate how long it would take? I needed to have some sort of understanding of how the performance was degrading as more records were added to the table. So I did that. I ran the query using a smaller number of records in the StudentTestData
table. Here are the results:
# records | Run Time |
25,000 | 6 seconds |
50,000 | 23 seconds |
100,000 | 99 seconds |
200,000 | 505 seconds |
My anticipation is SQL Server was doing some sort of cross-join type comparison in the background, and the numbers above seem to support this. The efficiency, in Big-O notation, is around O(n^2). Since I want to work with actual numbers to estimate how long the query would run with N = 4 million rows, here’s a rough function:
Number of Seconds = 6*(N/25000)^2, where N is the number of rows.
This function underestimates the time requirement as you add more records, as it is slightly too low for N=100,000
and it is about 2 minutes off for N=200,000
. Despite this, it gives me something to work with.
As a spot-check, I tried N = 80,000
students. The function expectation is 61 seconds, and the actual time the query ran was 60 seconds. That’s about as close as I can ask for.
But what about if N is 4 million rows as it was in the previous post? Based on the function, the expectation is over 43 hours. Also, this is an understatement given the results above for N=200,000
. It’s a good thing I didn’t wait for it to finish!