Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

SQL Query Run Time Estimation

5.00/5 (1 vote)
31 May 2011CPOL2 min read 16.4K  
SQL Query run time estimation

This is a follow up to a post about subqueries with an estimation for how long it would take for the query to run.

In a recent post about subqueries, I compared 4 queries and how they handled 4 million rows of data. The queries involved:

  1. Joins with AND conditions
  2. Subqueries with AND conditions
  3. Joins with OR conditions
  4. Subqueries with OR conditions

I compared the speed differential between all of them [on SQL Server], but the last of the queries never finished. As I indicated I might do, I left it running overnight – it still never finished!

Well, my curiosity about how long it would take for the query to finish wasn’t diminished – it was only enhanced. How could I find out and at least be able to estimate how long it would take? I needed to have some sort of understanding of how the performance was degrading as more records were added to the table. So I did that. I ran the query using a smaller number of records in the StudentTestData table. Here are the results:

# recordsRun Time
25,0006 seconds
50,00023 seconds
100,00099 seconds
200,000505 seconds

My anticipation is SQL Server was doing some sort of cross-join type comparison in the background, and the numbers above seem to support this. The efficiency, in Big-O notation, is around O(n^2). Since I want to work with actual numbers to estimate how long the query would run with N = 4 million rows, here’s a rough function:

Number of Seconds = 6*(N/25000)^2, where N is the number of rows.

This function underestimates the time requirement as you add more records, as it is slightly too low for N=100,000 and it is about 2 minutes off for N=200,000. Despite this, it gives me something to work with.

As a spot-check, I tried N = 80,000 students. The function expectation is 61 seconds, and the actual time the query ran was 60 seconds. That’s about as close as I can ask for.

But what about if N is 4 million rows as it was in the previous post? Based on the function, the expectation is over 43 hours. Also, this is an understatement given the results above for N=200,000. It’s a good thing I didn’t wait for it to finish!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)