Introduction
My everyday job is to develop back-office applications for a mobile telecom operator. When a customer orders a service through the web or voice front-end, our applications have to provide very quick feedback. We are required to answer in less than one second, yet we have to run complex SQL statements against databases of dozens of gigabytes.
In this environment, a single inefficient query can have disastrous effects. A bad statement may overload all the database processors, so that they are no longer available to serve other customers' orders. Of course, such problems typically occur shortly after the launch of new offers... that is, precisely under heavy marketing fire. You can imagine the mood of our senior management when such a disaster happens.
Unfortunately, suboptimal statements are difficult to avoid. Applications are generally tested against much smaller amounts of data than in the production environment, so performance problems are unlikely to be detected empirically.
That's why every database developer (and every application developer working with databases) should understand the basic concepts of database performance tuning. The objective of this article is to give a theoretical introduction to the problem. At the end of this article, you should be able to answer the question: is this execution plan reasonable given the actual amount of data I have?
I have to warn you: this is about theory. I know everyone dislikes it, but there is no serious way to get around it. So, expect to find here a lot of logarithms and probabilities... Not afraid? Then let's continue.
In this article, I will try to cover the following topics:
- What are we optimizing?
- Looking for records in tables.
- Joining tables with nested loops.
In the second part (to be written), the following topics will be sketched:
- Applying predicates.
- Joining tables with hash joins.
- Sorting.
- Merging.
- Joining tables with merge joins.
Scenario
I need a sample database for the examples in this article. Let's set the scene.
The CUSTOMERS table contains general information about all customers. Say the company has about a million customers. This table has a primary key CUSTOMER_ID, which is indexed by PK_CUSTOMERS. The LAST_NAME column is indexed by IX_CUSTOMERS_LAST_NAME. There are 100,000 unique last names. Records in this table have an average size of 100 bytes.

The REGION_ID column of the CUSTOMERS table references the REGIONS table, which contains all the geographical regions of the country. There are approximately 50 regions. This table has a primary key REGION_ID, indexed by PK_REGIONS.
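The article does not give the DDL for this schema, so here is a minimal T-SQL sketch consistent with the description above. The table, column, and index names come from the article; the data types and sizes are assumptions.

-- Sample schema sketch (illustrative only; data types are assumptions).
CREATE TABLE REGIONS (
    REGION_ID INT NOT NULL,
    NAME VARCHAR(50) NOT NULL,
    CONSTRAINT PK_REGIONS PRIMARY KEY (REGION_ID)
)

CREATE TABLE CUSTOMERS (
    CUSTOMER_ID INT NOT NULL,
    FIRST_NAME VARCHAR(40) NOT NULL,
    LAST_NAME VARCHAR(40) NOT NULL,
    REGION_ID INT NOT NULL REFERENCES REGIONS (REGION_ID),
    CONSTRAINT PK_CUSTOMERS PRIMARY KEY (CUSTOMER_ID)
)

-- Non-unique index on the last name, as described in the scenario.
CREATE INDEX IX_CUSTOMERS_LAST_NAME ON CUSTOMERS (LAST_NAME)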
I will use the notations RECORDS(CUSTOMERS) and PAGES(CUSTOMERS) to denote respectively the number of records and the number of pages of the CUSTOMERS table, and similarly for other tables and even for indexes. Prob[CUSTOMERS.LAST_NAME = @LastName] will denote the probability that a customer is named @LastName when we have no other information about him.
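If you want to check these quantities on a real SQL Server 2000 database, the following commands give a rough picture (assuming the default 8 KB page size when converting the reported data size into pages):

-- Reports the number of rows and the data size of the table, from which
-- RECORDS(CUSTOMERS) and an approximate PAGES(CUSTOMERS) can be derived.
EXEC sp_spaceused 'CUSTOMERS'

-- Shows the statistics kept for the index, including density figures that
-- are closely related to the selectivity discussed later in this article.
DBCC SHOW_STATISTICS ('CUSTOMERS', 'IX_CUSTOMERS_LAST_NAME')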
What is an execution plan?
An SQL statement expresses what you want, but does not tell the server how to do it. Using an SQL statement, you may for instance ask the server to retrieve all customers living in the region of Prague. When the server receives the statement, the first thing it does is parse it. If the statement does not contain any syntax error, the server goes on: it decides the best way to compute the results. The server chooses whether it is better to read the table of customers completely, or whether using an index would be faster, by comparing the cost of all possible approaches. The way a statement is physically executed is called an execution plan or a query plan.
An execution plan is composed of primitive operations. Examples of primitive operations are: reading a table completely, using an index, performing a nested loop or a hash join, and so on. We will detail them in this series of articles. All primitive operations have an output: their result set. Some, like the nested loop, have one input. Others, like the hash join, have two inputs. Each input must be connected to the output of another primitive operation. That's why an execution plan can be sketched as a tree: information flows from the leaves to the root. There are plenty of examples later in this article.
The component of the database server that is responsible for computing the optimal execution plan is called the optimizer. The optimizer bases its decision on its knowledge of the database content.
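In SQL Server, this knowledge takes the form of statistics maintained on tables and indexes; if they are out of date, the optimizer may choose a poor plan. As a side note, the statistics for a table can be refreshed manually:

-- Refresh the statistics of the CUSTOMERS table so that the optimizer's
-- cost estimates reflect the actual data distribution.
UPDATE STATISTICS CUSTOMERS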
How to inspect an execution plan
If you are using Microsoft SQL Server 2000, you can use the Query Analyzer to see which execution plan is chosen by the optimizer. Simply type an SQL statement in the Query window and press Ctrl+L. The execution plan is displayed graphically.
As an alternative, you can get a text representation. This is especially useful if you have to print the execution plan. Using a Command Prompt, open the isql program (type isql -? to display the possible command line parameters), then follow these instructions:
- Type set showplan_text on, and press Enter.
- Type go, and press Enter.
- Paste your SQL statement at the command prompt, and press Enter.
- Type go, and press Enter.
The program will display the execution plan:
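The original screenshot is not reproduced here, but as an illustration, the text output has roughly the following shape (the operator arguments are elided because they depend on the actual query; only the index names come from the plan discussed below):

-- Illustrative sketch of SHOWPLAN_TEXT output, not a verbatim capture:
--   |--Hash Match(Inner Join, HASH:(...) = (...))
--        |--Index Scan(OBJECT:(... UNC_Dep_DepartmentName))
--        |--Clustered Index Scan(OBJECT:(... PK_USERS))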
The top operation of this execution plan is a hash join, whose inputs are an index scan of UNC_Dep_DepartmentName and a clustered index scan of PK_USERS. The objective of this series of articles is to learn how to understand such execution plans.
What are we optimizing?
Application developers usually have to minimize processor use and sometimes memory use. However, when developing database applications, the bottleneck is elsewhere. The main concern is to minimize disk access.
The main disk allocation unit of database engines is called a page. The size of a page is typically a few kilobytes, so a page usually contains dozens to hundreds of records. This is important to remember: a query that looks optimal in terms of record accesses may not be optimal at all in terms of page accesses.
Looking for records in tables
- Full table scan
Say we are looking for a few records in a single table, for instance the customers whose last name is @LastName.
sql1 ::= SELECT * FROM CUSTOMERS WHERE LAST_NAME = @LastName
The first strategy is to read all the records of the CUSTOMERS table and select the ones fulfilling the condition LAST_NAME = @LastName. Since the records are not sorted, we have to read absolutely all of them, from the beginning to the end of the table. This operation is called a full table scan. It has linear complexity, which means that the execution time is proportional to the number of rows in the table. If it takes 500 ms to look for a record in a table of 1,000 records, it may take around 8 minutes in a table of one million records and more than 5 days in a table of one billion records...
To compute the cost of sql1, we set up a table of primitive operations. For each operation, we specify the cost of one occurrence and the number of occurrences. The total cost of the query is simply the sum of the products of the operation unit costs and their numbers of repetitions.
| Operation | Unit Cost | Number |
| --- | --- | --- |
| Full table scan of CUSTOMERS | PAGES(CUSTOMERS) | 1 |
Let's take a metaphor: a full table scan is like finding all occurrences of a word in a novel.
- Index seek and index range scan
Now what if the book is not a novel but a technical manual with an exhaustive index at the end? The search would certainly be much faster. But what exactly is an index?
- An index is a collection of pairs of key and location. The key is the value we are searching by. In the case of a book, the location is the page number; in the case of a database, it is the physical row identifier. Looking for a record in a table by physical row identifier has constant complexity, that is, it does not depend on the number of rows in the table.
- Keys are sorted, so we don't have to read all of them to find the right one. Indeed, searching in an index has logarithmic complexity. If looking for a record in an index of 1,000 records takes 100 ms, it may take 200 ms in an index of a million rows and 300 ms in an index of a billion rows. (Here I'm talking about B-tree indexes. There are other types of indexes, but they are less relevant for application development.)
If we are looking for customers by name, we can perform the following physical operations:
- Seek the first entry in IX_CUSTOMERS_LAST_NAME where LAST_NAME = @LastName. This operation is called an index seek.
- Read the index from this entry up to the last entry for which LAST_NAME = @LastName is still true. This costs PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName] page reads from disk. This operation (always coupled with an index seek) is called an index range scan.
- Each index entry found by the previous steps gives us the physical location of a record in the CUSTOMERS table. We still have to fetch these records from the table, which implies RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] page fetches. This operation is called a table seek.
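SQL Server normally picks this access path by itself when the index is selective enough, but for experimenting you can force it with an index hint. A minimal sketch (the hint syntax is SQL Server specific, and hints should generally be kept out of production code):

-- Force the use of IX_CUSTOMERS_LAST_NAME; the resulting plan shows the
-- index seek / range scan followed by lookups into the CUSTOMERS table.
SELECT *
FROM CUSTOMERS WITH (INDEX (IX_CUSTOMERS_LAST_NAME))
WHERE LAST_NAME = @LastName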
The detailed cost analysis of sql1 using an index range scan is the following.
| Operation | Unit Cost | Number |
| --- | --- | --- |
| Index seek of IX_CUSTOMERS_LAST_NAME | Log(PAGES(IX_CUSTOMERS_LAST_NAME)) | 1 |
| Index range scan of IX_CUSTOMERS_LAST_NAME | PAGES(IX_CUSTOMERS_LAST_NAME) * Prob[LAST_NAME = @LastName] | 1 |
| Table seek of CUSTOMERS | 1 | RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] |
The bad news is that the query complexity is still linear, so the query time is still proportional to the table size. The good news is that we cannot really do better: the complexity of a query cannot be smaller than the size of its result set.
In the next sections of this article, we will accept a simplification: we will assume that an index look-up has unit cost. This approximation is not so rough, because a logarithmic cost can always be neglected when it is added to a linear cost. The simplification is not valid, however, when it is multiplied by another cost.
- Index selectivity
Comparing the cost of the full table scan approach with that of the index range scan approach introduces a crucial concept in database tuning. The conclusion of the previous section is that the index range scan approach will be faster if, in terms of order of magnitude, the following condition is true:
[1] RECORDS(CUSTOMERS) * Prob[LAST_NAME = @LastName] < PAGES(CUSTOMERS)
The probability that a customer has a given name is simply the number of customers having this name divided by the total number of customers. Let KEYS(IX_CUSTOMERS_LAST_NAME) denote the number of unique keys in the index IX_CUSTOMERS_LAST_NAME. The number of customers named @LastName is statistically RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME).
So the probability can be written:
[2] Prob[LAST_NAME = @LastName]
    = (RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME)) / RECORDS(CUSTOMERS)
    = 1 / KEYS(IX_CUSTOMERS_LAST_NAME)
Substituting [2] into [1], we get:
[3] RECORDS(CUSTOMERS) / KEYS(IX_CUSTOMERS_LAST_NAME) < PAGES(CUSTOMERS)
That is, an index is adequate if the number of records per unique key is smaller than the number of pages of the table.
The inverse of the left-hand side of the previous expression is called the selectivity of the index:
SELECTIVITY(IX_CUSTOMERS_LAST_NAME) =
KEYS(IX_CUSTOMERS_LAST_NAME) / RECORDS(CUSTOMERS)
The selectivity of a unique index is always 1. The more selective an index (that is, the larger its selectivity coefficient), the more efficient it is. Corollary: indexes with poor selectivity can be counter-productive.
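As a quick check, the selectivity of an index on a single column can be estimated directly from the data. A sketch for our example (with 100,000 distinct last names and one million customers it returns 0.1):

-- Estimate SELECTIVITY(IX_CUSTOMERS_LAST_NAME) = KEYS / RECORDS.
SELECT COUNT(DISTINCT LAST_NAME) * 1.0 / COUNT(*) AS selectivity
FROM CUSTOMERS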
Joining tables with nested loops
Things become much more difficult when you need to retrieve information from more than one table.
Suppose we want to display the name of the region besides the name of the customer:
SELECT r.NAME, c.FIRST_NAME, c.LAST_NAME
FROM CUSTOMERS c, REGIONS r
WHERE c.REGION_ID = r.REGION_ID
Among the possible strategies, I will present in this article the most natural one: choose a table, read it from beginning to end and, for each record, search for the corresponding record in the second table. The first table is called the outer table or leading table, and the second one the inner table. The dilemma is, of course, which table should lead.
So let's first try to start with the table of regions. We saw above that an index on CUSTOMERS.REGION_ID would have far too low a selectivity to be efficient (about 20,000 customers per region, which is more than the number of pages of the table), so our first candidate execution plan is to read the table of regions and, for each region, perform a full table scan of CUSTOMERS.
| Operation | Unit Cost | Number |
| --- | --- | --- |
| Full table scan of REGIONS | PAGES(REGIONS) | 1 |
| Full table scan of CUSTOMERS | PAGES(CUSTOMERS) | RECORDS(REGIONS) |
The leading cost term is clearly PAGES(CUSTOMERS) * RECORDS(REGIONS). If we plug in numeric values, we get approximately 50 * PAGES(CUSTOMERS).
Now what if we did the opposite? Since the table of regions is so small that it fits in a single page, an index on it is useless, so we again choose two nested full table scans.
| Operation | Unit Cost | Number |
| --- | --- | --- |
| Full table scan of CUSTOMERS | PAGES(CUSTOMERS) | 1 |
| Full table scan of REGIONS | PAGES(REGIONS) | RECORDS(CUSTOMERS) |
At first sight, the leading cost term is PAGES(REGIONS) * RECORDS(CUSTOMERS). Since the table of regions fits in one page, and since we have approximately 80 customer records per page (pages being, say, 8 KB), we can write that the leading cost is 80 * PAGES(CUSTOMERS), which seems somewhat worse than the first approach. However, this second join order is actually much faster than the first one. To see this, we have to take into account a factor that we have ignored up to now: the memory cache.
Since we are interested only in minimizing disk access, we can consider that the cost of reading a page from memory is zero.
The REGIONS table and its primary key can both be stored in cache memory. It follows that the cost matrix can be rewritten as follows:
| Operation | Unit Cost | Number |
| --- | --- | --- |
| Full table scan of CUSTOMERS | PAGES(CUSTOMERS) | 1 |
| First full table scan of REGIONS | PAGES(REGIONS) | 1 |
| Subsequent full table scans of REGIONS (served from cache) | 0 | RECORDS(CUSTOMERS) |
So, finally, the leading cost term is PAGES(CUSTOMERS), which is roughly 50 times better than the first join order (the ratio being RECORDS(REGIONS)).
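If you want to verify this reasoning experimentally, SQL Server lets you force both the physical join algorithm and the join order with hints, so the two plans and their I/O costs can be compared. A hedged sketch (the LOOP join hint and OPTION (FORCE ORDER) are SQL Server specific and are meant for experimentation, not production code):

-- Nested loop with REGIONS as the outer (leading) table:
SELECT r.NAME, c.FIRST_NAME, c.LAST_NAME
FROM REGIONS r
INNER LOOP JOIN CUSTOMERS c ON c.REGION_ID = r.REGION_ID
OPTION (FORCE ORDER)

-- Nested loop with CUSTOMERS as the outer (leading) table:
SELECT r.NAME, c.FIRST_NAME, c.LAST_NAME
FROM CUSTOMERS c
INNER LOOP JOIN REGIONS r ON c.REGION_ID = r.REGION_ID
OPTION (FORCE ORDER)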
Summary
Let's summarize what I have tried to introduce in this article:
- We are optimizing page access from disk, not record access.
- An execution plan describes how data flows between primitive operations. To understand an execution plan, it is necessary to know the complexity of each primitive operation and the number of records of their input flows.
- Full table scan has linear complexity, index seek has logarithmic complexity.
- Pay attention to myths. Index scans are not always better than full table scans.
- A nested loop is similar to a foreach iteration.
- Caching can have a tremendous effect.
- The order of execution has a direct influence on how efficiently the cache is used.
There is, however, still a lot of work ahead of us. I haven't yet talked about the most common type of join: the hash join. Worse: I haven't discussed how predicates (the non-joining parts of the WHERE clause) can totally mislead the optimizer. If you want to be better than the built-in optimizer, you still have to read the second article.
To be continued. Stay tuned on my RSS feed.
References
A good source of information, which helped me a lot to write this article, is the Oracle Performance Tuning Guide and Reference. Most of the concepts explained there are also relevant for Microsoft SQL Server. Microsoft users can also read the Query Tuning section of the on-line help.
Document History
- 2005-03-14
- 2005-03-31
- Posted on The Code Project.
- 2005-04-07
- Less offensive towards the Query Analyzer ;).
- 2005-04-18
- Some corrections (courtesy of Peter Molnar and Mikael Håkansson).