You probably know that database indexes are a means for improving database performance. But surprisingly few people understand the different types of indexes there are, how they work, and how to choose an appropriate index for a specific performance problem – or indeed whether an index will help at all. This article may help you get started. It will focus on B-tree indexes, the most commonly used index type in both Oracle and SQL Server databases. Some terminology, such as ‘branch blocks’ and ‘leaf blocks’ is specific to Oracle databases, as these are my primary interest at present.
Why Database Indexing?
In short, indexes help our database to find rows quickly.
Without an index, our database must perform a full table scan every time it needs to find a row or a collection of rows. A full table scan describes a process where the database just looks through a table one row at a time, looking for the desired row or rows. In large tables, this can be slow. An index provides shortcuts which the database can use to help find what it is looking for more quickly.
One analogy is a library. Imagine how long it would take you to find a particular book if the books in your local library were arranged in no particular order. You would have to examine one book at a time until you found what you were looking for. Thankfully, libraries are organized in such a way as to provide clues on where to look. Books are typically arranged by genre and by author. In most libraries there are computer terminals dotted around which you can use to access a catalog. You type in a title or author, and the catalog tells you where exactly to look for your book. In this way, the catalog is much like a database index.
The key of an index is the column or group of columns to which the index is applied. An index can only help speed up a query if its key includes one or more of the columns used in the query’s predicate (i.e. in its WHERE
clause). In a library, if all you know is a book’s title, then having all books arranged solely in order of the author’s surname isn’t going to help you.
B-tree Indexes
B-tree indexes are a particular type of database index with a specific way of helping the database to locate records. B-tree stands for ‘balanced tree’1 (not ‘binary tree’ as I once thought). A B-tree index orders rows according to their key values (remember the key is the column or columns you are interested in), and divides this ordered list into ranges. These ranges are organized in a tree structure, with the root node defining a broad set of ranges, and each level below defining a progressively narrower range.
Consider again our library example, and imagine that the books in the library are arranged purely with regards to the author’s surname. Now let’s say you are looking for a book by an author named Osman.
On entering the library, you see that the ground floor contains books by authors named A-G, the first floor is H-N, the second floor O-U and the third floor V-Z. So you go directly to the second floor to search for books by Osman. On the second floor, you see a sign which describes how the books are arranged on this floor. There is one aisle for O-R, and another aisle for S-U, so you progress to the O-R aisle. In this aisle, there is a shelf for each letter, so you find the shelf for O, and here the books are ordered alphabetically so you can quickly find your book by Osman.
Now let us consider the equivalent database search. Imagine all the books in the library were contained within a Books
database table, with a B-tree index on the author_surname
column. To find your book by Osman
, you might perform the following query:
SELECT * FROM Books WHERE author_surname = ‘Osman’
First the database would examine the root node of the B-tree index, which might define four ranges corresponding to our four floors in the library (A-G, H-N, O-U, V-Z). It would then progress to the node in the next level of the tree representing the O-U range, corresponding to progressing to the second floor of our library. Within this node would be a further set of ranges (O-R, S-U), corresponding to the aisles on the second floor, and so on.
Each node in the B-tree is called a block. Blocks on every level of the tree except the last are called branch blocks, and those on the last level are called leaf blocks. Entries in branch blocks consist of a range and a pointer to a block on the next level of the tree. Each entry in a leaf block represents a row and consists of a key value (e.g. ‘Osman
’) and a rowid
which the database uses to obtain all other data contained within the relevant row.
Traversing Across Leaf Nodes
In addition to traversing up and down the B-tree, the database also has the ability to traverse horizontally between leaf nodes. This can be useful, for example, when processing a query containing an ORDER BY
clause on the indexed column. Consider the following query:
SELECT * FROM Books ORDER BY author_surname
With our B-tree index, the database can simply start at the first leaf node in the tree, and traverse horizontally across all leaf nodes in sequence to obtain the requested ordered collection. Without this index, sorting the rows by surname would obviously be a much more complicated and slower operation.
Traversing from leaf node to leaf node is also useful when executing queries which return a range of rows based on a predicate. For example, consider the following query:
SELECT * FROM Books WHERE author_surname > ‘O’
Using our B-tree index, the database will traverse down the tree from the root until it reaches the leaf node containing rows where author_surname begins with ‘O’, and then simply traverse to the right across all remaining leaf nodes to obtain books for ‘P’, ‘Q’ and so on.
Final Words
There are a number of ways you can configure a B-tree index to meet your specific needs. In Oracle, you may want to look into index-organized tables, reverse key indexes, descending indexes and B-tree cluster indexes. In addition, you may want to consider Bitmap indexes, which are an alternative to B-tree indexes. These topics may be covered in future blog posts, but hopefully this article has given you a quick introduction to B-tree indexes.
1 There is in fact some debate as to what the B stands for in B-Tree. My definition of 'balanced' comes from the Oracle documentation.
The post B-tree Indexes appeared first on The Proactive Programmer.