Common table expressions (CTEs) were introduced with SQL 2005. They are a powerful new construct for building advanced queries.
The Microsoft CTE Article describes 4 main uses for CTEs:
- Recursive queries
- Inline views
- Grouping by derived columns
- Reuse of the same table
It still amazes me how many people on the new SQL platform actually aren't familiar with or don't use CTEs. I'm going to share two quick-and-dirty examples to demonstrate just how powerful they can be.
Hierarchical Data
I've blogged a bit about hierarchical data in the past (read Hierarchical Data Templates in Silverlight for an example). SQL makes it far easier to retrieve data in a hierarchical fashion using CTEs.
Let's assume you have a content management site with articles, and articles can be grouped into hierarchical categories (i.e., one category may contain others). There are several ways to approach this, but one way is to have the main category table referenced by another that encapsulates the relationships, like this:
We'll start with an Articles section that has Whitepapers as a subcategory, which also has "Technology Related" as a subcategory, then add a FAQ which has "Using the Site" and "Logging Out" as subcategories. Our main table looks like this:
CategoryID | Category |
1 | Articles |
2 | FAQ |
3 | Whitepapers |
4 | Using the Site |
5 | Logging Out |
7 | Technology Related |
To show the relationships, we'll enter records into CategoryLevel
like this:
One real-world challenge would be showing a breadcrumb when the user is viewing a technology article. Given the child node (7, Technology Related) how do we recursively traverse the hierarchy to show all of the parent nodes?
Using a common table expression, we would alias the "top" of our query (the selection of the current category) and then join that to recursively traverse the tree, like this:
with RecursiveCat(CategoryID,Category)
as (
select
CategoryID,
Category
from dbo.Categories
where CategoryID = 7
union all
select
L.ParentCategoryID as CategoryID,
Categories.Category
from dbo.CategoryLevels L
join RecursiveCat c
on c.CategoryID = L.ChildCategoryID
inner join dbo.Categories
on L.parentCategoryID = Categories.CategoryID
)
select * from RecursiveCat
The key here is to note the "union all
" with the join
. The top query "kicks it off" and selects the target node. The union
describes a recursive template ... on the inside, where we reference the CategoryLevels
table that describes the parent-child relationships, we join to the Common Table Expression (CTE) on the outside. This means for each record, it joins back to itself and "unrolls" until the full hierarchy is exposed.
Running the above query yields:
CategoryID | Category |
7 | Technology Related |
3 | Whitepapers |
1 | Articles |
As it walks up the tree.
Ranked Data
The other most common use is for ranked data. Let's take the above example even further, and declare an articles table:
The articles simply belong to a category and have a date when they were published. I'm leaving out details like title, body, etc. for now.
A very common way to feature articles on a content website is to show the most recent article published to a given category. Assuming you have dozens of articles, what is the easiest way to do that?
Of course, if you are pragmatic, you'll probably put a MostRecentArticleID
column on the Categories
table or even introduce a new table to hold that. However, if you want to have fun (and let me keep my excuse for giving an example) you can use a rank function (we're not yet to the Common Table Expression, so bear with me). For now, try to ignore my typo as ArticlePublishData
should be ArticlePublishDate
. The first query will give you an idea of how the RANK
works:
SELECT CategoryID, ArticleID, ArticlePublishData,
ROW_NUMBER() OVER (PARTITION BY CATEGORYID ORDER BY ArticlePublishData DESC) As "MyRank"
FROM Articles
When we run this, we get all of the articles, but if you notice, the "MyRank
" column resets for each category. Within a category, it starts at the most date date with "1
" and then increments. This is because we are partitioning, or telling the CTE to reset the ranking, by the categories, and then ordering within those categories by the publish date. In this example, both 1
and 2
are in the same category, then 3
is in a new category.
CategoryID | ArticleID | ArticlePublishData | Rank |
1 | 2 | 2008-12-13 00:00:00:00 | 1 |
1 | 1 | 2006-12-14 00:00:00:00 | 2 |
2 | 3 | 2009-01-02 00:00:00:00 | 1 |
Now, it's just a simple matter to add a filter using a Common Table Expression. We'll alias the ranking function to extract the row, and then filter on the row:
;with CTE(CategoryID, ArticleID, ArticlePublishData, Row) AS (
SELECT CategoryID, ArticleID, ArticlePublishData,
ROW_NUMBER() OVER (PARTITION BY CATEGORYID ORDER BY ArticlePublishData DESC) As "MyRank"
FROM Articles)
SELECT * from CTE
where Row = 1
This will give us only the most recent article for each category.
Hopefully, this has been a useful, quick, and easy introduction to Common Table Expressions. Just search online for the phrase and you'll find plenty of examples and tutorials to help you take your SQL skills to the next level.
codeproject