Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server

Recursive CTE

2.87/5 (10 votes)
25 Oct 2016CPOL8 min read 9.4K  
A simple explanation of the recursive CTE concept of T-SQL

Introduction

A recursive CTE is a very challenging concept to grasp for newcomers to T-SQL. Forget about newcomers, even experienced folks find it a tough nut to crack and apply.

I will attempt to explain recursive CTE concept with as simple an example as I can.

Background

Below is the table setup which I will use to explain things further:

Image 1

A recursive CTE is often used to traverse a parent-child hierarchy.

If you look at the Employee Hierarchy table, you will notice a self-referential key SUPERVISORID which points back to the same table (hence the self-reference) and the column it points back to is EMPID, which in turn happens to be the primary key of the Employee Hierarchy table. SUPERVISORID, as the name suggests, is the EMPID of the person responsible for managing the employee in question. So, for example, since Rocky’s SUPERVISORID is 4, Dimitri is his boss. Dimitri has got an EMPID of 4.

If you sit and think through things for a minute, you will realize you have a ‘you don’t know how deep the rabbit hole goes’ kind of situation on your hands. Say, someone asks you to trace out the entire hierarchy chain of Rocky, starting from Rocky himself and extending all the way back to a top-of-the-food-chain head honcho who doesn’t have any supervisor to supervise him or her (the CEO, perhaps?). In our case, this head person is Aravind. But starting from Rocky himself, all that I know is that his parent or boss is someone with EMPID=4 i.e. Dimitri. Now when I knock on Dimitri’s door, I find out his boss is someone with EMPID=2 i.e. Bakshi. And like a persistent investigator determined to pursue a case, I knock on Bakshi and find that he reports to a person with EMPID=1 and voila! This is our man Arvind who doesn’t report to anyone. In our situation, it took 3 hops (Rocky to Dimitri, then Dimitri to Bakshi and finally Bakshi to Aravind) to get to the top of the corporate food chain but you do realize that you have no way of knowing that beforehand. You might as well have required 30 hops, or 300!

If someone asked you to print Rocky’s details as well as his boss’s details, you can pull it off with a simple self-join. In plain English, you can join EmployeeHierarchy with EmployeeHierarchy and spit out both Rocky’s details as well as his boss Dimitri’s details. Take a look at What is SELF JOIN and when would you use it? to understand what I am talking about.

What if someone asked you to print Rocky’s details, Dimitri’s details as well as Dimitri’s boss’s details? It takes a simple leap of logic to figure out that now you need 2 self-joins of EmployeeHierarchy. In other words, this table is participating thrice in a join relationship.

A little bit of inductive reasoning will tell you that the deeper you want to go down the rabbit hole (or the more you want to scale the corporate ladder), you will need more and more self-joins. But what if you don’t know beforehand, how deep you need to burrow? This is where a recursive logic comes in handy.

 

So, let’s start our journey into the rabbit hole. What have we got to begin with? Where is our shovel?

SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

Rocky is EMPID=8. He is our shovel. I like to call him the ‘seed’. You can call him the alpha. T-SQL experts like to call it the ‘anchor’. Call it whatever you like. Just remember, he is our ignition key to get things started.

Now, I would proceed in a row-by-row approach. I have one row corresponding to Rocky. I would now add Rocky’s boss Dimitri now.

How do I add Dimitri? How about a UNION?

SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN <<some imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)>>
ON src.EMPID=<<the same imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)>>.SUPERVISORID

Hold on!

<<some imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)>>

Am I serious? What on earth do I mean by all that bloated stuff within double chevrons?

This whole bloated thing between the double chevrons (<< >>) is the key to understanding recursive CTE. When I say ‘some imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)’ what I mean is imagine for now that there is a sub-query out there which returns 3 columns- EMPID, EMPNAME and SUPERVISORID and for now, the sub-query has just got one record to return where EMPID=8, EMPNAME=Rocky and SUPERVISORID=4.

Imagine a sub-query? 

Just stay with me.

For convenience, let’s just refer to our imaginary sub-query as ISQ-V1 from now on, shall we? ‘Imaginary Sub Query Version 1’ in short and sweet form- ISQ-V1.

What about the inner join? Well, I am joining EmployeeHierarchy table with the result-set returned by my imaginary sub-query based on EMPID of EmployeeHierarchy and SUPERVISORID of my imaginary sub-query.

Look closely below. There is a dot followed by SUPERVISORID after the closing double chevron at the bottom right of the snippet. Which basically means that (pardon me for being repetitive) -

I am joining EmployeeHierarchy table with the result-set returned by my imaginary sub-query based on EMPID of EmployeeHierarchy and SUPERVISORID of my imaginary sub-query.

INNER JOIN <<some imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)>>
ON src.EMPID=<<the same imaginary query which returns (8,Rocky,4) according to a schema (EMPID, EMPNAME, SUPERVISORID)>>.SUPERVISORID

Now what does this UNION leave me with? Let’s shorten the above beastly syntax a bit for better understanding. We will use our short-hand ISQ-V1 for this.

SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V1
ON src.EMPID=ISQ-V1.SUPERVISORID

A result-set with 2 rows is returned by the above-

EMPID EMPNAME SUPERVISORID
8 Rocky 4
4 Dimitri 2

Let’s stretch our imagination a bit more. Imagine a new sub-query which returns the result-set above. Call it ISQ-V2.

Now, we will pick things up a notch! Consider the below query:

SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V1
ON src.EMPID=ISQ-V1.SUPERVISORID

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V2
ON src.EMPID=ISQ-V2.SUPERVISORID

If you put your thinking cap on, you will find the above leaves you with-

EMPID EMPNAME SUPERVISORID
8 Rocky 4
4 Dimitri 2
2 Bakshi 1

I will play the ‘inductive detective’ routine again and call this result-set the result of imaginary sub-query ISQ-V3!

Can you figure out what the following query will give me?

SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V1
ON src.EMPID=ISQ-V1.SUPERVISORID

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V2
ON src.EMPID=ISQ-V2.SUPERVISORID

UNION

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ-V3
ON src.EMPID=ISQ-V2.SUPERVISORID

This!

EMPID EMPNAME SUPERVISORID
8 Rocky 4
4 Dimitri 2
2 Bakshi 1
1 Aravind NULL

What if I now call the above ISQ-V4 and continue my painstaking quest to uncover the top boss?

SELECT EMPID
    ,EMPNAME
    ,SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID = 8

UNION

SELECT src.EMPID
    ,src.EMPNAME
    ,src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ - V1 ON src.EMPID = ISQ - V1.SUPERVISORID

UNION

SELECT src.EMPID
    ,src.EMPNAME
    ,src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ - V2 ON src.EMPID = ISQ - V2.SUPERVISORID

UNION

SELECT src.EMPID
    ,src.EMPNAME
    ,src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ - V3 ON src.EMPID = ISQ - V3.SUPERVISORID

UNION

SELECT src.EMPID
    ,src.EMPNAME
    ,src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN ISQ - V4 ON src.EMPID = ISQ - V4.SUPERVISORID

I still land up with-

EMPID EMPNAME SUPERVISORID
8 Rocky 4
4 Dimitri 2
2 Bakshi 1
1 Aravind NULL

Notice nothing new gets added. The previous result-set from ISQ-V4 is same as what I got above. This calls to an end my painstaking quest. I have found the bottom of the rabbit hole!

Pay attention to the jumbo query what we now have on our hands. There is a pattern. If we leave out our ‘seed’ query, whatever follows after the first UNION is pretty much the same syntax over and over again with just ISQ-V1 getting upgraded to ISQ-V2, then ISQ-V3 and so on and so forth.

Notice also that ISQ-V2 has an output which includes the output of ISQ-V1. ISQ-V2 ‘owns’ ISQ-V1. In a similar inductive fashion, ISQ-V3 ‘owns’ ISQ-V2 and transitively ISQ-V1 as well. ISQ-V4 owns everyone from V3 to V1. ISQ feels like an expanding bubble, does it not? A bubble which is adding a new record with every iteration. It is like, with every iteration, a more mature, older, bigger version of ISQ is emerging from the cocoon of a less mature, younger and smaller version of ISQ.

Is there a way to create a self-training ISQ bubble, which grows on its own, feeding off from its younger self, adding records to itself as required, and which knows when to stop growing?

Sigh! Sounds like a dream, doesn’t it?

Well, that’s what a recursive CTE is. It collapses all the duplicate, repetitive UNIONs into a single, elegant looking, compact UNION ALL and tops it all with a self-referencing clause. Think of the self-referencing as the bubble referring to its previous version for obtaining the nutrition it needs to build on and grow.

;with [ISQ-bubble] as
(
SELECT EMPID,
EMPNAME,
SUPERVISORID
FROM dbo.EmployeeHierarchy
WHERE EMPID=8

UNION ALL

SELECT src.EMPID,
src.EMPNAME,
src.SUPERVISORID
FROM dbo.EmployeeHierarchy src
INNER JOIN [ISQ-bubble]
ON src.EMPID=[ISQ-bubble].SUPERVISORID
)
SELECT * FROM [ISQ-bubble]
order by empid

And that’s recursive CTE for you.

The ‘UNION ALL’ is a syntactic handcuff. You can’t replace it with ‘UNION’. Try. You will get a syntax error. You can see the self-referencing happening above. Internally, the whole thing will work like the process I described, like a detective following one lead which leads to another.

Which brings us to another question? How many leads can the detective follow? Before he drops out of exhaustion (perhaps)? Or tears his hair out in frustration and calls it quits? Well, by default SQL Server allows recursion up to 100 levels deep. You can change this with OPTION (MAXRECURSION <<specify how many levels deep you want to go>>). Here is a nice link -OPTION MAXRECURSION | SQL with Manoj .

Before I end-another neat trick and food for thought for you. If I think of a recursive CTE as an growing, expanding, self-referencing bubble which automatically knows when to stop, can you tell me what this piece of code does?

;WITH recursive_counter
AS (
    SELECT 1 AS seq   

    UNION ALL   

    SELECT recursive_counter.seq + 1
    FROM recursive_counter
    WHERE recursive_counter.seq < 1000
    )
SELECT *
FROM recursive_counter
OPTION (MAXRECURSION 1000)

Also, ask yourself what happens if I remove the filter clause ‘WHERE recursive_counter.seq<1000’ and OPTION (MAXRECURSION 1000). What happens if I remove one among these 2 and let the other remain? Try out.

P.S. No self-referencing, ever-expanding bubbles were hurt during the making of this answer.

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)