Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Finding the Lowest Common Ancestor in a tree

5.00/5 (1 vote)
23 Sep 2012CPOL1 min read 12.5K  
How to find the Lowest Common Ancestor in a tree.

Background

There are quite a few code snippets out there on the internet that finds the Lowest Common Ancestor. But all that I've seen has the drawbacks that they can only use two values as input and are usually vendor specific. I wanted the code to be a bit more general and to allow multiple inputs.

Using the code

This SQL script is easy enough to adjust to your needs, such as converting it to a stored procedure or a function.

The first CTE nodes is just a place holder for the indata in this demo code, and is supposed to be exchanged for your indata.

The second CTE called cte is a recursive CTE that's running the "wrong" direction and finds out all nodes above the input nodes and at the same time "marks" every row with the level in relation to the input nodes.

In the third CTE called levels we're finding out which tree IDs are existing as many times as the number of input nodes.

And then it's an easy thing to select the tree ID with the lowest level.

SQL
WITH nodes AS (
    SELECT treeid FROM tree WHERE treeid IN (14,27,32) --Exchange this for whatever input data you have
    )
,cte (treeid,lvl) AS (
    SELECT  treeid
           ,1
    FROM    nodes
    UNION all
    SELECT  parentid
           ,lvl + 1 lvl
    FROM    tree t,cte
    WHERE   t.treeid = cte.treeid
    )
,levels AS (
    SELECT  treeid
           ,Min(lvl) lvl
    FROM    cte
    HAVING  Count(lvl) = (SELECT Count(treeid) FROM nodes)
    GROUP BY treeid
    ORDER BY lvl
    )
,minlevel AS (
    SELECT  Min(lvl) minlevel
    FROM    levels
    )
SELECT  treeid
FROM    levels l,minlevel m
WHERE   l.lvl = m.minlevel

The slightly stupid construction in the last CTE have the purpose of making the construction compatible between SQL Server and Oracle. If you're using SQL Server you can easily exchange it for a TOP statement. And in Oracle ROWNUM = 1 (see below).

If you're using Oracle before 11G R2, you can't use recursive CTEs, instead you have to use the Connect By statement.

SQL
WITH nodes AS (
    SELECT treeid FROM tree WHERE treeid IN (14,27,32)
    )
,lvl AS (
    SELECT  treeid
           ,Min(LEVEL) lvl
    FROM    tree
    CONNECT BY PRIOR parentid = treeid
    START WITH treeid IN (SELECT treeid FROM nodes)
    HAVING Count(treeid) = (SELECT Count(*) cnt FROM nodes)
    GROUP BY treeid
    ORDER BY lvl
    )
SELECT  treeid
FROM    lvl
WHERE ROWNUM = 1

If you want to search for the Lowest Common Ancestor in the whole  tree rather than by just a few nodes there are more efficient queries, such as this one:

SQL
WITH HasSingleChildren AS (
    SELECT  ParentID
    FROM    Tree t
    HAVING  Count(ID) =1
    GROUP BY ParentID
  )
,AllSingleNodes as (
    SELECT  t.ID,t.ParentID
    FROM    Tree t
    JOIN    HasSingleChildren h 
        ON  t.ParentID = h.ParentID
        OR  (t.ParentID IS NULL AND h.ParentID IS NULL)
  )
,TopSingleNodes (ID,ParentID,lvl) as (
    SELECT  ID
           ,ParentID
           ,1 AS lvl
    FROM    AllSingleNodes
    WHERE   ParentID IS NULL
    UNION ALL
    SELECT  a.ID
           ,a.ParentID
           ,t.lvl + 1 as lvl
    FROM    AllSingleNodes a
    JOIN    TopSingleNodes t
        ON  a.ParentID = t.ID
    )
SELECT  ID
FROM    TopSingleNodes t
WHERE   t.lvl = (SELECT Max(lvl) FROM TopSingleNodes)

This can also be done a bit prettier in Oracle:

SQL
WITH HasSingleChildren AS (
    SELECT  ParentID
    FROM    Tree t
    HAVING  Count(ID) =1
    GROUP BY ParentID
  )
,AllSingleNodes as (
    SELECT  t.ID,t.ParentID
    FROM    Tree t
    JOIN    HasSingleChildren h 
        ON  t.ParentID = h.ParentID
        OR  (t.ParentID IS NULL AND h.ParentID IS NULL)
  )
,TopNodes as (
    SELECT  ID,ParentID,LEVEL lvl,Max(LEVEL) OVER () MaxLevel
    FROM    AllSingleNodes s
    CONNECT BY PRIOR ID = ParentID
    START WITH ParentID IS NULL
  )
SELECT  ID
FROM    TopNodes
WHERE   lvl = MaxLevel

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)