(untagged)

Improve hierarchy performance using nested sets

James Simpson

0.00/5 (No votes)

17 May 2003

This article describes how to use nested sets to improve performance of hierarchies within SQL Server and other relational databases

Download source files - 2 Kb

Introduction

This article and accompanying code demonstrates the theory and practice behind "Nested Sets". The theory is applicable with any type hierarchy in any database, however the example code was written using SQL Server 2000 and every attempt has been made to make it compatible with most major SQL providers.

The Theory

Nested sets rely on a demoralization of parts of the hierarchy, They can be used to efficiently use a hierarchy without relying on the Parent / Child link. How a person usually views or represents a hierarchy is by using parent child relationship, this can be seen below:

The above hierarchy shows a make-believe company organisational structure, this data can be compiled into a database table and represented very easily, the SQL Script below shows how to create the table and populate the data:

-- Create the SQL Table

CREATE TABLE Employee
(
  EmployeeID  INT  IDENTITY(1, 1) NOT NULL,
  ParentID    INT NULL,
  JobTitle    VARCHAR(50) NOT NULL,
  FirstName   VARCHAR(50) NOT NULL, 
  PRIMARY KEY(EmployeeID),
  FOREIGN KEY ParentID REFERENCES Employee (EmployeeID)
)

GO

And populate the data:

-- Set identity insert ON so we can insert the employee ID

SET IDENTITY_INSERT Employee ON 
SET NOCOUNT ON

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (1, NULL, 'Managing Director', 'Bill')
-- Employees under Bill

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (2, 1, 'Customer Services', 'Angela')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (3, 1, 'Development Manager', 'Ben')
-- Employess under angela

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (4, 2, 'Assistant 1', 'Henry')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (5, 2, 'Assistant 2', 'Nicola')
-- Employees under ben

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (6, 3, 'Snr Developer', 'Kerry')
INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (7, 3, 'Assistant', 'James')
-- Emplyees under kerry

INSERT INTO Employee (EmployeeID, ParentID, JobTitle, FirstName) 
  VALUES (8, 6, 'Jrn Developer', 'Tim')
SET NOCOUNT OFF

SET IDENTITY_INSERT Employee OFF

The above SQL code will create the table needed to represent the above hierarchy. Most hierarchy are developed using the above ParentID method, and usually it is enough to get by with especially if the hierarchy has a fixed amount of levels. However if a large part of an application is based around the hierarchy and the hierarchy is dynamic, then the above method becomes less efficient and very costly to use. Various common hierarchy operations become difficult. For instance finding the leaf node of the hierarchy using the ParentID becomes very difficult and can not easily be retrieved with a simple SELECT statement. To find any parent of a child at any level of the hierarchy you must use the ParentID of each node, this is also difficult and inefficient to retrieve with a simple select statement.

This is where nested sets become infinitely useful. If we were to represent the above hierarchy slightly differently we can gain a lot of advantages. Below is the same hierarchy represented as nested sets:

Although at first this is slightly more confusing to read visually, to a computer system this is a cinch. The above diagram is indicating that each node is aware of all its descendents, and vice versa. For example, we can see easily enough from the diagram that Ben and Angela are children of Bill because they are contained within Bill. We can also see that Tim And James are children of Ben because they are contained within Ben. This ultimately gives us a link between each member of the hierarchy.

For us to represent this is data terms we need to store two more values per node, these are the extent values of the set. These are shown below on the same diagram in blue and red.

Lets call these Left (Red Values) and Right (Blue Values). These values should be stored on the Employee table along with the ParentID and this is all we need to fully represent a nested sets hierarchy.

-- Alter the employee table to include the left and right extent values

ALTER TABLE Employee ADD [LeftExtent] INT NULL
ALTER TABLE Employee ADD [RightExtent] INT NULL

GO

By using the Left and Right values of the database you can pick out some simple rules and simple queries:

-- If Left is between parent left and parent right then the node is a child

SELECT * FROM Employee 
    CROSS JOIN (
                       SELECT [LeftExtent], [RightExtent] 
                       FROM Employee WHERE Firstname = 'Ben'
        ) AS Parent
WHERE Employee.[LeftExtent] 
BETWEEN Parent.[LeftExtent] and Parent.[RightExtent]

-- All Leaf nodes (an item with any children) can be 

-- identified by Left = Right - 1

SELECT * FROM Employee WHERE [LeftExtent] = [RightExtent] - 1

More advanced queries can be used, but before these are explained the Left and Right values must be populated. Below is the SQL code used to to populate the Left and Right values, the explanation for how it work is contained in the comments:

-- Create the stack table

DECLARE @tmpStack TABLE 
  (
    ID INT IDENTITY(1, 1), 
    EmployeeID INT NOT NULL, 
    LeftExtent INT NULL, 
    RightExtent INT NULL
  )

-- what we do is start from a parent, in this instance

-- we want to start from the very top which is the NULL parent

DECLARE @parentid AS INTEGER
SET @parentid = NULL 

-- our counter is the variable used to set the left and right extent

DECLARE @counter AS INTEGER
SET @counter = 1

-- what we do is go check the parentid for children, if it 

-- has children we push

-- it into the stack and move on to the next child

DECLARE @id AS INTEGER
DECLARE @oldid AS INTEGER

SET NOCOUNT ON

WHILE 1 = 1
BEGIN
SET @id = NULL

  -- get the first row which is a child of @partentid, which has not already

  -- been pushed onto the stack

  INSERT INTO @tmpStack
    (
      EmployeeID, 
      LeftExtent
    )
  SELECT TOP 1 EmployeeID, @counter 
  FROM Employee 
  WHERE ISNULL(ParentID, 0) = ISNULL(@parentid,0) 
  AND EmployeeID NOT IN (SELECT EmployeeID FROM @tmpStack)



  -- just check we are getting the right identity value

  -- this value does not get reset once its reas so we 

  -- need to see if its the same as the older one

  SET @id = SCOPE_IDENTITY()
  IF ISNULL(@id, 0) = ISNULL(@oldid, 0)
    SET @id = NULL
    ELSE
  BEGIN
    SET @oldid = @id
    SET @counter = @counter + 1
  END


  -- if we have children (i.e if this operation inserted a row, 

  -- then we carry on

  IF @id IS NULL
  BEGIN
    -- no rows left, which means we want to pop the last item

    SELECT TOP 1 @id = ID FROM @tmpStack
    WHERE RightExtent IS NULL ORDER BY ID DESC
    
    -- there are no items left to pop, so exit the procedure

    IF @id IS NULL
      BREAK

    UPDATE @tmpStack
    SET RightExtent = @counter
    WHERE ID = @id

    SET @counter = @counter + 1 

    SELECT @parentid = ParentID 
    FROM Employee WHERE EmployeeID = 
      (SELECT EmployeeID FROM @tmpStack WHERE ID = @id)
  END
  ELSE
  BEGIN
    -- this is easy enough, move on to the next level

    -- we want the parent id of the item just inserted 

    SELECT @parentid = EmployeeID FROM @tmpStack WHERE ID = @id
  END
END

-- And update all the Items in Hier_Dept

-- in the original hierarchy table

UPDATE e
SET LeftExtent = ts.LeftExtent,
RightExtent = ts.RightExtent
FROM Employee e
INNER JOIN @tmpStack ts
ON ts.EmployeeID = e.EmployeeID

GO

The logic to the above code is simple enough, if you imagine the hierarchy as its original structure, the left and right values need to be numbered as if you were tracing a line around the outside of the tree e.g.:

Below are some more examples or using the left and right values to query the data:

-- Which level of the hierarchy are the employees on

SELECT COUNT(P2.EmployeeID) AS level, 
  P1.FirstName,
  P1.LeftExtent ,
  P1.RightExtent
FROM Employee AS P1
INNER JOIN Employee AS P2 
  ON (P1.LeftExtent BETWEEN P2.LeftExtent AND P2.RightExtent)
GROUP BY P1.FirstName, P1.LeftExtent, P1.RightExtent
ORDER BY Level ASC

-- counting the employees of a manager (children of a node)

SELECT p1.FirstName,
COUNT(p2.EmployeeID) Employees
FROM Employee p1
INNER JOIN Employee p2
ON (p2.LeftExtent BETWEEN p1.LeftExtent AND p1.RightExtent 
    AND p2.LeftExtent <> p1.LeftExtent)
GROUP BY p1.FirstName
ORDER BY Employees DESC

That's about it. Nested sets in a nut-shell. I hope you have found the code useful and interesting. If you need more detailed information about the theory there are plenty of SQL websites around which will have some information on this subject. Any corrections please don't hesitate to email me!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here