Introduction
Sooner or later in your database project you will need to store hierarchical
information. For instance enterprise structure, categories of merchandise,
product catalogs, or folders with documents all are good nominees for such
structures. It is easy of course to implement that kind of store with
self-related table. Problems arrive once you need to answer hierarchical
questions. For example what if we have an enterprise structure and need to know
how many employees reporting to manager �X�? There have been already some
solutions of the problem.
See:
Joe Celko�s book and articles also covered it.
The above techniques are good when you only reading the data, but when you
need to modify the tree you will have considerable performance deprivation.
I am going to introduce method that has similar performance for reading but
better performance for modification of the tree.
The Idea
Suppose we have the following table:
Where NodeId
is the primary key, ParentId
is the
foreign key, and NodeName
is some extra field. In order to provide
simple examples let�s assume that NodeName
has unique constraint on
it.
For all examples below I will use the following tree:
Let�s create another table having two relationships with the Node
table:
Where both NodeId
and ParentId
are foreign keys
referencing Node table.
Each record in the Tree table means that the Node pointed by
Tree.NodeId
has the ancestor pointed by Tree.ParentId
.
By ancestor I understand any node lying on the path from the node to the root of
the tree, i.e. direct parent or grand parent or grand-grand parent and so
on.
Now let�s ensure one simple invariant: all ancestors of each node from Node
table are enumerated in Tree table.
For example for node 4 they are going to be 2 and 1.
The immediate result of this invariant is: this is very easy to get full path
from the node to the root of the tree - just select all records from the Tree
table related to the seeking node. Let�s write down this select. Suppose we need
to find path from node 7 to the root.
Query 1:
select p.*
from Node n, Tree t, Node p
where n.NodeName=7
and n.NodeId = t.NodeId
and t.ParentId = p.NodeId
It is a pretty strait forward select, isn�t it?
The second result of the above invariant is: we can select all descendants of
the node. This is because we are enlisting all ancestors for every node from
Node table in the Tree table including of course all ancestors of the seeking
node. In order to select entire sub tree of the node we just query for all nodes
that have the seeking node as their ancestor.
Query 2:
select c.*
from Node n, Tree t, Node c
where n.NodeName=2
and n.NodeId = t.ParentId
and t.NodeId = c.NodeId
This is as simple as that. Note that the two queries are opposite in the
direction passing through Tree table.
Implementation
Before going further let me make two notes.
# 1: from my experience it is very convenient to have one more field
in the Tree table representing the distance on the tree between nodes pointed by
NodeId and ParentId, in another words - level of ancestor. So the actual
structure includes Level field:
# 2: in most probable cases we need sub tree of the node or path from
the node to the root including the node itself. So it would be useful having in
the Tree table the node itself as its own ancestor of the level 0.
It is possible to implement the above invariant either as stored procedures
or as triggers. I personally prefer trigger one.
Let�s start with insert trigger for Node table. I will be using MS SQL
server.
First of all the trigger must create a record to satisfy # 2. This can be
done by the following statement:
insert into Tree(NodeId, ParentId, Level)
select NodeId, NodeId, 0
from inserted
Now we are ready to add the list of all ancestors of the inserted nodes. Take
in consideration that the parent of each inserting node already has enumerated
all his ancestors, so we can use them as a foundation. But we need to replace
parent�s NodeId with the id of the new node and increment the level.
insert into Tree(NodeId, ParentId, Level)
select n.NodeId, t.ParentId, t.Level + 1
from inserted n, Tree t
where n.ParentId = t.NodeId
So the overall trigger will be:
create trigger NodeInsert on Node for insert as
begin
set nocount on
insert into Tree(NodeId, ParentId, Level)
select NodeId, NodeId, 0
from inserted
insert into Tree(NodeId, ParentId, Level)
select n.NodeId, t.ParentId, t.Level + 1
from inserted n, Tree t
where n.ParentId = t.NodeId
end
go
Now it is time to define update trigger which takes care of changing node�s
parent. The goal of update trigger is to delete all obsolete records from Tree
table and create new ones. In order to minimize changes been made by this
trigger let�s reuse the information already stored in the Tree table in a manner
we did in insert trigger. What is not going to be affected by changing node�s
parent?
Apparently for updated node itself it is persistent only one record that
satisfies #2. And for descendants they are going to be permanent all records
about ancestors below updated node including this node. While every ancestor
above the updated node will obsolete.
For example if we are changing parent for node 4 then for nodes 5, 6, and 7
will be changed only ancestors above node 4 i.e. 1 and 2.
On the first step we back up all descendants of updated nodes in the
following table:
declare @child table(NodeId int, Level int)
We insert all descendants� primary keys and the level of each of them
relatively to updated nodes. Condition t.Level > 0 allows to exclude updated
node from been inserted to the @child table.
insert into @child(NodeId, Level)
select t.NodeId, t.Level
from inserted n, Tree t
where n.NodeId = t.ParentId and t.Level > 0
The second step deletes all obsolete rows for all descendants:
delete Tree
where
Tree.NodeId in(select NodeId from @child)
and Tree.ParentId in(
select t.ParentId
from inserted n, Tree t
where n.NodeId = t.NodeId and t.Level > 0
)
Condition t.Level > 0 excludes updated nodes themselves from the
deletion.
The third step removes obsolete records for updated nodes:
delete Tree
where Tree.NodeId in(select NodeId from inserted)
and Tree.Level > 0
On the next two steps we will populate new information in the Tree table.
So the forth step is going to be:
insert into Tree(NodeId, ParentId, Level)
select n.NodeId, t.ParentId, t.Level + 1
from inserted n, Tree t
where n.ParentId = t.NodeId
This is exactly how we did it in the insert trigger.
And finally the fifth step:
insert into Tree(NodeId, ParentId, Level)
select c.NodeId, t.ParentId, t.Level + c.Level
from inserted n, Tree t, @child c
where n.NodeId = t.NodeId and t.Level > 0
It might look strange that there is no any joins with @child table in this
statement. But this is what we really need here because we are going to repeat
ancestor�s information for each child. Also note that we are adding child�s
level stored in the @child table rather that just 1.
The entire trigger looks like:
create trigger NodeUpdate on Node for update as
if update(ParentId)
begin
set nocount on
declare @child table(NodeId int, Level int)
insert into @child(NodeId, Level)
select t.NodeId, t.Level
from inserted n, Tree t
where n.NodeId = t.ParentId and t.Level > 0
delete Tree
where
Tree.NodeId in(select NodeId from @child)
and Tree.ParentId in(
select t.ParentId
from inserted n, Tree t
where n.NodeId = t.NodeId and t.Level > 0
)
delete Tree
where Tree.NodeId in(select NodeId from inserted) and Tree.Level > 0
insert into Tree(NodeId, ParentId, Level)
select n.NodeId, t.ParentId, t.Level + 1
from inserted n, Tree t
where n.ParentId = t.NodeId
insert into Tree(NodeId, ParentId, Level)
select c.NodeId, t.ParentId, t.Level + c.Level
from inserted n, Tree t, @child c
where n.NodeId = t.NodeId and t.Level > 0
end
go
As you can see it will be executed only if ParentId
was updated
and all other updates of Node
table will be filtered out by the
very first if statement.
Restrictions of the implementation
If you are going to insert or update more than one Node at a time please make
sure you are not doing this for more then one level of the tree. From the
practical point of view this is not a strong restriction at all.
Remaining steps
In order to make the functionality completed we need the ability to clean the
entire Tree table and populate it from the scratch. It is probably a good idea
to create a stored procedure for this purpose, and I am not going to take away
from you the pleasure to write down such procedure in you own.
Examples
You can find SQL statements for creating both tables, both triggers, and
statements that populating Node table in the attached file. It also contains
Query 1 and 2 and statements to change node�s parent.