Introduction
Yes, this is the same old story of hierarchical structures stored in relational databases. Webmail content, company hierarchy, task dependencies - you name it. I came across this issue when I was designing a mini-forum component for a web portal and could not find a solution in the net. So, here is my approach. I used SQL Server 2000 and 2005.
Background
Suppose you have a table that stores some entities (messages, employees, etc.). Every entity instance can be a parent for many other instances. Some instances do not have children - they are leaf nodes in the tree hierarchy. Some instances do not have parents - they are root nodes. Consider a simple table:
create table Messages
(
[MessageID] [uniqueidentifier] primary key clustered NOT NULL ,
[ParentID] [uniqueidentifier] NULL ,
[ReceivedDate] [datetime] NOT NULL,
[Subject] varchar(256) NULL,
CONSTRAINT [FK_Messages_Messages] FOREIGN KEY ([ParentID]) <BR> REFERENCES Messages ([MessageID])
)
Yes, primary key is a GUID, so you won't be tempted to use it for ordering and sorting purposes. No, there is no "NestedLevel" field that holds generation number. Yes, MessageID is a clustered primary key, so records are stored in the database in random order. Let's not cheat. Parent-child relations are defined by a pair of GUIDs, and all sorting is performed using datetime field. That's it. Subject field is just a cosmetic feature.
Let's populate our table with some meaningful data:
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('F24B7175-1905-4783-A8D6-0DA6F1759F2C', NULL, getdate(), 'Message 1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('E10B2FC6-E860-4BFF-8642-29A318CDB6CE', <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 1, getdate()), <BR>'Message 1-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('E931AC59-4356-48AD-A587-9F0246532733', <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 2, getdate()), <BR>'Message 1-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('FAEF6F5B-54AF-452F-A72E-3DBD21F643FD', <BR>'E931AC59-4356-48AD-A587-9F0246532733', dateadd(day, 2, getdate()), <BR>'Message 1-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('5B1D05CE-7889-4BD9-8ABC-20A1269668F5', <BR>'FAEF6F5B-54AF-452F-A72E-3DBD21F643FD', dateadd(day, 2, getdate()), <BR>'Message 1-2-1-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('A3E5F551-6531-4D12-8A44-7C45C81C181A', <BR>'FAEF6F5B-54AF-452F-A72E-3DBD21F643FD', dateadd(day, 3, getdate()), <BR>'Message 1-2-1-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('298D6A02-5BB6-4EDD-8030-7E15CD16A021', <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 3, getdate()), <BR>'Message 1-2-1-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('CBCE69FF-19C1-47E7-842A-4944BA146BAA', <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 4, getdate()), <BR>'Message 1-2-1-2-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('3C521C3A-AA40-49D0-83FA-E5101913A0E3', <BR>'CBCE69FF-19C1-47E7-842A-4944BA146BAA', dateadd(day, 4, getdate()), <BR>'Message 1-2-1-2-2-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('50D2CA2B-A484-482F-A41F-ADFF1A636BE3', <BR>'A3E5F551-6531-4D12-8A44-7C45C81C181A', dateadd(day, 5, getdate()), <BR>'Message 1-2-1-2-3')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('71AD6FBD-EE92-473A-A038-C59CB7FF658A', <BR>'E931AC59-4356-48AD-A587-9F0246532733', dateadd(day, 3, getdate()), <BR>'Message 1-2-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('C300611C-D8EF-419B-8736-97AAE62108D8', <BR>'F24B7175-1905-4783-A8D6-0DA6F1759F2C', dateadd(day, 3, getdate()), <BR>'Message 1-3')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('A4E8E001-99F1-450F-A4D7-3E0A491A085E', <BR>'C300611C-D8EF-419B-8736-97AAE62108D8', dateadd(day, 3, getdate()), <BR>'Message 1-3-1')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('CD7F8A67-B735-4590-A92C-D91F0341DD7F', <BR>'C300611C-D8EF-419B-8736-97AAE62108D8', dateadd(day, 4, getdate()), <BR>'Message 1-3-2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('31A05919-1BCB-403E-807A-56B50B776C4E', NULL, getdate(), 'Message 2')
insert into Messages (MessageID, ParentID, ReceivedDate, Subject) <BR>values ('A731F5DC-88F7-49E7-BCBB-F912C69BFC7C', <BR>'31A05919-1BCB-403E-807A-56B50B776C4E', dateadd(day, 1, getdate()), <BR>'Message 2-1')
We are on a quest to retrieve these records so they can be easily presented, say, on a webmail application page:
Id Indent Subject
------------------------------------ ----------- -------------------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1 Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2 Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2 Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3 Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4 Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4 Message 1-2-1-2
298D6A02-5BB6-4EDD-8030-7E15CD16A021 5 Message 1-2-1-2-1
CBCE69FF-19C1-47E7-842A-4944BA146BAA 5 Message 1-2-1-2-2
3C521C3A-AA40-49D0-83FA-E5101913A0E3 6 Message 1-2-1-2-2-1
50D2CA2B-A484-482F-A41F-ADFF1A636BE3 5 Message 1-2-1-2-3
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3 Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2 Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3 Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3 Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1 Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2 Message 2-1
Do you want it fast?
Search the net: it's full of samples that use cursors, recursion and stack tables. They work. But I want it to work fast. I love joins. SQL Server is good at making joins, let's use them.
We can come up with some SQL code that links all messages on the first few levels.
declare @t table (
l1 uniqueidentifier, l2 uniqueidentifier, l3 uniqueidentifier, <BR>l4 uniqueidentifier,
id uniqueidentifier, depth int)
insert into @t
select m1.MessageID, m2.MessageID, m3.MessageID, m4.MessageID, null, 1
from Messages as m1
left outer join Messages m2 on m1.MessageID=m2.ParentID
left outer join Messages m3 on m2.MessageID=m3.ParentID
left outer join Messages m4 on m3.MessageID=m4.ParentID
where m1.ParentID is NULL
declare @depth int
update @t set depth = depth + 1 where l2 is not null
update @t set depth = depth + 1 where l3 is not null
update @t set depth = depth + 1 where l4 is not null
select @depth = max(depth) from @t
insert into @t select distinct l1, NULL, NULL, NULL, NULL, 1 from @t <BR>where l2 is not NULL
insert into @t select distinct l1, l2, NULL, NULL, NULL, 2 from @t<BR>where l3 is not NULL
insert into @t select distinct l1, l2, l3, NULL, NULL, 3 from @t <BR>where l4 is not NULL
update @t set id=coalesce(l4, l3, l2, l1)
select id, depth, m.Subject
from @t as t
left outer join Messages as m1 on m1.MessageID=t.l1
left outer join Messages as m2 on m2.MessageID=t.l2
left outer join Messages as m3 on m3.MessageID=t.l3
left outer join Messages as m4 on m4.MessageID=t.l4
left outer join Messages as m on m.MessageID=t.id
order by m1.ReceivedDate, l1, m2.ReceivedDate, l2, m3.ReceivedDate, l3, <BR> m4.ReceivedDate, l4
It seems to work fine. Look at the results, all levels from 1 to 4 are displayed in the proper order:
id depth Subject
------------------------------------ ----------- -----------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1 Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2 Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2 Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3 Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4 Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4 Message 1-2-1-2
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3 Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2 Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3 Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3 Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1 Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2 Message 2-1
Do you want it big?
Major problem: the number of levels this code can handle is hardcoded. Looks like it's time to pull out the ace from your sleeve and write a simple code generator that builds SQL code that makes ten, twenty, thirty joins and brings you the whole hierarchy in one shot. That's what I did. And it worked. But:
- only for modest amounts of data and depths; 30-messages-deep forum thread is not something unusual, and 20 joins is a lot of work for SQL Server already;
- only for limited depths: the query that used 19 joins didn't fit into varchar(8000) variable and could not be executed.
I am not even going to publish this code here, believe me: it works, but it looks really scary. The third reason for not using code generator was ... well, you may call it religious: I just hate dynamic SQL. Every time I have to put exec(@sql)
call I have a feeling that I unleash a small beast that eventually will make the programmer's life miserable.
Do you want it to work?
Time to remember about cursors and recursion. I have implemented a recursive function because SQL Server allows table-valued functions and the code looks smooth. Your SQL dialect may offer other opportunities for recursion, so do not hesitate to try them. The idea is simple: pull out a few (four) hierarchy levels in one shot, analyze, drill down if needed using cursor and recursion.
CREATE FUNCTION BrowseMessages
( @rootMessageID uniqueidentifier,
@rootDepth int,
@isDrillDown bit )
RETURNS @RtnValue table (Id uniqueidentifier, Depth int, <BR> Subject varchar(256))
AS
begin
declare @t table (
l1 uniqueidentifier, l2 uniqueidentifier, l3 uniqueidentifier, <BR> l4 uniqueidentifier,
id uniqueidentifier, depth int)
insert into @t
select
m1.MessageID, m2.MessageID, m3.MessageID, m4.MessageID, null, 1
from Messages as m1
left outer join Messages m2 on m1.MessageID=m2.ParentID
left outer join Messages m3 on m2.MessageID=m3.ParentID
left outer join Messages m4 on m3.MessageID=m4.ParentID
where (@rootMessageID is NULL and m1.ParentID is NULL) or <BR> (@rootMessageID is NOT NULL and m1.ParentID=@rootMessageID)
declare @depth int
update @t set depth = depth + 1 where l2 is not null
update @t set depth = depth + 1 where l3 is not null
update @t set depth = depth + 1 where l4 is not null
select @depth = max(depth) from @t
insert into @t select distinct l1, NULL, NULL, NULL, NULL, 1 from @t <BR> where l2 is not NULL
insert into @t select distinct l1, l2, NULL, NULL, NULL, 2 from @t <BR> where l3 is not NULL
insert into @t select distinct l1, l2, l3, NULL, NULL, 3 from @t <BR> where l4 is not NULL
update @t set id=coalesce(l4, l3, l2, l1)
declare @tt table (Id uniqueidentifier, Depth int, Subject varchar(256))
insert into @tt
select Id=id, Depth=depth+@rootDepth, m.Subject
from @t as t
left outer join Messages as m1 on m1.MessageID=t.l1
left outer join Messages as m2 on m2.MessageID=t.l2
left outer join Messages as m3 on m3.MessageID=t.l3
left outer join Messages as m4 on m4.MessageID=t.l4
left outer join Messages as m on m.MessageID=t.id
order by m1.ReceivedDate, l1, m2.ReceivedDate, l2, m3.ReceivedDate, <BR> l3, m4.ReceivedDate, l4
declare @maxCurrentDepth int
select @maxCurrentDepth=max(Depth)-@rootDepth from @tt
if (@isDrillDown <> 1 or @maxCurrentDepth < 4)
begin
insert into @RtnValue select * from @tt
end
else
begin
declare crs cursor read_only fast_forward for select Id, Depth, <BR> Subject from @tt
open crs
declare @nextId uniqueidentifier, @nextDepth int, <BR> @nextSubject varchar(256)
while 1=1
begin
fetch next from crs into @nextId, @nextDepth, @nextSubject
if @@fetch_status <> 0 break
insert into @RtnValue values (@nextId, @nextDepth, @nextSubject)
if (@nextDepth-@rootDepth = 4)
begin
insert into @RtnValue select * from <BR> BrowseMessages(@nextId, @nextDepth, 1)
end
end
close crs
deallocate crs
end
return
end
It works (it retrieved those nodes at levels 5 and 6):
select * from BrowseMessages(NULL, 0, 1)
Id Depth Subject
------------------------------------ ----------- -------------------
F24B7175-1905-4783-A8D6-0DA6F1759F2C 1 Message 1
E10B2FC6-E860-4BFF-8642-29A318CDB6CE 2 Message 1-1
E931AC59-4356-48AD-A587-9F0246532733 2 Message 1-2
FAEF6F5B-54AF-452F-A72E-3DBD21F643FD 3 Message 1-2-1
5B1D05CE-7889-4BD9-8ABC-20A1269668F5 4 Message 1-2-1-1
A3E5F551-6531-4D12-8A44-7C45C81C181A 4 Message 1-2-1-2
298D6A02-5BB6-4EDD-8030-7E15CD16A021 5 Message 1-2-1-2-1
CBCE69FF-19C1-47E7-842A-4944BA146BAA 5 Message 1-2-1-2-2
3C521C3A-AA40-49D0-83FA-E5101913A0E3 6 Message 1-2-1-2-2-1
50D2CA2B-A484-482F-A41F-ADFF1A636BE3 5 Message 1-2-1-2-3
71AD6FBD-EE92-473A-A038-C59CB7FF658A 3 Message 1-2-2
C300611C-D8EF-419B-8736-97AAE62108D8 2 Message 1-3
A4E8E001-99F1-450F-A4D7-3E0A491A085E 3 Message 1-3-1
CD7F8A67-B735-4590-A92C-D91F0341DD7F 3 Message 1-3-2
31A05919-1BCB-403E-807A-56B50B776C4E 1 Message 2
A731F5DC-88F7-49E7-BCBB-F912C69BFC7C 2 Message 2-1
What's next?
Tune-up this code for your particular needs. Say, if you have a few root nodes and deep hierarchy, consider increasing the number of joins. If you have thousands of root nodes with just one level of children, consider joining only first three level instead of four. Add paging and filtering. Use other helpful fields you have in you hierachy tables. Use proper indexing. If you anticipate extremely deep trees and 4x32=128 levels is not enough (4 is the number of joins, and SQL Server 2000 limits the number of nesting calls or nesting levels to 32), increase the number of joins or do not use recursive functions.
Enjoy!