Introduction
The following article describes an implementation of a Project Scheduling Engine in Microsoft SQL Server by using recursive common table expression designed for high performance.
The solution provides project forecasting functionality based on predecessor and successor dependencies.
Background
Sometimes the complexity of project scheduling goes beyond the capabilities of commercial software systems like Primavera P6. Because these solutions were designed to fit most of the companies' requirement, they have limitations. The cost of bypassing these limitations exceeds the cost of building new software which fulfils all the functionality requirement with fairly high performance.
In my case the size and the complexity of the projects ended up with over 3 million milestones in different programmes, different projects and over 7000 different sites. The milestones had cross dependencies between Programmes, Projects, Sites and used different Work Breakdown Structures (WBS).
The solution below shows how to build a single stored procedure to forecast millions of milestones effectively within seconds.
Using the code
My sample project is to build a house. To simplify the process I'll have a project start date, then 5 milestones and a housewarming party as the last milestone.
The goal of this exercise is to forecast the date of the housewarming party, by knowing only the project start date and the duration of each activity.
The engine that we design will update the successor milestones' forecast date if any of the predecessors' forecast date or completed date has changed.
To hold all the milestones which are related to my project, I am creating a table called tblMilestone.
The table stores the name of the milestones, forecast date, completed date and the duration of that activity.
CREATE TABLE tblMilestone (
MilestoneId INT IDENTITY(1, 1) ,
Milestone NVARCHAR(128) NOT NULL ,
ForecastDate DATETIME NULL ,
CompletedDate DATETIME NULL ,
DurationInDays INT NOT NULL
DEFAULT ( 0 ) ,
PRIMARY KEY ( MilestoneId ) )
GO
In this exercise I won't go into details how to fine tune with indexes and unique keys, I try to describe the solution as simple as possible.
At the beginning, the only thing I know is that I will commence the work on a certain date, and the time I allocate to each activity to be finished.
Our plan is to build a process, which updates the forecast date by calculations.
INSERT INTO dbo.tblMilestone
( Milestone, ForecastDate, DurationInDays )<
VALUES ( N'Project Start', '2016 Feb 1', 0 ),
( N'Buy Land', NULL, 30 ),
( N'Build Walls', NULL, 20 ),
( N'Build Roof', NULL, 15 ),
( N'Build Garden', NULL, 30 ),
( N'Key Handover', NULL, 2 ) ,
( N'Housewarming party', NULL, 5 )
If the project milestones were sequential, we would add sum of duration days to the start date to get the date of the housewarming party. But some of the activity can be started at the same time as others.
Only thing I have to do in order to start building the garden is to buy the land, in theory, I can start build the garden at the same time when I build the walls. Even though this is not the most practical approach, but assume for example sake.
In case of normal parent and child relationship we could store the dependencies in tblMilestone. However, it would limit the number of parent milestone to one.
Let assume, at the time when we have bought the land we will start building the walls and the garden too. After we finished the walls, we can start building the roof.
So the time necessary to build the walls and roof is 35 days and the garden would take 30 days. Therefore the earliest date for the housewarming party when both the activities have finished which is 35 days plus 2 days for the key hand.
Therefore, in order to kick off the housewarming party we need to finish all the activities.
In project management terms, the key handover milestone has two predecessors
1., Complete building the roof (please note: building the roof milestone has predecessor of building the walls)
2., Complete building the garden
To map this predecessor-successor relationship we need another table.
CREATE TABLE tblMilestoneDependency (
MilestoneDependencyId INT IDENTITY(1, 1) ,
PredecessorMilestoneId INT NULL --reference of the milestone to be finished before our activity can be started
,
MilestoneId INT NOT NULL --current milestone in question
,
PRIMARY KEY ( MilestoneDependencyId ) )
GO
ALTER TABLE dbo.tblMilestoneDependency
ADD CONSTRAINT FK_tblMilestoneDependency_PredecessorMilestoneId FOREIGN KEY (PredecessorMilestoneId) REFERENCES tblMilestone (MilestoneId)
GO
ALTER TABLE dbo.tblMilestoneDependency
ADD CONSTRAINT FK_tblMilestoneDependency_MilestoneId FOREIGN KEY (MilestoneId) REFERENCES tblMilestone (MilestoneId)
The start milestone has no predecessor, by inserting NULL the process will interpret as start activity
INSERT INTO dbo.tblMilestoneDependency
( PredecessorMilestoneId, MilestoneId )
VALUES ( NULL, 1 ),
( 1, 2 ), -- buy the land when the project started
( 2, 3 ), -- start building the wall when the land has bought
( 2, 5 ), -- start building the garden when the land has bought
( 3, 4 ), -- start building the roof when the walls are standing
( 4, 6 ), -- start organising the key handover when the garden and the roof finished
( 5, 6 ),
( 6, 7 ) -- start organising the housewawrming party
In summary, our project has a "fork" dependency. It includes an example when one milestone has multiple successors, in other words, when one activity finished two other activity can be started.
And also includes example when multiple activities needs to be finished before another can be started. So we can say that our example covers most of the situation in project management.
So lets write a query to get the forecast date of the predecessors, add the duration time of the activity and show the new forecast date of the successor.
An easy way would be to loop through the successor milestones and update them with the new forecast date. However the performance of that query would be very low.
But what would happen when the company runs 50 programme, each has 5-15 projects and each project has about 120 milestones and these milestones needs to be achieved in 7000 different sites? The milestones sometimes have cross dependency between programmes. For example: In order to start a milestone in Project B another milestone needs to be finished in Project A.
By the fact that the number of milestones can easily go over millions an inefficient scheduler can cause delay in triggering the next activities.
The query below shows how the second milestone's ForecastDate could be updated from the first milestone. However this script does not go over on the chain, but updates always the next cascading milestone.
SELECT predm.Milestone PredecessorMilestone ,
m.Milestone ,
predm.ForecastDate PredecessorForecastDate ,
DATEADD(DAY, m.DurationInDays,
CASE WHEN md.PredecessorMilestoneId IS NULL
THEN m.ForecastDate
ELSE predm.ForecastDate
END) ,
m.DurationInDays
FROM dbo.tblMilestoneDependency md
INNER JOIN dbo.tblMilestone m
ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
LEFT JOIN dbo.tblMilestone predm
ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
ORDER BY md.MilestoneId
In the result set below, we can see the new forcast date for 'Buy Land' milestone, which is 30 days after 'Project Start' Forecast Date.
Introducing Recursice Common Table Expression
Fortunately recursive common table expression was introduced in SQL, therefore we are able to populate the chain of milestones with one single query.
Let's modify the previous query and use as the source table in our recursive common table expression.
;WITH cteCurrentSchedule
AS ( SELECT predm.Milestone PredecessorMilestone ,
m.Milestone ,
predm.MilestoneId PredecessorMilestoneId ,
m.MilestoneId ,
predm.ForecastDate PredecessorForecastDate ,
m.ForecastDate ,
m.CompletedDate,
m.DurationInDays
FROM dbo.tblMilestoneDependency md
INNER JOIN dbo.tblMilestone m
ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
LEFT JOIN dbo.tblMilestone predm
ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
),
cteNewSchedule
AS (
--select the start milestones
SELECT sch.Milestone ,
sch.MilestoneId ,
COALESCE(sch.CompletedDate, sch.ForecastDate) ForecastDate ,
sch.DurationInDays
FROM cteCurrentSchedule sch
WHERE sch.PredecessorMilestoneId IS NULL
UNION ALL
SELECT cs.Milestone ,
cs.MilestoneId ,
COALESCE(cs.CompletedDate,
DATEADD(DAY, cs.DurationInDays,
ns.ForecastDate)) ,
cs.DurationInDays
FROM cteNewSchedule ns
INNER JOIN cteCurrentSchedule cs
ON ns.MilestoneId = cs.PredecessorMilestoneId
)
SELECT u.Milestone ,
u.MilestoneId ,
m.ForecastDate ,
MAX(u.ForecastDate) NewForecastDate ,
m.CompletedDate ,
u.DurationInDays --take the longest parth if multiple available
FROM cteNewSchedule AS u
INNER JOIN dbo.tblMilestone m
ON m.MilestoneId = u.MilestoneId
GROUP BY u.Milestone ,
u.MilestoneId ,
m.ForecastDate ,
m.CompletedDate ,
u.DurationInDays
ORDER BY MAX(u.ForecastDate)
The result set shows the Forecast Date for all the dependant milestones.
With that result, you can update your original table (tblMilestone) with the latest Forecast dates to represent the data on a user control (not covered in this article).
In summary, we can place the logic into the store procedure. When the stored procedure executed it updates the forecast dates if any of the milestone predecessors changed.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Author, Csaba Hegyi>
-- Create date: <Create Date, 2015 Sep 21>
-- Description: <Description, This stored procedure updates milestones forecast dates if any of the predecessors' forecast date od completition date changed>
-- =============================================
ALTER PROCEDURE sp_Milestone_Reschedule
AS
BEGIN
SET NOCOUNT ON;
;WITH cteCurrentSchedule
AS ( SELECT predm.Milestone PredecessorMilestone ,
m.Milestone ,
predm.MilestoneId PredecessorMilestoneId ,
m.MilestoneId ,
predm.ForecastDate PredecessorForecastDate ,
m.ForecastDate ,
m.CompletedDate ,
m.DurationInDays
FROM dbo.tblMilestoneDependency md
INNER JOIN dbo.tblMilestone m
ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
LEFT JOIN dbo.tblMilestone predm
ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
),
cteNewSchedule
AS (
--select the start milestones
SELECT sch.Milestone ,
sch.MilestoneId ,
COALESCE(sch.CompletedDate, sch.ForecastDate) ForecastDate ,
sch.DurationInDays
FROM cteCurrentSchedule sch
WHERE sch.PredecessorMilestoneId IS NULL
UNION ALL
SELECT cs.Milestone ,
cs.MilestoneId ,
COALESCE(cs.CompletedDate,
DATEADD(DAY, cs.DurationInDays,
ns.ForecastDate)) ,
cs.DurationInDays
FROM cteNewSchedule ns
INNER JOIN cteCurrentSchedule cs
ON ns.MilestoneId = cs.PredecessorMilestoneId
),
cteLongestPathPerMilestone
as
(
SELECT ns.MilestoneId, MAX(ns.ForecastDate) ForecastDate FROM cteNewSchedule ns
GROUP BY ns.MilestoneId
)
UPDATE m
SET ForecastDate = u.ForecastDate
FROM cteLongestPathPerMilestone AS u
INNER JOIN dbo.tblMilestone m
ON m.MilestoneId = u.MilestoneId
WHERE m.CompletedDate IS NULL OR m.ForecastDate IS NULL
END
GO
The stored procedure reschedules only milestones where completed date is not entered yet. As an example, I'll update the comleted date for two of the milestones to see how the proc calculates the forecast dates for the successor milestones.
It will use the predecessor milestone's completed date as the source of truth rather than its forecast date.
UPDATE dbo.tblMilestone
SET CompletedDate = DATEADD(DAY, 3, ForecastDate)
WHERE MilestoneId = 1
GO
UPDATE dbo.tblMilestone
SET CompletedDate = DATEADD(DAY, 15, ( SELECT ForecastDate
FROM dbo.tblMilestone
WHERE MilestoneId = 1
))
WHERE MilestoneId = 2
To see the result:
EXEC sp_Milestone_Reschedule
SELECT * FROM dbo.tblMilestone m
and the result set is:
I hope you enjoyed the article, and if you have any question, please don't hesitate contacting me.
Points of Interest
One of the most challenging part was in designing the solution, when a milestone has multiple predecessors, which means one of the predecessor already completed, but the milestone's activity still has to wait untill the longest path completed.
I came across with the solution to calculate the forecast dates for all the possible path, and take the maximum date (longest path) to update the milestone. This is represented in the final select statement as NewForecastDate.
The real application is more complex than this sample project. That sample covers only one milestone dependency type which is "Finish to Start". During the implementation I came up with solution for "Start to Start", "Start to Finish" and "Finish to Finish" scenarios.
History
No changes yet.