SQL recursive common table expression in Project Scheduling practice

csabahegyi79

4.44/5 (4 votes)

14 Oct 2015CPOL6 min read

10.9K

Project Scheduling Engine with recursive SQL common table expression

Introduction

The following article describes an implementation of a Project Scheduling Engine in Microsoft SQL Server by using recursive common table expression designed for high performance.

The solution provides project forecasting functionality based on predecessor and successor dependencies.

Background

Sometimes the complexity of project scheduling goes beyond the capabilities of commercial software systems like Primavera P6. Because these solutions were designed to fit most of the companies' requirement, they have limitations. The cost of bypassing these limitations exceeds the cost of building new software which fulfils all the functionality requirement with fairly high performance.

In my case the size and the complexity of the projects ended up with over 3 million milestones in different programmes, different projects and over 7000 different sites. The milestones had cross dependencies between Programmes, Projects, Sites and used different Work Breakdown Structures (WBS).

The solution below shows how to build a single stored procedure to forecast millions of milestones effectively within seconds.

Using the code

My sample project is to build a house. To simplify the process I'll have a project start date, then 5 milestones and a housewarming party as the last milestone.

The goal of this exercise is to forecast the date of the housewarming party, by knowing only the project start date and the duration of each activity.

The engine that we design will update the successor milestones' forecast date if any of the predecessors' forecast date or completed date has changed.

To hold all the milestones which are related to my project, I am creating a table called tblMilestone.
The table stores the name of the milestones, forecast date, completed date and the duration of that activity.

CREATE TABLE tblMilestone (
    MilestoneId INT IDENTITY(1, 1) ,
    Milestone NVARCHAR(128) NOT NULL ,
    ForecastDate DATETIME NULL ,
    CompletedDate DATETIME NULL ,
    DurationInDays INT NOT NULL
                       DEFAULT ( 0 ) ,
    PRIMARY KEY ( MilestoneId ) )
GO

In this exercise I won't go into details how to fine tune with indexes and unique keys, I try to describe the solution as simple as possible.

At the beginning, the only thing I know is that I will commence the work on a certain date, and the time I allocate to each activity to be finished.
Our plan is to build a process, which updates the forecast date by calculations.

INSERT  INTO dbo.tblMilestone
        ( Milestone, ForecastDate, DurationInDays )<
VALUES  ( N'Project Start', '2016 Feb 1', 0 ),
        ( N'Buy Land', NULL, 30 ),
        ( N'Build Walls', NULL, 20 ),
        ( N'Build Roof', NULL, 15 ),
        ( N'Build Garden', NULL, 30 ),
        ( N'Key Handover', NULL, 2 )    ,
        ( N'Housewarming party', NULL, 5 )

If the project milestones were sequential, we would add sum of duration days to the start date to get the date of the housewarming party. But some of the activity can be started at the same time as others.

Only thing I have to do in order to start building the garden is to buy the land, in theory, I can start build the garden at the same time when I build the walls. Even though this is not the most practical approach, but assume for example sake.

In case of normal parent and child relationship we could store the dependencies in tblMilestone. However, it would limit the number of parent milestone to one.

Let assume, at the time when we have bought the land we will start building the walls and the garden too. After we finished the walls, we can start building the roof.

So the time necessary to build the walls and roof is 35 days and the garden would take 30 days. Therefore the earliest date for the housewarming party when both the activities have finished which is 35 days plus 2 days for the key hand.

Therefore, in order to kick off the housewarming party we need to finish all the activities.

In project management terms, the key handover milestone has two predecessors
1., Complete building the roof (please note: building the roof milestone has predecessor of building the walls)
2., Complete building the garden

To map this predecessor-successor relationship we need another table.

CREATE TABLE tblMilestoneDependency (
    MilestoneDependencyId INT IDENTITY(1, 1) ,
    PredecessorMilestoneId INT NULL --reference of the milestone to be finished before our activity can be started
    ,
    MilestoneId INT NOT NULL --current milestone in question
    ,
    PRIMARY KEY ( MilestoneDependencyId ) )
GO

ALTER TABLE dbo.tblMilestoneDependency
ADD CONSTRAINT FK_tblMilestoneDependency_PredecessorMilestoneId FOREIGN KEY (PredecessorMilestoneId) REFERENCES tblMilestone (MilestoneId)
GO

ALTER TABLE dbo.tblMilestoneDependency
ADD CONSTRAINT FK_tblMilestoneDependency_MilestoneId FOREIGN KEY (MilestoneId) REFERENCES tblMilestone (MilestoneId)

The start milestone has no predecessor, by inserting NULL the process will interpret as start activity

INSERT  INTO dbo.tblMilestoneDependency
        ( PredecessorMilestoneId, MilestoneId )
VALUES  ( NULL, 1 ),
        ( 1, 2 ), -- buy the land when the project started    
        ( 2, 3 ), -- start building the wall when the land has bought
        ( 2, 5 ), -- start building the garden when the land has bought
        ( 3, 4 ), -- start building the roof when the walls are standing
        ( 4, 6 ), -- start organising the key handover when the garden and the roof finished
        ( 5, 6 ),
        ( 6, 7 ) -- start organising the housewawrming party

In summary, our project has a "fork" dependency. It includes an example when one milestone has multiple successors, in other words, when one activity finished two other activity can be started.
And also includes example when multiple activities needs to be finished before another can be started. So we can say that our example covers most of the situation in project management.

So lets write a query to get the forecast date of the predecessors, add the duration time of the activity and show the new forecast date of the successor.

An easy way would be to loop through the successor milestones and update them with the new forecast date. However the performance of that query would be very low.

But what would happen when the company runs 50 programme, each has 5-15 projects and each project has about 120 milestones and these milestones needs to be achieved in 7000 different sites? The milestones sometimes have cross dependency between programmes. For example: In order to start a milestone in Project B another milestone needs to be finished in Project A.

By the fact that the number of milestones can easily go over millions an inefficient scheduler can cause delay in triggering the next activities.

The query below shows how the second milestone's ForecastDate could be updated from the first milestone. However this script does not go over on the chain, but updates always the next cascading milestone.

SELECT  predm.Milestone PredecessorMilestone ,
        m.Milestone ,
        predm.ForecastDate PredecessorForecastDate ,
        DATEADD(DAY, m.DurationInDays,
                CASE WHEN md.PredecessorMilestoneId IS NULL
                     THEN m.ForecastDate
                     ELSE predm.ForecastDate
                END) ,
        m.DurationInDays
FROM    dbo.tblMilestoneDependency md
INNER JOIN dbo.tblMilestone m
        ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
LEFT JOIN dbo.tblMilestone predm
        ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
ORDER BY md.MilestoneId

In the result set below, we can see the new forcast date for 'Buy Land' milestone, which is 30 days after 'Project Start' Forecast Date.

Calculated forecast for the next milestone only

Introducing Recursice Common Table Expression

Fortunately recursive common table expression was introduced in SQL, therefore we are able to populate the chain of milestones with one single query.

Let's modify the previous query and use as the source table in our recursive common table expression.

;WITH    cteCurrentSchedule
          AS ( SELECT   predm.Milestone PredecessorMilestone ,
                        m.Milestone ,
                        predm.MilestoneId PredecessorMilestoneId ,
                        m.MilestoneId ,
                        predm.ForecastDate PredecessorForecastDate ,
                        m.ForecastDate ,
                        m.CompletedDate,
                        m.DurationInDays
               FROM     dbo.tblMilestoneDependency md
               INNER JOIN dbo.tblMilestone m
                        ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
               LEFT JOIN dbo.tblMilestone predm
                        ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
             ),
        cteNewSchedule
          AS (
--select the start milestones
               SELECT   sch.Milestone ,
                        sch.MilestoneId ,
                        COALESCE(sch.CompletedDate, sch.ForecastDate) ForecastDate ,
                        sch.DurationInDays
               FROM     cteCurrentSchedule sch
               WHERE    sch.PredecessorMilestoneId IS NULL
               UNION ALL
               SELECT   cs.Milestone ,
                        cs.MilestoneId ,
                        COALESCE(cs.CompletedDate,
                                 DATEADD(DAY, cs.DurationInDays,
                                         ns.ForecastDate)) ,
                        cs.DurationInDays
               FROM     cteNewSchedule ns
               INNER JOIN cteCurrentSchedule cs
                        ON ns.MilestoneId = cs.PredecessorMilestoneId
             )
    SELECT  u.Milestone ,
            u.MilestoneId ,
            m.ForecastDate ,
            MAX(u.ForecastDate) NewForecastDate ,
            m.CompletedDate ,
            u.DurationInDays  --take the longest parth if  multiple available
    FROM    cteNewSchedule AS u
    INNER JOIN dbo.tblMilestone m
            ON m.MilestoneId = u.MilestoneId
    GROUP BY u.Milestone ,
            u.MilestoneId ,
            m.ForecastDate ,
            m.CompletedDate ,
            u.DurationInDays
    ORDER BY MAX(u.ForecastDate)

The result set shows the Forecast Date for all the dependant milestones.

With that result, you can update your original table (tblMilestone) with the latest Forecast dates to represent the data on a user control (not covered in this article).

In summary, we can place the logic into the store procedure. When the stored procedure executed it updates the forecast dates if any of the milestone predecessors changed.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:        <Author, Csaba Hegyi>
-- Create date: <Create Date, 2015 Sep 21>
-- Description:    <Description, This stored procedure updates milestones forecast dates if any of the predecessors' forecast date od completition date changed>
-- =============================================
ALTER PROCEDURE sp_Milestone_Reschedule
AS
BEGIN
    SET NOCOUNT ON;

;WITH    cteCurrentSchedule
      AS ( SELECT   predm.Milestone PredecessorMilestone ,
                    m.Milestone ,
                    predm.MilestoneId PredecessorMilestoneId ,
                    m.MilestoneId ,
                    predm.ForecastDate PredecessorForecastDate ,
                    m.ForecastDate ,
                    m.CompletedDate ,
                    m.DurationInDays
           FROM     dbo.tblMilestoneDependency md
           INNER JOIN dbo.tblMilestone m
                    ON m.MilestoneId = md.MilestoneId -- show all the dependencies of the milestones
           LEFT JOIN dbo.tblMilestone predm
                    ON predm.MilestoneId = md.PredecessorMilestoneId -- show the name of predecessors if there is any
         ),
    cteNewSchedule
      AS (
--select the start milestones
           SELECT   sch.Milestone ,
                    sch.MilestoneId ,
                    COALESCE(sch.CompletedDate, sch.ForecastDate) ForecastDate ,
                    sch.DurationInDays
           FROM     cteCurrentSchedule sch
           WHERE    sch.PredecessorMilestoneId IS NULL
           UNION ALL
           SELECT   cs.Milestone ,
                    cs.MilestoneId ,
                    COALESCE(cs.CompletedDate,
                             DATEADD(DAY, cs.DurationInDays,
                                     ns.ForecastDate)) ,
                    cs.DurationInDays
           FROM     cteNewSchedule ns
           INNER JOIN cteCurrentSchedule cs
                    ON ns.MilestoneId = cs.PredecessorMilestoneId
         ),
         cteLongestPathPerMilestone
         as
         (
         SELECT ns.MilestoneId, MAX(ns.ForecastDate)  ForecastDate FROM cteNewSchedule ns 
         GROUP BY ns.MilestoneId
         )

UPDATE m
SET  ForecastDate = u.ForecastDate
FROM    cteLongestPathPerMilestone AS u
INNER JOIN dbo.tblMilestone m
        ON m.MilestoneId = u.MilestoneId
WHERE m.CompletedDate IS NULL OR m.ForecastDate IS NULL

END
GO

The stored procedure reschedules only milestones where completed date is not entered yet. As an example, I'll update the comleted date for two of the milestones to see how the proc calculates the forecast dates for the successor milestones.

It will use the predecessor milestone's completed date as the source of truth rather than its forecast date.

UPDATE  dbo.tblMilestone
SET     CompletedDate = DATEADD(DAY, 3, ForecastDate)
WHERE   MilestoneId = 1
GO

UPDATE  dbo.tblMilestone
SET     CompletedDate = DATEADD(DAY, 15, ( SELECT    ForecastDate
                                              FROM      dbo.tblMilestone
                                              WHERE     MilestoneId = 1
                                            ))
WHERE   MilestoneId = 2

To see the result:

EXEC sp_Milestone_Reschedule

SELECT * FROM dbo.tblMilestone m

and the result set is:

I hope you enjoyed the article, and if you have any question, please don't hesitate contacting me.

Points of Interest

One of the most challenging part was in designing the solution, when a milestone has multiple predecessors, which means one of the predecessor already completed, but the milestone's activity still has to wait untill the longest path completed.

I came across with the solution to calculate the forecast dates for all the possible path, and take the maximum date (longest path) to update the milestone. This is represented in the final select statement as NewForecastDate.

The real application is more complex than this sample project. That sample covers only one milestone dependency type which is "Finish to Start". During the implementation I came up with solution for "Start to Start", "Start to Finish" and "Finish to Finish" scenarios.

History

No changes yet.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)