Situation
Recently I was on a project where I had an SSIS package that was doing a fair amount of ETL work and based on the projected data load it was expected to take 10-15 minutes for an average load of data. The problem with this is the job had to start based on some user interaction and the user needed the results ASAP. Because of the complexity of the ETL I did not want to rely on transactional boundaries to keep the straight because data is being written to and from multiple databases. The transactions would essentially put locks on almost any table in multiple databases which is not ideal.
Solution
Instead I decided to try and query the system tables to see if an instance of the SQL Agent job in charge of running this package is already started. I had queried system tables for table and database metadata before but never for job statuses. As it turns out this is a fairly straight forward task, after a little search the web and a bit of trial and error I had the following stored procedure.
CREATE PROCEDURE [dbo].[StartOneAtATime]
AS
BEGIN
SET NOCOUNT ON;
DECLARE @JOB_NAME SYSNAME = N'SQL_Agent_Job';
IF NOT EXISTS(
SELECT 1
FROM msdb.dbo.sysjobs_view job
INNER JOIN msdb.dbo.sysjobactivity activity
ON job.job_id = activity.job_id
WHERE
activity.run_Requested_date IS NOT NULL
AND activity.stop_execution_date IS NULL
AND job.name = @JOB_NAME
)
BEGIN
PRINT 'Starting job ''' + @JOB_NAME + '''';
EXEC msdb.dbo.sp_start_job @JOB_NAME;
END
ELSE
BEGIN
PRINT 'Job ''' + @JOB_NAME + ''' already started ';
END
END
It is an extremely straight forward procedure. It looks to see if there are any instances of job that are still running and if it does not find one it starts one. The addition of some very basic print statements helped when going through logs.
Now all we had to do was insert a call to this stored procedure at the end of the user interaction and our conflicting job executions went away. I added a little code at the end of the primary SSIS package to check for additional data from the user that was submitted since the last start and re-started the lengthy ETL process. This way the job was not set on some polling schedule where it would run when it was not needed, and the users had to wait the minimum amount of time for their data to process.
Possible Enhancements
As you can see this is a very basic implementation which does exactly what I needed. There are a couple things you could add to make this more flexible and portable:
- Adding a parameter for the SQL Agent Job Name instead of having it hard coded
- Returning some sort of status code so the calling application knows the state of the job
Final Thoughts
Again my implementation was very simple but It is still a handy little utility that I will be holding onto for future data projects. It took a little forethought to design the primary package and data model to allow users to submit data that would wait in queue if another user already started the job but it was well worth the effort.