Purpose
This basically describes an efficient method for determining holes, or missing values, in a table that contains an IDENTITY()
column. As an example, let's say you have a table with the following definition:
CREATE TABLE Main
(
id int IDENTITIY(1,1),
someOtherValue varchar(20) NULL
)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
INSERT INTO Main VALUES (NULL)
That essentially produces this table structure:
id | someOtherValue |
---|
1 | NULL |
2 | NULL |
3 | NULL |
4 | NULL |
5 | NULL |
Now suppose that you delete rows 2 and 3. You'll be left with IDs 1, 4, and 5. This code essentially tells you that 2 and 3 are missing.
Introduction
Originally, I did some quick research on Google and ran into methods described by pages like this: http://www.sql-server-helper.com/tips/determine-missing-identity-values.aspx. Essentially, they want to create a temporary pivot table containing all of the possible identity values, 1 through MAX()
, and perform a LEFT JOIN
on the pivot and the table in question. Any NULL
values on the main table equates to a missing identity.
That's all well and good, until you want it for something involving a table with more than 10,000 rows.
The Code
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#Main') IS NOT NULL
DROP TABLE #Main
CREATE TABLE #Main
(
id int IDENTITY(1,1),
someOtherValue varchar(10) NULL
)
INSERT INTO #Main
SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
UNION ALL SELECT NULL
SELECT * FROM #Main
DELETE FROM #Main WHERE id IN (1, 2, 4, 5, 7, 9, 10)
SELECT * FROM #Main
DECLARE @temp TABLE (firstMiss int NOT NULL, nextValue int NOT NULL)
DECLARE @maxId INT
DECLARE @lastId INT
DECLARE @increment INT
DECLARE @lowerBound INT
SELECT @maxId = MAX(id) FROM #Main
SELECT @lastId = IDENT_CURRENT('tempdb..#Main')
SELECT @increment = IDENT_INCR('tempdb..#Main')
SELECT @lowerBound = IDENT_SEED('tempdb..#Main')
INSERT INTO @temp
SELECT
@lowerBound,
(SELECT TOP 1 id FROM #Main WHERE id > @lowerBound ORDER BY id ASC)
INSERT INTO @temp
SELECT
id + @increment,
COALESCE(
(
SELECT TOP 1
id
FROM
#Main
WHERE
id > (A.id + @increment)
ORDER BY
id ASC
),
@maxId)
FROM
#Main A
WHERE
id + @increment NOT IN (SELECT id FROM #Main)
AND (id + @increment) <= @maxId
INSERT INTO @temp
SELECT
(SELECT TOP 1 id + @increment FROM #Main WHERE id < @lastId ORDER BY id DESC),
@lastId + @increment
SELECT * FROM @temp
DECLARE @missingId TABLE (id int)
DECLARE @maxFirstMiss INT
DECLARE @firstMiss INT
DECLARE @nextValue INT
SELECT @firstMiss = MIN(firstMiss) FROM @temp
SELECT @maxFirstMiss = MAX(firstMiss) FROM @temp
WHILE @firstMiss <= @maxFirstMiss
BEGIN
SELECT @nextValue = nextValue FROM @temp WHERE firstMiss = @firstMiss
WHILE @firstMiss < @nextValue
BEGIN
INSERT INTO @missingId VALUES (@firstMiss)
SET @firstMiss = @firstMiss + @increment
END
SELECT @firstMiss = MIN(firstMiss) FROM @temp WHERE firstMiss > @firstMiss
END
SELECT * FROM @missingId
DROP TABLE #Main
Assumptions
First, I only tested this using the IDENTITY(1,1)
scenario. I really don't deal with non-1 increment or seed values. It's coded to work with whatever your table allows, but the logic may not pan out.
Second, your performance may vary depending on how many gaps you've got and how large your tables are. My tests were on tables that had around 300,000 rows, and it worked much faster than the pivot-table method. So, be warned.