Introduction
This stored procedure finds duplicate records in tables and views. It is a tool I use frequently; whenever I hear rumors of duplicates being reported by the systems I support. It has evolved to the current state over time, through real-world use on production databases.
Background
What constitutes a "duplicate" in any particular circumstance depends on the data and the needs of the application. For the purpose of this tip, I'll consider records with the same value for some key field (or fields) to be duplicates, even if not all the fields are the same. For example, consider the following simple table:
ID | Name | Race |
1 | Bob | Alien |
1 | Bob | Alien |
1 | Bob | Martian |
1 | Robert | Alien |
- All four records match on ID.
- The first three records match on ID and Name.
- The first two records are the only two that match exactly on all three fields.
No one likes having to track down duplicates. Remembering how to write an SQL query to do it and then writing one for each different table or view that you need to investigate can be a burden, and if "production is down!" and every minute spent investigating costs money, then you will be grateful to have a tool handy.
One other point I'd like to make is that a query that simply provides a list of duplicate IDs isn't generally as helpful as a query that provides the entire records that match the duplicate criteria.
FindDuplicates
Such a tool should be flexible enough to work with pretty much any table or view you throw at it. This procedure accepts parameters to specify the table or view to query and which column (or columns) to use as a key.
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'ID'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'Name'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'CAST(ID AS VARCHAR(50)) + Name'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'CAST(ID AS VARCHAR(50)) + Name + Race'
As this is a tool for developers and DBAs to use, I chose to accomplish this with dynamic SQL.
CREATE PROCEDURE [dbo].[FindDuplicates]
@schemaname VARCHAR(128)
, @tablename VARCHAR(128)
, @columnname VARCHAR(128)
, @whereclause VARCHAR(128)='0=0'
AS
BEGIN
DECLARE @sql VARCHAR(MAX)
SET @sql=
'
WITH cte AS ( SELECT __columnname__ __KEY__ , * FROM __schemaname__.__tablename__ WHERE __where_clause__ )
SELECT B.HowMany , A.*
FROM cte A
INNER JOIN
( SELECT __KEY__ , COUNT(*) HowMany FROM cte GROUP BY __KEY__ ) B
ON A.__KEY__=B.__KEY__
WHERE B.HowMany>1
ORDER BY A.__KEY__
'
SET @sql=REPLACE( @sql , '__schemaname__' , @schemaname )
SET @sql=REPLACE( @sql , '__tablename__' , @tablename )
SET @sql=REPLACE( @sql , '__columnname__' , @columnname )
SET @sql=REPLACE( @sql , '__where_clause__' , @whereclause )
EXECUTE (@sql)
END
As I said, I have had to use this many times and I'm always glad I have it handy. Please add comments (or alternatives!) with any suggestions of improvements you have.
History