Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008R2

SQL Server Procedure to Report Duplicates

5.00/5 (10 votes)
13 Dec 2014CPOL2 min read 25.2K  
A stored procedure I use to find duplicate records in tables and views.

Introduction

This stored procedure finds duplicate records in tables and views. It is a tool I use frequently; whenever I hear rumors of duplicates being reported by the systems I support. It has evolved to the current state over time, through real-world use on production databases.

Background

What constitutes a "duplicate" in any particular circumstance depends on the data and the needs of the application. For the purpose of this tip, I'll consider records with the same value for some key field (or fields) to be duplicates, even if not all the fields are the same. For example, consider the following simple table:

ID Name Race
1 Bob Alien
1 Bob Alien
1 Bob Martian
1 Robert Alien
  • All four records match on ID.
  • The first three records match on ID and Name.
  • The first two records are the only two that match exactly on all three fields.

No one likes having to track down duplicates. Remembering how to write an SQL query to do it and then writing one for each different table or view that you need to investigate can be a burden, and if "production is down!" and every minute spent investigating costs money, then you will be grateful to have a tool handy.

One other point I'd like to make is that a query that simply provides a list of duplicate IDs isn't generally as helpful as a query that provides the entire records that match the duplicate criteria.

FindDuplicates

Such a tool should be flexible enough to work with pretty much any table or view you throw at it. This procedure accepts parameters to specify the table or view to query and which column (or columns) to use as a key.

SQL
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'ID'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'Name'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'CAST(ID AS VARCHAR(50)) + Name'
EXECUTE dbo.FindDuplicates 'dbo' , 'Mascot' , 'CAST(ID AS VARCHAR(50)) + Name + Race'

As this is a tool for developers and DBAs to use, I chose to accomplish this with dynamic SQL.

SQL
CREATE PROCEDURE [dbo].[FindDuplicates]
  @schemaname  VARCHAR(128)
, @tablename   VARCHAR(128)
, @columnname  VARCHAR(128)
, @whereclause VARCHAR(128)='0=0'
AS
BEGIN
  DECLARE @sql VARCHAR(MAX)
  SET @sql=
  '
  WITH cte AS ( SELECT __columnname__ __KEY__ , * FROM __schemaname__.__tablename__ WHERE __where_clause__ )
  SELECT B.HowMany , A.* 
  FROM cte A
  INNER JOIN 
  ( SELECT __KEY__ , COUNT(*) HowMany FROM cte GROUP BY __KEY__ ) B
  ON A.__KEY__=B.__KEY__
  WHERE B.HowMany>1
  ORDER BY A.__KEY__
  '
 
  SET @sql=REPLACE( @sql , '__schemaname__'   , @schemaname  )
  SET @sql=REPLACE( @sql , '__tablename__'    , @tablename   )
  SET @sql=REPLACE( @sql , '__columnname__'   , @columnname  )
  SET @sql=REPLACE( @sql , '__where_clause__' , @whereclause )
 
--  PRINT @sql 
 
  EXECUTE (@sql)
END

As I said, I have had to use this many times and I'm always glad I have it handy. Please add comments (or alternatives!) with any suggestions of improvements you have.

History

  • 2014-12-13

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)