Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2012

Identifying and Deleting Duplicate Records from SQL Server Table

3.90/5 (18 votes)
2 Jun 2016CPOL2 min read 21K  
Identifying and deleting duplicate records from SQL Server Table

Introduction

This article/code/tip guides the SQL developers to identify, highlight (count) and delete the duplicate rows in a table.

Setting Up the Stage

Let us create a table that will contain our data. In this case, I am creating a variable table.

SQL
DECLARE @tblLocation AS TABLE (
    ID INT PRIMARY KEY IDENTITY(1, 1),
    Name VARCHAR(50),
    Location VARCHAR(50)
);

Let us insert some data. Data includes employee name and locations he visited. In this case, let me insert the hypothetical locations I have visited!

SQL
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Jeddah, Saudi Arabia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Santa Fe, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'New York, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Istanbul, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Essen, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Wuppertal, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');

To give a better understanding of the data, let us add some more data. In this case, say the locations visited by my friend Shariq.

SQL
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Multan, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Karachi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Rawalpindi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Harbin, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');

Let us check what we have? To do so, we execute a simple select statement to fetch the data.

SQL
SELECT    *
FROM    @tblLocation;

This statement will return 56 rows.

Let us also check how many times an employee has visited the locations. To fetch the data, we will execute the following select statement:

SQL
SELECT      Name, Location, COUNT(*)
FROM        @tblLocation
GROUP BY    Name, Location
ORDER BY    Name, COUNT(*) DESC;

The result is as follows:

What We Want?

Well in my case, I wanted to export the list of locations in a new table. However, in this case, we will try to delete the duplicate records.

First, we need to identify the records we want to delete. For this, we are going to use the ROW_NUMBER function.

SQL
SELECT    ID, Name, Location, ROW_NUMBER() OVER (ORDER BY Name, Location) Occurance
FROM      @tblLocation
ORDER BY  Name, Location;;

This select statement will give us the complete data with an incrementing number in the last column named Occurance.

Now we apply the PARTITION BY clause to partition the Occurance column data on the bases of Name and Location. The query will be like this:

SQL
SELECT    ID, Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
          Location ORDER BY Name, Location) Occurance
FROM      @tblLocation
ORDER BY  Name, Location;

The result of this statement is as below:

Note that the ROW_NUMBER is partitioned on the basis of each employee visiting different locations. The Green Box highlights that I have visited Abu Dhabi 3 times. And see the Gray Box highlighting my Home Town seven times.

Now just to have unique locations visited by employees, we need to delete all the instances of data where Occurance is greater than 1.

The delete statement will be like this:

SQL
DELETE FROM @tblLocation
WHERE    ID IN (    SELECT    tbl.ID
                FROM    (    SELECT    ID,Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
                             Location ORDER BY Name, Location) Occurance
                            FROM    @tblLocation
                            ) tbl
                WHERE    tbl.Occurance > 1
                );

After executing the above script, simply select the data using the select statement and we will get the desired result. The result is as follows:

Points of Interest

In this, we learned how to identify the duplicate records, highlight them to count the occurance and finally to delete the duplicate records.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)