Identifying and Deleting Duplicate Records from SQL Server Table

Aqeeel

3.90/5 (18 votes)

1 Jun 2016CPOL2 min read

21K

Identifying and deleting duplicate records from SQL Server Table

Introduction

This article/code/tip guides the SQL developers to identify, highlight (count) and delete the duplicate rows in a table.

Setting Up the Stage

Let us create a table that will contain our data. In this case, I am creating a variable table.

SQL

DECLARE @tblLocation AS TABLE (
    ID INT PRIMARY KEY IDENTITY(1, 1),
    Name VARCHAR(50),
    Location VARCHAR(50)
);

Let us insert some data. Data includes employee name and locations he visited. In this case, let me insert the hypothetical locations I have visited!

SQL

INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Jeddah, Saudi Arabia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Santa Fe, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'New York, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Istanbul, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Essen, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Wuppertal, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');

To give a better understanding of the data, let us add some more data. In this case, say the locations visited by my friend Shariq.

SQL

INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Multan, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Karachi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Rawalpindi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Harbin, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');

Let us check what we have? To do so, we execute a simple select statement to fetch the data.

SQL

SELECT    *
FROM    @tblLocation;

This statement will return 56 rows.

Let us also check how many times an employee has visited the locations. To fetch the data, we will execute the following select statement:

SQL

SELECT      Name, Location, COUNT(*)
FROM        @tblLocation
GROUP BY    Name, Location
ORDER BY    Name, COUNT(*) DESC;

The result is as follows:

What We Want?

Well in my case, I wanted to export the list of locations in a new table. However, in this case, we will try to delete the duplicate records.

First, we need to identify the records we want to delete. For this, we are going to use the ROW_NUMBER function.

SQL

SELECT    ID, Name, Location, ROW_NUMBER() OVER (ORDER BY Name, Location) Occurance
FROM      @tblLocation
ORDER BY  Name, Location;;

This select statement will give us the complete data with an incrementing number in the last column named Occurance.

Now we apply the PARTITION BY clause to partition the Occurance column data on the bases of Name and Location. The query will be like this:

SQL

SELECT    ID, Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
          Location ORDER BY Name, Location) Occurance
FROM      @tblLocation
ORDER BY  Name, Location;

The result of this statement is as below:

Note that the ROW_NUMBER is partitioned on the basis of each employee visiting different locations. The Green Box highlights that I have visited Abu Dhabi 3 times. And see the Gray Box highlighting my Home Town seven times.

Now just to have unique locations visited by employees, we need to delete all the instances of data where Occurance is greater than 1.

The delete statement will be like this:

SQL

DELETE FROM @tblLocation
WHERE    ID IN (    SELECT    tbl.ID
                FROM    (    SELECT    ID,Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
                             Location ORDER BY Name, Location) Occurance
                            FROM    @tblLocation
                            ) tbl
                WHERE    tbl.Occurance > 1
                );

After executing the above script, simply select the data using the select statement and we will get the desired result. The result is as follows:

Points of Interest

In this, we learned how to identify the duplicate records, highlight them to count the occurance and finally to delete the duplicate records.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)