Introduction
This article/code/tip guides the SQL developers to identify, highlight (count) and delete the duplicate rows in a table.
Setting Up the Stage
Let us create a table that will contain our data. In this case, I am creating a variable table.
DECLARE @tblLocation AS TABLE (
ID INT PRIMARY KEY IDENTITY(1, 1),
Name VARCHAR(50),
Location VARCHAR(50)
);
Let us insert some data. Data includes employee name and locations he visited. In this case, let me insert the hypothetical locations I have visited!
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Jeddah, Saudi Arabia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Santa Fe, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Albuqurque, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'New York, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Washington DC, USA');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Istanbul, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Ankara, Turkey');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Saint Petersburg, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Moscow, Russia');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Essen, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Wuppertal, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dusseldorf, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Frankfurt, Germany');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Sharjah, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Aqeel', 'Lahore, Pakistan');
To give a better understanding of the data, let us add some more data. In this case, say the locations visited by my friend Shariq.
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Multan, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Abu Dhabi, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Karachi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Rawalpindi, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Islamabad, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Harbin, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Macau, China');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Dubai, UAE');
INSERT INTO @tblLocation (Name, Location) VALUES ('Shariq', 'Lahore, Pakistan');
Let us check what we have? To do so, we execute a simple select
statement to fetch the data.
SELECT *
FROM @tblLocation;
This statement will return 56 rows.
Let us also check how many times an employee has visited the locations. To fetch the data, we will execute the following select
statement:
SELECT Name, Location, COUNT(*)
FROM @tblLocation
GROUP BY Name, Location
ORDER BY Name, COUNT(*) DESC;
The result is as follows:
What We Want?
Well in my case, I wanted to export the list of locations in a new table. However, in this case, we will try to delete the duplicate records.
First, we need to identify the records we want to delete. For this, we are going to use the ROW_NUMBER
function.
SELECT ID, Name, Location, ROW_NUMBER() OVER (ORDER BY Name, Location) Occurance
FROM @tblLocation
ORDER BY Name, Location;;
This select
statement will give us the complete data with an incrementing number in the last column named Occurance
.
Now we apply the PARTITION BY
clause to partition the Occurance
column data on the bases of Name
and Location
. The query will be like this:
SELECT ID, Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
Location ORDER BY Name, Location) Occurance
FROM @tblLocation
ORDER BY Name, Location;
The result of this statement is as below:
Note that the ROW_NUMBER
is partitioned on the basis of each employee visiting different locations. The Green Box highlights that I have visited Abu Dhabi 3 times. And see the Gray Box highlighting my Home Town seven times.
Now just to have unique locations visited by employees, we need to delete all the instances of data where Occurance
is greater than 1
.
The delete
statement will be like this:
DELETE FROM @tblLocation
WHERE ID IN ( SELECT tbl.ID
FROM ( SELECT ID,Name, Location, ROW_NUMBER() OVER (PARTITION BY Name, _
Location ORDER BY Name, Location) Occurance
FROM @tblLocation
) tbl
WHERE tbl.Occurance > 1
);
After executing the above script, simply select the data using the select
statement and we will get the desired result. The result is as follows:
Points of Interest
In this, we learned how to identify the duplicate records, highlight them to count the occurance and finally to delete the duplicate records.