Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / SQL-Server / SQL-Server-2008

How to remove duplicate rows in SQL Server 2008 when no key is present

4.95/5 (12 votes)
20 Feb 2011CPOL 109.3K  
This tip describes how to delete duplicate rows from a table that doesn't have a key.
An easy way to remove duplicate rows from a table in SQL Server 2008 is to use undocumented feature called %%physloc%%. This pseudo column shows the physical location of a row.
 
Note that this feature is undocumented and unsupported so use at your own risk!
 
A simple test-case. Create a test table:
SQL
CREATE TABLE TestTable (
   Column1 varchar(1),
   Column2 int
);
Add some rows with few duplicates:
SQL
INSERT INTO TestTable VALUES ('A', 1);
INSERT INTO TestTable VALUES ('A', 1); -- duplicate
INSERT INTO TestTable VALUES ('A', 2);
INSERT INTO TestTable VALUES ('B', 1);
INSERT INTO TestTable VALUES ('B', 2);
INSERT INTO TestTable VALUES ('B', 2); -- duplicate
INSERT INTO TestTable VALUES ('C', 2);
You can select the data to see that all seven rows are present:
SQL
SELECT *
FROM   TestTable a
ORDER BY a.Column1, a.Column2;
Now let's delete the two duplicates using the %%physloc%%:
SQL
DELETE
FROM  TestTable
WHERE TestTable.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   TestTable b
              GROUP BY b.column1, b.Column2);
And if you run the query again you'll see that only five rows remain and duplicates have been deleted.
SQL
SELECT *
FROM   TestTable a
ORDER BY a.Column1, a.Column2;
 
For more information about %%physloc%%, see Physical location of a row in SQL Server[^].

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)