Introduction
How can we tell whether a list contains duplicated elements? In other words, how do we know whether all the elements inside the list are unique?
There are a few algorithms to find this out. In this article, I shall explain two of them.
Method 1: Element by Element Comparison
This should be the easiest method. Compare every element in the list against every element that comes after it; as soon as you find a pair that are equal, you know the elements are not unique. The code to do this is as follows:
public static bool IsUniqueSlow(IList<T> arrs)
{
    // Compare each element against every element that comes after it.
    for (int i = 0; i < arrs.Count; i++)
    {
        for (int j = i + 1; j < arrs.Count; j++)
        {
            if (arrs[i].Equals(arrs[j]))
                return false;   // found an equal pair: not unique
        }
    }
    return true;   // no equal pair found: all elements are unique
}
So simple, so easy, and it does not require much thinking. This should be the obvious option if your list is not too big.
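As a quick illustration (assuming the method is hosted in the generic ArrayCheck<T> class that the test code later in this article uses):

List<char> letters = new List<char> { 'a', 'b', 'c', 'a' };
// 'a' appears twice, so the method reports the elements are not unique.
Console.WriteLine(ArrayCheck<char>.IsUniqueSlow(letters));   // prints False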
The only problem with this method is its running time, which is O(n²): the nested loops perform up to n(n-1)/2 comparisons for a list of n elements. This becomes unacceptable as the list grows very large, so we need an alternative method that will cut down the running time.
Method 2: Sort and Compare
This method is a bit more involved. Instead of comparing every element against every other element, you first sort the list, then compare the ith element with the (i+1)th element; if they are equal, you know that your list contains duplicated elements. If you repeat this for every adjacent pair and find no equals, then the elements are all unique. Here's the code to do the job:
public static bool IsUniqueFast(List<T> arrs)
{
    // Work on a copy so that sorting does not disturb the caller's list.
    List<T> arrCopy = arrs.ConvertAll<T>(delegate(T tf) { return tf; });
    arrCopy.Sort();
    // After sorting, any duplicates must be adjacent.
    for (int i = 0; i < arrCopy.Count - 1; i++)
    {
        if (arrCopy[i].Equals(arrCopy[i + 1]))
            return false;   // adjacent equal elements: a duplicate
    }
    return true;
}
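One detail left implicit: the test code below calls both methods through a generic class named ArrayCheck<int>, whose declaration is never shown. Here is a minimal sketch of such a wrapper; only the name ArrayCheck comes from the test code, and making it a static generic class is my assumption about the missing declaration:

using System.Collections.Generic;

// Hypothetical host class for the two methods above; the test code
// refers to it as ArrayCheck<int>.
public static class ArrayCheck<T>
{
    public static bool IsUniqueSlow(IList<T> arrs)
    {
        for (int i = 0; i < arrs.Count; i++)
            for (int j = i + 1; j < arrs.Count; j++)
                if (arrs[i].Equals(arrs[j]))
                    return false;
        return true;
    }

    public static bool IsUniqueFast(List<T> arrs)
    {
        // Equivalent to the ConvertAll copy shown above.
        List<T> arrCopy = new List<T>(arrs);
        arrCopy.Sort();   // throws at runtime if T has no default comparer
        for (int i = 0; i < arrCopy.Count - 1; i++)
            if (arrCopy[i].Equals(arrCopy[i + 1]))
                return false;
        return true;
    }
}

With this in place, calls like ArrayCheck<int>.IsUniqueSlow(myList) in the test code below compile as written.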
The code should be self-explanatory. The first line creates a copy of the original list, so that sorting does not disturb the caller's list.
Let's analyze the running time characteristics of this method a bit. The sort runs in O(n log n) and the comparison loop in O(n); adding them together, this method's running time is O(n log n), a significant improvement over the previous method. With a small list, one may not see the difference in performance, but if the list is big, the performance difference tips the balance firmly toward the second method.
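To put rough numbers on this (a back-of-the-envelope estimate, not a precise model): for the 10,000-element list used in the tests below, the pairwise method performs up to n(n-1)/2, or about 50 million, equality checks, while sort-and-compare costs roughly n log₂ n, or about 130,000, comparisons for the sort plus n-1 equality checks in the loop. The constant factors of the two methods differ, but a gap of this size is consistent with the roughly 100× speedup measured below.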
Test Run Results
We compare the performance of these two methods in two situations: one where all the elements are unique, and one where there is a duplicated element. The list is made large enough (10,000 elements) that the running times of both methods are at least on the order of milliseconds. Here's the code:
private static void IsUniqueRun()
{
    int nOfList = (int)Math.Pow(10, 4);   // 10,000 elements
    List<int> myList = new List<int>();
    for (int i = 0; i < nOfList; i++)
        myList.Add(i);                    // every element is distinct

    DateTime dtStart = DateTime.Now;
    Console.WriteLine("Are all the elements unique? "
        + ArrayCheck<int>.IsUniqueSlow(myList));
    Console.WriteLine("Time taken for slow method "
        + (DateTime.Now - dtStart).TotalMilliseconds);

    DateTime dtStart1 = DateTime.Now;
    Console.WriteLine("Are all the elements unique? "
        + ArrayCheck<int>.IsUniqueFast(myList));
    Console.WriteLine("Time taken for fast method "
        + (DateTime.Now - dtStart1).TotalMilliseconds);
}
private static void IsNotUniqueRun()
{
    int nOfList = (int)Math.Pow(10, 4);   // 10,000 elements
    List<int> myList = new List<int>();
    for (int i = 0; i < nOfList; i++)
        myList.Add(i);
    // Append one duplicate: a value already in the middle of the list.
    myList.Add(myList[myList.Count - 1] / 2);

    DateTime dtStart = DateTime.Now;
    Console.WriteLine("Are all the elements unique? "
        + ArrayCheck<int>.IsUniqueSlow(myList));
    Console.WriteLine("Time taken for slow method "
        + (DateTime.Now - dtStart).TotalMilliseconds);

    DateTime dtStart1 = DateTime.Now;
    Console.WriteLine("Are all the elements unique? "
        + ArrayCheck<int>.IsUniqueFast(myList));
    Console.WriteLine("Time taken for fast method "
        + (DateTime.Now - dtStart1).TotalMilliseconds);
}
And the results? No surprise: the second method vastly outperforms the first. Here they are:
Are all the elements unique? True
Time taken for slow method 1531.299
Are all the elements unique? True
Time taken for fast method 15.6255
Are all the elements unique? False
Time taken for slow method 1187.538
Are all the elements unique? False
Time taken for fast method 0
Press any key to continue . . .
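A side note on the timing numbers: DateTime.Now has coarse resolution on Windows (typically around 10-16 ms), which is why the fast method can report exactly 0 or 15.6255 ms rather than a finer-grained value. For sharper measurements, System.Diagnostics.Stopwatch is the usual tool; a minimal sketch of how one timing block could be rewritten with it:

using System.Diagnostics;

Stopwatch sw = Stopwatch.StartNew();
bool allUnique = ArrayCheck<int>.IsUniqueFast(myList);
sw.Stop();
Console.WriteLine("Are all the elements unique? " + allUnique);
Console.WriteLine("Time taken for fast method "
    + sw.Elapsed.TotalMilliseconds);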
Motivation
I know, I know: it's not too hard to come up with the algorithms above. But as trivial as they seem, there appears to be no explicit, ready-to-use code on the Internet for a fast uniqueness check on a list's elements. This article is meant to fill that gap and provide a usable method for the purpose.
History
- First written on 8-Nov-2007