Introduction
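I have been using Dictionary data containers since VB6. As neat as they are, they have limits, and my latest project exposed them in a big, ugly way.
The biggest? Changing the dictionary while you loop over it. Add or remove entries inside a For Each over the same dictionary and the enumerator is invalidated, so the loop throws an InvalidOperationException (.NET Core 3.0 and later relax this for Remove, but not for Add). Toss in the lack of a built-in Sort method and all sorts of sad things happen.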
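A minimal sketch of the failure and the usual workaround, assuming .NET with LINQ available: instead of looping over the live dictionary, loop over a snapshot of its entries (ToList) and modify the original freely. The module and function names here are hypothetical, for illustration only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module EnumerationDemo
    ' Hypothetical helper: adds "value -> key" for every entry.
    ' Unsafe version: adding to the dictionary inside a For Each over the
    ' same dictionary invalidates the enumerator, so the next step throws.
    Public Function AddInversesUnsafe(ByVal dict As Dictionary(Of String, String)) As Boolean
        Try
            For Each kv As KeyValuePair(Of String, String) In dict
                If Not dict.ContainsKey(kv.Value) Then dict.Add(kv.Value, kv.Key)
            Next
        Catch ex As InvalidOperationException
            ' "Collection was modified; enumeration operation may not execute."
            Return False
        End Try
        Return True
    End Function

    ' Safe version: enumerate a snapshot (ToList) and modify the original.
    Public Sub AddInversesSafe(ByVal dict As Dictionary(Of String, String))
        For Each kv As KeyValuePair(Of String, String) In dict.ToList()
            If Not dict.ContainsKey(kv.Value) Then dict.Add(kv.Value, kv.Key)
        Next
    End Sub
End Module
```

The snapshot costs one extra list allocation, which is usually far cheaper than debugging a loop that dies halfway through.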
Let us say that you have this set of data:
AAA - VVV
BBB - XXX
CCC - WWW
VVV - AAA
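We want to remove the VVV - AAA pair, because it is simply the inverse of the first one. Doing this on the values of a dictionary keeps many-to-many relationships intact, because the key is only an index. Using the left-hand item as the dictionary key instead would throw an error (Dictionary.Add raises an ArgumentException) as soon as that item appeared in a second pair.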
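To make the storage layout concrete, here is a minimal sketch, assuming the pairs arrive as Tuple(Of String, String) values. ToIndexedPairs and KeyByLeftFails are hypothetical helpers: the first stores each pair as a "left:right" value under a running index key; the second shows why keying by the left item breaks on many-to-many data.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module PairStorage
    ' Hypothetical helper: stores each (left, right) pair as a "left:right"
    ' value under a running index key, so one left item may appear in many pairs.
    Public Function ToIndexedPairs(ByVal pairs As IEnumerable(Of Tuple(Of String, String))) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim idx As Integer = 0
        For Each p As Tuple(Of String, String) In pairs
            idx += 1
            result.Add(idx.ToString(), p.Item1 & ":" & p.Item2)
        Next
        Return result
    End Function

    ' For contrast: keying by the left item fails on the second pair that
    ' shares it, because Dictionary.Add throws ArgumentException on a
    ' duplicate key. Returns True if that happened.
    Public Function KeyByLeftFails(ByVal pairs As IEnumerable(Of Tuple(Of String, String))) As Boolean
        Dim byLeft As New Dictionary(Of String, String)
        Try
            For Each p As Tuple(Of String, String) In pairs
                byLeft.Add(p.Item1, p.Item2)
            Next
        Catch ex As ArgumentException
            Return True
        End Try
        Return False
    End Function
End Module
```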
Background
My project is built around deduplication. There are many sub-tables, holding things like phone numbers and email addresses, that are easy to match up.
This way, I can get hard matches without a lot of work. The problem? Sorting them, and INVERSE MATCHES!
Using the code
This code is simple to use. There are three separate routines: one sorts, and the other two remove inverse duplicates. The first of those checks for inverse duplicates within the values only; the second compares keys against values.
The first one is a basic sort of a dictionary with string keys:
Imports System.Collections.Generic

Public Function SortDictionaryKeyString(ByVal Unsorted As Dictionary(Of String, String)) As Dictionary(Of String, String)
    ' Dictionary has no Sort method, so copy the keys into a List and sort that.
    Dim Working As New List(Of String)(Unsorted.Keys)
    Working.Sort()

    ' Rebuild a new dictionary in sorted key order. Dictionary does not formally
    ' guarantee enumeration order; a freshly built one with no removals enumerates
    ' in insertion order in practice, and this routine relies on that.
    Dim Sorted As New Dictionary(Of String, String)
    For Each Item As String In Working
        Sorted.Add(Item, Unsorted(Item))
    Next
    Return Sorted
End Function
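If all you need is to walk the entries in key order, a different approach (not the routine above) is SortedDictionary(Of String, String), which keeps its keys ordered at all times, so the order is guaranteed rather than a side effect of insertion order. A minimal sketch; ToSortedByKey is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module SortingAlternatives
    ' Copies a Dictionary into a SortedDictionary. With no comparer argument it
    ' uses Comparer(Of String).Default, the same comparer List(Of String).Sort()
    ' uses, so the key order matches the routine above.
    Public Function ToSortedByKey(ByVal unsorted As Dictionary(Of String, String)) As SortedDictionary(Of String, String)
        Return New SortedDictionary(Of String, String)(unsorted)
    End Function
End Module
```

The trade-off: SortedDictionary lookups are O(log n) instead of a Dictionary's average O(1), which rarely matters at deduplication-table sizes but is worth knowing.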
The next one is the clever one. Because the relationships are many-to-many, the key has to be a plain index, and the data itself goes into the value as a colon-separated pair such as "AAA:VVV".
Splitting that string on the colon and swapping the two halves gives the inverse pair to look for.
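Public Function DeDupeDictionaryValues(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
    ' Each value holds a "left:right" pair; the key is only a running index,
    ' so the same left or right item can appear in many pairs.
    Dim Result As New Dictionary(Of String, String)
    Dim Kept As New HashSet(Of String)   ' values already kept, for fast lookups
    Dim iIdx As Long = 0
    For Each KeyPair As KeyValuePair(Of String, String) In Dupe
        Dim sValue As String = KeyPair.Value
        Dim sSplit() As String = sValue.Split(":"c)
        ' A value without a colon has no inverse, so it is always kept.
        Dim IsInverse As Boolean = sSplit.Length >= 2 AndAlso
            Kept.Contains(sSplit(1) & ":" & sSplit(0))   ' e.g. "VVV:AAA" -> look for "AAA:VVV"
        If Not IsInverse Then
            iIdx += 1
            ' Convert the index to String to match the dictionary's key type.
            Result.Add(iIdx.ToString(), sValue)
            Kept.Add(sValue)
        End If
    Next
    Return Result
End Function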
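A variation worth considering, sketched here as a different technique rather than the routine above: reduce each "left:right" value to a canonical form (the two halves in ordinal order) and remember the forms already seen in a HashSet. A pair and its inverse share one canonical form, so only the first survives. Unlike DeDupeDictionaryValues, this also drops exact repeats of the same pair. DeDupeCanonical is a hypothetical name, and the sketch assumes at most one colon per value.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module CanonicalDedupe
    ' Keeps the first occurrence of each unordered pair, re-indexing from 1.
    Public Function DeDupeCanonical(ByVal dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)
        Dim seen As New HashSet(Of String)
        Dim idx As Integer = 0
        For Each kv As KeyValuePair(Of String, String) In dupe
            Dim parts() As String = kv.Value.Split(":"c)
            ' Canonical form: put the ordinally smaller half first.
            Dim canonical As String = kv.Value
            If parts.Length = 2 AndAlso String.CompareOrdinal(parts(0), parts(1)) > 0 Then
                canonical = parts(1) & ":" & parts(0)
            End If
            ' HashSet.Add returns False when the canonical form is already present.
            If seen.Add(canonical) Then
                idx += 1
                result.Add(idx.ToString(), kv.Value)
            End If
        Next
        Return result
    End Function
End Module
```

The HashSet makes each check constant time on average, whereas a ContainsValue-based check scans every kept value on each pass.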
The last one removes inverse duplicates from a Dictionary(Of String, String) where the left item is the key and the right item is the value:
Public Function DeDupeDictionary(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
    ' Here the left item is the key and the right item is the value.
    Dim Result As New Dictionary(Of String, String)
    Dim sStored As String = Nothing
    For Each KeyPair As KeyValuePair(Of String, String) In Dupe
        ' An entry is an inverse duplicate only if an entry already kept maps
        ' KeyPair.Value back to KeyPair.Key (VVV - AAA after AAA - VVV).
        ' Checking only that the key exists would also drop unrelated
        ' entries such as ZZZ - AAA.
        Dim IsInverse As Boolean = Result.TryGetValue(KeyPair.Value, sStored) AndAlso
            sStored = KeyPair.Key
        If Not IsInverse Then
            Result.Add(KeyPair.Key, KeyPair.Value)
        End If
    Next
    Return Result
End Function