Dictionary Sorting and Inverse Duplication Removal

10 Sep 2012
A few fun Dictionary utilities.

Introduction

I have been using Dictionary data containers since VB6. As neat as they are, there are some limits and my latest stabs revealed them in a big ugly way.

The biggest? Changing the dictionary. Do that while you are looping over it and the enumerator is invalidated, so the loop throws an InvalidOperationException. Toss in the lack of a built-in sort and all sorts of sad things happen.
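One common way around the enumeration problem is to collect the keys first and change the dictionary only after the loop has finished. The sketch below is a minimal illustration, not part of the routines in this article; the module name SafeRemovalSketch and the helper RemoveByValue are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module SafeRemovalSketch

    ' Hypothetical helper: removes every entry whose value equals Unwanted.
    Public Sub RemoveByValue(Data As Dictionary(Of String, String), Unwanted As String)

        ' Pass 1: only read the dictionary and remember the keys to drop.
        Dim ToRemove As New List(Of String)
        For Each KeyPair As KeyValuePair(Of String, String) In Data
            If KeyPair.Value = Unwanted Then
                ToRemove.Add(KeyPair.Key)
            End If
        Next

        ' Pass 2: the enumeration is finished, so changing Data is safe.
        For Each Key As String In ToRemove
            Data.Remove(Key)
        Next

    End Sub

    Sub Main()
        Dim Data As New Dictionary(Of String, String) From {
            {"AAA", "VVV"}, {"BBB", "XXX"}, {"CCC", "VVV"}}
        RemoveByValue(Data, "VVV")
        Console.WriteLine(Data.Count) ' prints 1 (only BBB is left)
    End Sub

End Module
```

The three routines in this article take a different route to the same end: they never touch the input and build a fresh dictionary instead.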

Let us say that you have this set of data:

AAA - VVV

BBB - XXX

CCC - WWW

VVV - AAA

We want to remove that VVV - AAA set, because it is simply the inverse of the first set. Storing each pair in the value of a dictionary allows many-to-many relationships, whereas putting one side of the pair in the key would throw an error as soon as that side appeared twice.
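To make the key-versus-value trade-off concrete, here is a small sketch that loads the same pairs both ways. The module name PairStorageSketch, the helper LoadIndexed, and the extra AAA:XXX pair are assumptions added for illustration; they are not part of the data set above.

```vbnet
Imports System
Imports System.Collections.Generic

Module PairStorageSketch

    ' Hypothetical helper: stores each "Left:Right" pair under a running index,
    ' so the same left-hand item may appear in any number of pairs.
    Public Function LoadIndexed(Pairs As IEnumerable(Of String)) As Dictionary(Of String, String)
        Dim Result As New Dictionary(Of String, String)
        Dim Idx As Integer = 0
        For Each Pair As String In Pairs
            Idx += 1
            Result.Add(Idx.ToString(), Pair)
        Next
        Return Result
    End Function

    Sub Main()
        Dim Pairs As String() = {"AAA:VVV", "AAA:XXX", "VVV:AAA"}

        ' Index as key: all three pairs fit, including the two that start with AAA.
        Console.WriteLine(LoadIndexed(Pairs).Count) ' prints 3

        ' Left-hand item as key: the second AAA raises an ArgumentException.
        Dim Keyed As New Dictionary(Of String, String)
        Try
            For Each Pair As String In Pairs
                Dim Parts As String() = Pair.Split(":"c)
                Keyed.Add(Parts(0), Parts(1))
            Next
        Catch ex As ArgumentException
            Console.WriteLine("Duplicate key: " & ex.Message)
        End Try
    End Sub

End Module
```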

Background

My project is based on the main idea of deduplication. There are many sub tables, with things like phone number and email address, that are very easy to match up. This way, I can get hard matches without a lot of work. The problem? Sorting them and INVERSE MATCHES!!!

Using the code

This code is simple to use. There are three separate routines. One sorts, the other two look for duplicates; the first checks for inverse duplicates in the values field, the other in the key/values fields. 

The first one is a basic sort of a dictionary, with a string as the key:

Public Function SortDictionaryKeyString(Unsorted As Dictionary(Of String, String)) As Dictionary(Of String, String)

    Dim Working As List(Of String)
    Dim KeyPair As KeyValuePair(Of String, String)

    SortDictionaryKeyString = New Dictionary(Of String, String)

    ' Copy the keys into a list, because a Dictionary cannot be sorted in place.
    Working = New List(Of String)

    For Each KeyPair In Unsorted
        Working.Add(KeyPair.Key)
    Next

    ' Sort the keys with the default string comparer.
    Working.Sort()

    ' Re-add the entries in key order. A Dictionary that has had nothing removed
    ' enumerates in insertion order in practice, but that order is not guaranteed.
    For Each Item As String In Working
        SortDictionaryKeyString.Add(Item, Unsorted.Item(Item))
    Next

End Function
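The sort routine relies on a plain Dictionary handing its entries back in the order they were added, which the documentation does not promise. If a guaranteed order matters, SortedDictionary is one alternative. The sketch below is an assumption-laden illustration rather than a replacement for the routine above: the names SortedCopySketch and ToSortedDictionary are made up, and the choice of ordinal comparison is mine.

```vbnet
Imports System
Imports System.Collections.Generic

Module SortedCopySketch

    ' Hypothetical alternative: SortedDictionary keeps its keys ordered by contract.
    Public Function ToSortedDictionary(Unsorted As Dictionary(Of String, String)) As SortedDictionary(Of String, String)
        ' Ordinal comparison is an assumption; omit the comparer for culture-aware ordering.
        Return New SortedDictionary(Of String, String)(Unsorted, StringComparer.Ordinal)
    End Function

    Sub Main()
        Dim Data As New Dictionary(Of String, String) From {
            {"CCC", "WWW"}, {"AAA", "VVV"}, {"BBB", "XXX"}}
        ' Prints AAA - VVV, then BBB - XXX, then CCC - WWW.
        For Each KeyPair As KeyValuePair(Of String, String) In ToSortedDictionary(Data)
            Console.WriteLine(KeyPair.Key & " - " & KeyPair.Value)
        Next
    End Sub

End Module
```

The trade-off is that lookups in a SortedDictionary are O(log n) rather than the near-constant time of a Dictionary, so it suits cases where ordered enumeration is the main use.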

The next one is the clever one. With many-to-many relationships neither side of a pair can serve as the key, so the dictionary is keyed by an index and each pair is stored in the value with a colon separator (for example AAA:VVV). Splitting that string lets the routine build the inverse pair and check whether it has already been kept.

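Public Function DeDupeDictionaryValues(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)

    Dim KeyPair As KeyValuePair(Of String, String)
    Dim sValue As String
    Dim sTemp As String
    Dim iIdx As Int64
    Dim sSplit() As String

    DeDupeDictionaryValues = New Dictionary(Of String, String)

    For Each KeyPair In Dupe
        sValue = KeyPair.Value
        ' Each value is expected to look like "left:right".
        sSplit = Split(sValue, ":")
        If sSplit.Length < 2 Then Continue For ' not a pair, so skip it

        ' Build the inverse pair ("right:left") and keep this value only if
        ' the inverse has not been kept already.
        sTemp = sSplit(1) & ":" & sSplit(0)
        If Not DeDupeDictionaryValues.ContainsValue(sTemp) Then
            iIdx = iIdx + 1
            ' The result is keyed by a running index, stored as a string.
            DeDupeDictionaryValues.Add(iIdx.ToString(), sValue)
        End If
    Next

End Function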
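ContainsValue scans the whole result on every call, so the values routine slows down quadratically as the data grows. For large tables, one option is to track the kept values in a HashSet. This is a sketch of that idea under my own assumptions, not the article's routine: the names HashSetDeDupeSketch and DeDupeValuesFast are hypothetical, and skipping values without a colon is a choice made for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetDeDupeSketch

    ' Hypothetical variant of the values routine: a HashSet of kept values
    ' replaces the ContainsValue scan over the whole result.
    Public Function DeDupeValuesFast(Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)
        Dim Result As New Dictionary(Of String, String)
        Dim Kept As New HashSet(Of String)
        Dim Idx As Long = 0

        For Each KeyPair As KeyValuePair(Of String, String) In Dupe
            Dim Parts As String() = KeyPair.Value.Split(":"c)
            If Parts.Length < 2 Then Continue For ' assumption: skip malformed values

            ' Keep the pair unless its inverse has already been kept.
            If Not Kept.Contains(Parts(1) & ":" & Parts(0)) Then
                Idx += 1
                Result.Add(Idx.ToString(), KeyPair.Value)
                Kept.Add(KeyPair.Value)
            End If
        Next

        Return Result
    End Function

    Sub Main()
        Dim Data As New Dictionary(Of String, String) From {
            {"1", "AAA:VVV"}, {"2", "BBB:XXX"}, {"3", "CCC:WWW"}, {"4", "VVV:AAA"}}
        For Each KeyPair As KeyValuePair(Of String, String) In DeDupeValuesFast(Data)
            Console.WriteLine(KeyPair.Key & " = " & KeyPair.Value)
        Next
    End Sub

End Module
```

With the sample data from the introduction, VVV:AAA is dropped because AAA:VVV was kept first, leaving three entries.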

The last one removes inverse duplicates from a (String, String) dictionary, where one side of each pair is the key and the other is the value:

Public Function DeDupeDictionary(ByVal Dupe As Dictionary(Of String, String)) As Dictionary(Of String, String)

    Dim KeyPair As KeyValuePair(Of String, String)
    Dim sKey As String
    Dim sValue As String
    Dim sExisting As String = Nothing

    DeDupeDictionary = New Dictionary(Of String, String)

    For Each KeyPair In Dupe
        sKey = KeyPair.Key
        sValue = KeyPair.Value
        ' The inverse of (key, value) is (value, key). Skip this entry only if
        ' that exact inverse is already in the result.
        If DeDupeDictionary.TryGetValue(sValue, sExisting) AndAlso sExisting = sKey Then
            Continue For
        End If
        DeDupeDictionary.Add(sKey, sValue)
    Next

End Function

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
