Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / desktop / Win32

Avoid Out-of-memory Troubles with Very Large Lists(Of T)

5.00/5 (6 votes)
15 Jan 2016CPOL4 min read 23.1K   172  
Big lists (>1,000,000 items) can cause out-of-memory errors raised by .NET. Manage them by a List of List(Of T)

Introduction

Every now and then on .NET code fora, questions are asked about how to manage big collections over a million items. The usual reaction by the guru's is along the line of: "why would you want to show over a million items to the user?" and other implicit reproaches of bad coding practice. Fact is sometimes you do need to browse (quickly) through a huge set of items, because sometimes that's the best way to analyze the data. Or: "Ask not what the data can do for your software; ask what your software can do for the data".

Background

In my professional life, I work a lot with seismic data which comes in the shape of big binary files (> 10 Gb is not exceptional). Sometimes, these data are corrupt and won't load in standard software. It is then efficient to browse in a structured, but quick way through the whole data file to find out where the file falls over upon loading. However, there is a size limit in the Microsoft Common Language Runtime (CLR) of 2Gb which a single object can hold. Also, out-of-memory (OOM) errors occur when a single list gets too big and the software is unable to find a large enough section of contiguous pages in its virtual address space to store the data. Please note: this applies to each single object. And because it applies to single objects, it should not be too difficult to create a work around using a composite list of some sort. There's a very readable article here, from which I quote and for more background. This tip shows a class I wrote to handle composite lists as if it is a single list. It works for my seismic data (especially in combination with a ListView in VirtualMode); it may help you with your 'big data'...

Using the Code

The workaround is a single class which I will walk through in 'blocks'. At the end, I will list the full class. In the attached source code, you will find this class with a little form to show use of the class.

The class is called 'BigList', no surprise, and implements IEnumerator(<span style="color: rgb(0, 0, 255);">Of</span> T) and IEnumerable(<span style="color: rgb(0, 0, 255);">Of</span> T), like:

VB.NET
Public Class BigList(Of T)
    Implements IEnumerator(Of T), IEnumerable(Of T)
    
    'private globals
    Private _listoflists As List(Of List(Of T))
    Private _enumindex As Int64 = -1
    Private _listindex As Integer = 0
    Private _current As T = Nothing
    
    'ctor
    Public Sub New()
        _listoflists = New List(Of List(Of T))
        _listoflists.Add(New list(Of T))
        _enumindex = -1
        _listindex = 0
        _current = CType(Nothing, T)
    End Sub
    
    Public MaxItemsPerSublist As Integer = 500000 'half a million records per list 'Long
    'rest of the code here....
End Class

Implementing these abstracts forces you to implement the GetEnumerator, Reset, Dispose, Current and MoveNext methods and properties. We'll get to that later. Main point here is that I declare a local Private _listoflists <span style="color: rgb(0, 0, 255);">As</span> List(<span style="color: rgb(0, 0, 255);">Of</span> List(<span style="color: rgb(0, 0, 255);">Of</span> T)) which is the composed list of List(Of T). Each of these lists can have a max of MaxItemsPerSublist items, in this case initiated to 500,000, but this can be changed to any suitable number once the class is instantiated. Three more local privates are of importance: _enumindex is the iterator index for the BigList as a whole, _listindex is the iterator of the sublist within the composite list and _current is the object which is stored of a user specified type T. The class is constructed by setting the indexes to their appropriate values and adding the first List(Of T) to the _listoflists composite.

Now the Add function is the next method to discuss because it dives right into managing the different lists and indices. It simply adds the item and then checks if the current list _listoflists(_listindex) is at its maximum. If so, it will add another list and increment the sublist index _listindex.

VB.NET
Public Function Add(ByVal item As T) As Int64
   _listoflists(_listindex).Add(item)
   
   'go to the next list if the first sublist is full
   If _listoflists(_listindex).Count = Me.MaxItemsPerSublist Then
       _listoflists.Add(New list(Of T))
       _listindex += 1
       'Debug.Print("Going to list " & _listindex)
   End If
   
   Return Count
End Function

Public ReadOnly Property Count() As Int64
    Get
        Dim sum As Int64 = 0
        For Each l In _listoflists
            sum += l.Count
        Next
        Return sum
    End Get
End Property

The function returns Count which is the read-only property defined in the above code block; it loops over all sublists and sums the amount of items in each.

Now to retrieve any item back from the list, the method Current is implemented (also needed to do a For Each loop):

VB.NET
Public ReadOnly Property Current As T Implements IEnumerator(Of T).Current
    Get
        Dim li As Integer = 0
        Dim ti As Integer = 0
        getSubindexes(_enumindex,li,ti)
        
        _current = _listoflists(li)(ti)
        Return _current
    End Get
End Property

Now the heart of this property (in fact, of the whole class) is the function getSubindexes which takes in the index of the item in the BigList and returns the sublist and sublist index where it is stored. Here is the code:

VB.NET
Private Function getSubindexes(ByVal index As int64, _
    ByRef listindex As Integer, ByRef itemindex As Integer) As Boolean
    
    Dim li As Integer = 0
    Dim ri As Integer = 0
    Dim nid As Int64 = index
    Dim sum As Int64 = 0
    Dim indexfound As Boolean = False
    
    For l As Integer = 0 To _listindex
        If index < sum + _listoflists(l).Count Then
            listindex = li
            itemindex = CInt(index - sum)
            indexfound = True
            Exit For
        Else
            li +=1
            sum += _listoflists(l).Count
        End If
    Next
    'Debug.Print("returning list[{0}] item[{1}]",listindex,itemindex)
    
    'throw an out of index error if the index could not be returned
    If Not indexfound Then
        Throw New IndexOutOfRangeException
        Return False
    Else
        Return True
    End If
End Function

The method will loop over the sublists and break down the _enumindex into a list and item index (returned ByRef). The method will also throw an IndexOutOfRangeException like any normal List(Of ) would do. Similarly, the Item property (default to any collection class) makes use of getSubindexes:

VB.NET
Default Property Item(index As Int64) As T
    Get
        Dim li As Integer = 0
        Dim ti As Integer = 0
        getSubindexes(index,li,ti)
        Return _listoflists(li)(ti)
    End Get
    Set
        Dim li As Integer = 0
        Dim ti As Integer = 0
        getSubindexes(index,li,ti)
        _listoflists(li)(ti) = value
    End Set
End Property

And this is basically all there is to it...

For the implementation, you do need to code the GetEnumerator, Reset, Dispose and MoveNext methods and properties. They are worked out in the attached source file. I have added myself the Clear, IndexOf and Remove methods to have the basic List(Of ) functionality in. You can extend it with other methods to your heart's desire. As an example, I include here a screen dump of an application I'm coding to view and edit a seismic file (or more popularly known SegY file) showing the tail end of a 13.5Gb file with 2,025,780 seismic records (or traces):

BigList snapshot

Points of Interest

I did try initially to implement the IList(Of ) abstract class but ran into trouble. I needed the Int64 variable to be able to have a huge index over the BigList. I may be wrong... Also, when coding the getSubindexes method, I was wondering if the False is actually returned after throwing the IndexOutOfRangeException? Anybody knows?

History

  • January 16, 2016: First draft

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)