Abstract
With large lists of data, it is always tricky to load and display the data in a UI with reasonable performance. This article talks about using a virtual list and a paging technique to solve
the performance problem. It specifically addresses the combination of a virtual list in UI and the paging technology in the backend data store access layer. The performance
is dramatically improved from o(n) to o(1).
This article is Part 1 of a data display performance optimization series.
All code examples are given in WPF and C#. The example user control in the UI is a WPF ListBox
.
Content
In the example application, you can click on the following three buttons, and each of them will load 10,000 employees by using a different technique. The last button demonstrates
the traditional way of displaying employees, basically, it loads all employees at once and display them in the listbox. The second button demonstrate lazy loading of employees by
using the virtual list technique. Basically, only employees that will be seen on the listbox will be loaded; if the user scrolls down the listbox, additional employees will be loaded then.
The third button demonstrates lazy loading of employees by combining the virtual list technique in the UI and paging technique in the backend data access layer.
The load time will be displayed for each button.
This article talks mostly about the third solution which is lazy loading employee by combining the virtual list technique in the UI and the paging technique in the backend data access layer.
In this example, the last button takes about 11 seconds to load and display employees, the first and second buttons take less than 1 second to load employees.
After the user clicks on the buttons, a list of employess UI will be displayed. The user can scroll down the scrollbar until 10,000 employees
are seen. All three buttons display the same list of employees UI.
First approach: display all records at once
In this approach, the traditional way to fetch data is used, and data is displayed in the listbox. All the data is fetched at once and displayed in the listbox. In our
example, we hardcode 10,000 records, and it takes one second per record to load the records. It takes 10 seconds to load data, and about 1 second to render and display the data in the listbox.
Let's walk through the code from the UI until the data access layer.
In the Employees.xaml UI, a list box is defined and it looks up the Employees
property in the ViewModel.
<ListBox x:Name="listBox1"
ItemsSource="{Binding Employees}"
VerticalAlignment="Center"/>
In the ViewModel class, the Employees
property is defined and its use EmployeeService
to fetch a list of employees.
public List<string> Employees
get { return EmployeeService.GetEmployees(); }
}
Currently we just simulate employee data in EmployeeService
. The total number of employees is hardcoded as 10,000, and it takes 1 millisecond to load an employee.
public const int NumberOfEmployees = 10000;
public static List<string> GetEmployees()
{
return PrepareData(0, NumberOfEmployees);
}
private static List<string> PrepareData(int startIndex, int count)
{
var employees = new List<string>();
for (int index = startIndex; index < count; index++)
{
employees.Add(index.ToString());
Thread.Sleep(TimeSpan.FromMilliseconds(1));
}
return employees;
}
With this approach, it will take 10 seconds to load all employee data, and takes about 11 seconds to wait for the listbox to show up.
The user will see a blank screen and won't know what is going to there. It is unacceptable with such slow performance. The larger the list, the slower it will be.
Two techniques can be used to solve this problem: paging vs. virtual list.
Using paging, we will not display all the employees at once in the UI. Instead, we we will display the datya page by page. Typically, you will
see a page number associated with each page, and you can click on a page number to go directly to that page. A typical example is a Google search result.
Using a virtual list, employees will be displayed as a list and you are still able to scroll the scrollbar to view the list. However, internally, only the employees which are displayed
in the current page will be fetched. If the user scrolls down the list, additional employees will be fetched when needed. I.e., we don't need to load all employees at once
in advance. A typical example is Google Picasa.
This article will only focus on using a virtual list in UI, and it will not talk about paging in UI.
Second approach: display all records in a virtual list
In this approach, the UI will stay exactly the same as in the previous example, the only different is that the ListBox
is bound to VirtualListEmployees
instead of
Employees
. VirtualListEmployees
is a VirtualList
type. The VirtualList
will not load all employees at once, instead the VirtualList
is smart enough to load only data which is displayed in the UI, typically it will be 20 to 30 employees which fit in one screen. If you scroll down the listbox, the
VirtualList
is smart enough to load the required employees on demand.
Code snippet from VirtualListEmployees.xaml:
<ListBox x:Name="listBox1"
ItemsSource="{Binding VirtualListEmployees}"
VerticalAlignment="Center" />
Code snippet from ViewModel.cs:
public VirtualList<string> VirtualListEmployees
{
get { return new VirtualList<string>(new EmployeeObjectGenerator()); }
}
VirtuaList<T>
implements the interface IList<T>
and IList
, the main idea here is that we create a list with a predefined number of items.
However, the content of the list is null. The content will only be loaded when they are needed by the UI. VirtualList.cs originated from CodeProject.com, and I made some
modifications on top of it.
The VirtualList
constructor allocates a null array whose size is defined by the data count in the IObjectGeenrator
interface.
public VirtualList(IObjectGenerator<T> generator)
{
int maxItems = generator.Count;
_generator = generator;
_cachedItems = new T[maxItems];
}
The data will be fetched only when the UI requires the data to be displayed, and the data will be cached in the array to enhance performance.
public T this[int index]
{
get {
if (!IsItemCached(index))
CacheItem(index);
return _cachedItems[index];
}
}
public void CacheItem(int index)
{
_cachedItems[index] = _generator.CreateObject(index);
}
IObjectGenerator
and the interface implementer is used to determine the total number of employees, where an array can be pre-allocated with null values. It also uses
the CreateObject
method to load employees only when needed.
In EmployeeObjectGenerator.cs, it gets the total number of employees in constructor, and it also asks EmployeeService
to get employees based on index.
public EmployeeObjectGenerator()
{
_count = EmployeeService.NumberOfEmployees;
}
public string CreateObject(int index)
{
return EmployeeService.GetEmployee(index);
}
EmployeeService
is our backend data access class, it is used to retrieve employee data.
In our example, no database, no text are hooked up. We simulate the employee service. The employee name will be the same as the index value.
Thread.Sleep
is used to simulate the speed of loading employee objects. In the example, we assume it takes 1 millisecond to load
an employee. We hardcode the number of employees as ten thousand.
public const int NumberOfEmployees = 10000;
public static string GetEmployee(int index)
{
Thread.Sleep(TimeSpan.FromMilliseconds(1));
return index.ToString();
}
With this VirtualList
implementation, it take less than a second to load and display employees in the listbox. The performance dramatically improves from o(n) to o(1).
However, the current EmployeeService
simulation does not reflect with the data access strategy. Most of our data access layer uses the database,
and GetEmployee(int index)
by index will not work with databases because we can not directly access database data by index. One way is to use a query to find
all employees which meet the search criteria, then cache it and use GetEmployee(int index)
by index method on top of the cache. The problem with this approach
is that it has a performance penalty in this query. It will significantly slow down the speed if we are going to find all employees at once, even
if we find only employees that match the search criteria. The speed will change to o(n) again.
Third approach: display all records in a virtual list in the UI with a paging technique in the data access layer
The solution is to use paing technique to fecth data from the datastore, and then use a virtual list to manage the paged data. I.e., if employees in page x are
needed, then page x will be loaded on demand. If employees in page Y are needed, then only page Y will be loaded on demand. The VirtualList
will sit
on top of all pages, and it will manage all the pages regardless of if they are loaded or not.
The idea is demonstrated in the following picture. We have a virtual list that contains n number of pages. Each page is loaded only when it is
needed. CurrentCursor
is used to define the current loaded page index, and each page has a fixed page size.
Let's check the source code from the UI down to the database access level.
The UI and the ViewModel code remain almost the same as in the previous example. The only change is that it is mapped to the PagedVirtualListEmplyees
property. Here is the
code snippet from PagedVirtualListEmployees.xaml.
<ListBox x:Name="listBox1" ItemsSource="{Binding PagedVirtualListEmployees}"/>
Code snippet from ViewModel.cs:
public VirtualList<string> PagedVirtualListEmployees
{
get { return new VirtualList<string>(new PagedEmployeeObjectGenerator()); }
}
PagedEmployeeObjectGenerator
will delegate all the get employee information to PagedEmployeeRepository
. Here is the code snippet:
public PagedEmployeeObjectGenerator()
{
_repository = new PagingEmployeeRepository();
}
public string CreateObject(int index)
{
return _repository.GetAt(index);
}
PagedEmployeeRepository
manages both the cached virtual list and the paging. Count
always gets called first
to know the total number of employees in order to allocate an array cache, and it also initializes the array cache with the first page's value. Here is the code snippet:
public int NumberOfEmployees = 10000;
public int PageCursor { get; set; }
private int _pageSize = 40;
public int Count()
{
var employees = DoLoadPage();
_cache = new string[NumberOfEmployees];
UpdateCache(employees);
return NumberOfEmployees;
}
DoLoadPage
will load the page on PageCursor
, and after the page on PageCursor
is loaded, UpdateCache
is called to copy the value to
the array cache.
Once the user scrolls down, the GetAt(index)
method will be called. If nothing has been loaded, i.e., the cache value is null,
then we need to get the PageCursor
from the index so we know which page has not been loaded yet, and then call
LoadPage()
to load that page and populate the value in the cache. Otherwise, it will return back the cached employee.
public string GetAt(int index)
{
if (_cache[index] == null)
{
PageCursor = index / PageSize;
LoadPage();
}
return _cache[index];
}
With this PagedVirtualList
approach, all the values are displayed in the UI within a second regardless of the list size. The performance is under o(1).
Conclusion
Volia, that is it. Now you can have the paged virtual list to load and display a large list under o(1). This concludes the first article of the Data Display
Performance Optimization series. In the second article, I will talk about how to select an item from a list and remove it, and how to select and deselect all. In the third article,
I will talk about how to search a virtual list multiple times and select and remove multiple times. Hope you enjoyed this article.
Appendix