Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / .NET

On why to use DataViews

5.00/5 (5 votes)
7 Dec 2014CPOL6 min read 16.2K  
Why your methods should accept DataViews; not just DataTables.

As I was writing my Tip Generating constructors at compile time[^], I was reminded that I wanted to write up a little something about how you can better serve the users of your code by accepting and using a DataView[^] rather than a bare DataTable.

I believe DataViews have gathered very little attention, even though they have been part of ADO.net since the very beginning. DataTables seemed to get all the attention, at least until Linq and Entity Framework came along. The truth is that a DataView is like clothing for a DataTable and an effort should be made to leverage what they offer.

Every DataTable comes with a DefaultView property to access a DataView with default properties. If you need another DataView, you can easily instantiate one with:

C#
System.Data.DataView dv = new System.Data.DataView ( dt ) ;

Iterating a DataView is essentially the same as iterating the Rows of the DataTable:

C#
foreach ( System.Data.DataRowView dr in dv ) ...

So if you have a DataTable you could do:

C#
foreach ( System.Data.DataRowView dr in dt.DefaultView ) ...

"OK, why?" I hear you ask.

I'll start with what a DataView provides to a DataTable:

  • The Sort property (default: "")
  • The RowFilter property (default: "")
  • The RowStateFilter property (default: Unchanged | Added | ModifiedCurrent)

The Rows property of a DataTable is similar to a DataView with the default values. That may be exactly what you want, but if your code is accepting a DataTable from outside, it may not be what the sender wants.

So the full reason of why to use the DefaultView is, if the sender knows about the default view and took the effort to set any of the above properties, he probably wants you to honor them and will be upset if you don't. Taking the time to deliver a correct answer is better than delivering an incorrect answer quickly.

Similarly, someone may want to instantiate a new DataView and pass that to your code -- if you don't accept a DataView he can't use your code at all.

"But shouldn't the sender ensure that the DataTable contains only the desired rows in the desired order?"

Certainly that will be the case the vast majority of the time, but it's those edge cases that really get you. Two particular DataTables I frequently work with are the results of DbConnection.GetSchema[^] and IDataReader.GetSchemaTable [^]

Here's another consideration. Remember the RowStateFilter property? Maybe the DataTable has been edited by a DataGridView and the sender wants to use your code to print a report of the Deleted records. Will you force the sender to copy the records to another DataTable? Would you want to have to do that?

"Interesting, what else can be done with the properties?"

Read the fine documentation of the class and its properties. Setting RowFilter can be particularly troublesome, and the documentation isn't obvious, so I'll include a link to it here: DataColumn.Expression[^]

"But does it cost anything?"

Performance-wise, yes actually, a quick experiment with a million-row table and fourteen iteration techniques gave me some interesting results. Please bear in mind that I measured only the iteration of the rows, not the access to the content of the rows. I executed each of the techniques twenty times in random order, dropped the top and bottom five durations, and summed the rest, so the Elapsed seconds is a total over ten executions. Other than some spurious results, these numbers have been pretty consistent all day.

Iteration technique Implementation Elapsed seconds
Decrementing for of reference to DefaultView dv = dt.DefaultView ; for ( int i = dv.Count - 1 ; i >= 0 ; i-- ) 0.0397771
Decrementing for of Rows for ( int i = dt.Rows.Count - 1 ; i >= 0 ; i-- ) 0.0397892
Decrementing for of DefaultView for ( int i = dt.DefaultView.Count - 1 ; i >= 0 ; i-- ) 0.0397962
Incrementing for of reference to DefaultView dv = dt.DefaultView ; for ( int i = 0 ; i < dv.Count ; i++ ) 0.085599
Incrementing for of Rows for ( int i = 0 ; i < dt.Rows.Count ; i++ ) 0.1060673
Incrementing for of DefaultView for ( int i = 0 ; i < dt.DefaultView.Count ; i++ ) 0.1372127
Foreach of Rows foreach ( System.Data.DataRow r in dt.Rows ) 0.7557827
Foreach of Enumerable foreach ( System.Data.DataRow r in dt.AsEnumerable() ) 1.0705395
Foreach of List foreach ( System.Data.DataRow r in dt.AsEnumerable().ToList<system.data.datarow>() ) 1.3790333
Foreach of DefaultView foreach ( System.Data.DataRowView r in dt.DefaultView ) 1.7353884
Foreach of reference to DefaultView dv = dt.DefaultView ; foreach ( System.Data.DataRowView r in dv ) 1.7407836
Decrementing for of new DataView dv = new System.Data.DataView ( dt ) ; for ( int i = dv.Count - 1 ; i >= 0 ; i-- ) 4.0329707
Incrementing for of new DataView dv = new System.Data.DataView ( dt ) ; for ( int i = 0 ; i < dv.Count ; i++ ) 4.2756063
Foreach of new DataView dv = new System.Data.DataView ( dt ) ; foreach ( System.Data.DataRowView r in dv ) 5.976936

I am aware that all these times are pretty quick (that's ten million records iterated!), and many of the differences are very small.

I have lighlighted the rows in blue to show how some developers may be led astray by Linq. All three techniques foreach the Rows of the DataTable. I am quite sure that ToList makes a copy of the Rows, and I suspect that AsEnumerable does as well. So, the question is, why copy the data twice when you don't need to copy it at all?

"So what do you recommend?"

Here are the main nine of those arranged in a matrix, for easier comparison. The important thing to notice is that in general:

  • The decrementing for is faster than the incrementing for, which is faster than the foreach. This is often true regardless of what's being iterated.
  • Using Rows directly is faster than using a DataView, but may not produce the desired result.

 

  Rows Reference to DefaultView DefaultView
Decrementing for 0.0397892 0.0397771 0.0397962
Incrementing for 0.1060673 0.085599 0.1372127
Foreach 0.7557827 1.7407836 1.7353884

I highlighted "Incrementing for of a Reference to DefaultView" in green. That's my recommendation; it's what I should have used in my Tip.

I accept that "Foreach of a reference to DefaultView" (in yellow), which I used in my Tip, is also a good choice, because it has improved readability. Of primary importance is to always remember and never forget that the readability of foreach comes at the cost of being slower than for -- that really is a 20x difference between the two (yellow and green) techniques.

I recommend a decrementing for (blue) only when you're sure that you can safely ignore the DefaultView.

You should also get some measurements using your own data.

"But creating a reference like in your example looks weird."

Ah, but consider how I did it in my Tip. By writing an overload that accepts a reference to a DataView, and calling that from the overload that accepts a DataTable, I get a reference at no extra cost. And having both overloads makes things easier on any callers I may have.

C#
public static System.Collections.Generic.List<T> ToList<T>
( this System.Data.DataTable Table )
{ return Table.DefaultView.ToList<T>() ; }

public static System.Collections.Generic.List<T> ToList<T>
( this System.Data.DataView Rows ) ...

Conclusion

If you always follow this pattern -- whenever you write a method that takes a DataTable, also create an overload that takes a DataView -- then iterate the DataView, rather than the DataTable, you should be OK.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)