As I was writing my Tip Generating constructors at compile time[^], I was reminded that I wanted to write up a little something about how you can better serve the users of your code by accepting and using a DataView[^] rather than a bare DataTable.
I believe DataViews have gathered very little attention, even though they have been part of ADO.net since the very beginning. DataTables seemed to get all the attention, at least until Linq and Entity Framework came along. The truth is that a DataView is like clothing for a DataTable and an effort should be made to leverage what they offer.
Every DataTable comes with a DefaultView property to access a DataView with default properties. If you need another DataView, you can easily instantiate one with:
System.Data.DataView dv = new System.Data.DataView ( dt ) ;
Iterating a DataView is essentially the same as iterating the Rows of the DataTable:
foreach ( System.Data.DataRowView dr in dv ) ...
So if you have a DataTable you could do:
foreach ( System.Data.DataRowView dr in dt.DefaultView ) ...
"OK, why?" I hear you ask.
I'll start with what a DataView provides to a DataTable:
- The Sort property (default: "")
- The RowFilter property (default: "")
- The RowStateFilter property (default: Unchanged | Added | ModifiedCurrent)
The Rows property of a DataTable is similar to a DataView with the default values. That may be exactly what you want, but if your code is accepting a DataTable from outside, it may not be what the sender wants.
So the full reason of why to use the DefaultView is, if the sender knows about the default view and took the effort to set any of the above properties, he probably wants you to honor them and will be upset if you don't. Taking the time to deliver a correct answer is better than delivering an incorrect answer quickly.
Similarly, someone may want to instantiate a new DataView and pass that to your code -- if you don't accept a DataView he can't use your code at all.
"But shouldn't the sender ensure that the DataTable contains only the desired rows in the desired order?"
Certainly that will be the case the vast majority of the time, but it's those edge cases that really get you. Two particular DataTables I frequently work with are the results of DbConnection.GetSchema[^] and IDataReader.GetSchemaTable [^]
Here's another consideration. Remember the RowStateFilter property? Maybe the DataTable has been edited by a DataGridView and the sender wants to use your code to print a report of the Deleted
records. Will you force the sender to copy the records to another DataTable? Would you want to have to do that?
"Interesting, what else can be done with the properties?"
Read the fine documentation of the class and its properties. Setting RowFilter can be particularly troublesome, and the documentation isn't obvious, so I'll include a link to it here: DataColumn.Expression[^]
"But does it cost anything?"
Performance-wise, yes actually, a quick experiment with a million-row table and fourteen iteration techniques gave me some interesting results. Please bear in mind that I measured only the iteration of the rows, not the access to the content of the rows. I executed each of the techniques twenty times in random order, dropped the top and bottom five durations, and summed the rest, so the Elapsed seconds is a total over ten executions. Other than some spurious results, these numbers have been pretty consistent all day.
Iteration technique | Implementation | Elapsed seconds |
Decrementing for of reference to DefaultView | dv = dt.DefaultView ; for ( int i = dv.Count - 1 ; i >= 0 ; i-- ) | 0.0397771 |
Decrementing for of Rows | for ( int i = dt.Rows.Count - 1 ; i >= 0 ; i-- ) | 0.0397892 |
Decrementing for of DefaultView | for ( int i = dt.DefaultView.Count - 1 ; i >= 0 ; i-- ) | 0.0397962 |
Incrementing for of reference to DefaultView | dv = dt.DefaultView ; for ( int i = 0 ; i < dv.Count ; i++ ) | 0.085599 |
Incrementing for of Rows | for ( int i = 0 ; i < dt.Rows.Count ; i++ ) | 0.1060673 |
Incrementing for of DefaultView | for ( int i = 0 ; i < dt.DefaultView.Count ; i++ ) | 0.1372127 |
Foreach of Rows | foreach ( System.Data.DataRow r in dt.Rows ) | 0.7557827 |
Foreach of Enumerable | foreach ( System.Data.DataRow r in dt.AsEnumerable() ) | 1.0705395 |
Foreach of List | foreach ( System.Data.DataRow r in dt.AsEnumerable().ToList<system.data.datarow>() ) | 1.3790333 |
Foreach of DefaultView | foreach ( System.Data.DataRowView r in dt.DefaultView ) | 1.7353884 |
Foreach of reference to DefaultView | dv = dt.DefaultView ; foreach ( System.Data.DataRowView r in dv ) | 1.7407836 |
Decrementing for of new DataView | dv = new System.Data.DataView ( dt ) ; for ( int i = dv.Count - 1 ; i >= 0 ; i-- ) | 4.0329707 |
Incrementing for of new DataView | dv = new System.Data.DataView ( dt ) ; for ( int i = 0 ; i < dv.Count ; i++ ) | 4.2756063 |
Foreach of new DataView | dv = new System.Data.DataView ( dt ) ; foreach ( System.Data.DataRowView r in dv ) | 5.976936 |
I am aware that all these times are pretty quick (that's ten million records iterated!), and many of the differences are very small.
I have lighlighted the rows in blue to show how some developers may be led astray by Linq. All three techniques foreach
the Rows of the DataTable. I am quite sure that ToList makes a copy of the Rows, and I suspect that AsEnumerable does as well. So, the question is, why copy the data twice when you don't need to copy it at all?
"So what do you recommend?"
Here are the main nine of those arranged in a matrix, for easier comparison. The important thing to notice is that in general:
- The decrementing
for
is faster than the incrementing for
, which is faster than the foreach
. This is often true regardless of what's being iterated. - Using Rows directly is faster than using a DataView, but may not produce the desired result.
| Rows | Reference to DefaultView | DefaultView |
Decrementing for | 0.0397892 | 0.0397771 | 0.0397962 |
Incrementing for | 0.1060673 | 0.085599 | 0.1372127 |
Foreach | 0.7557827 | 1.7407836 | 1.7353884 |
I highlighted "Incrementing for of a Reference to DefaultView" in green. That's my recommendation; it's what I should have used in my Tip.
I accept that "Foreach of a reference to DefaultView" (in yellow), which I used in my Tip, is also a good choice, because it has improved readability. Of primary importance is to always remember and never forget that the readability of foreach
comes at the cost of being slower than for
-- that really is a 20x difference between the two (yellow and green) techniques.
I recommend a decrementing for
(blue) only when you're sure that you can safely ignore the DefaultView.
You should also get some measurements using your own data.
"But creating a reference like in your example looks weird."
Ah, but consider how I did it in my Tip. By writing an overload that accepts a reference to a DataView, and calling that from the overload that accepts a DataTable, I get a reference at no extra cost. And having both overloads makes things easier on any callers I may have.
public static System.Collections.Generic.List<T> ToList<T>
( this System.Data.DataTable Table )
{ return Table.DefaultView.ToList<T>() ; }
public static System.Collections.Generic.List<T> ToList<T>
( this System.Data.DataView Rows ) ...
Conclusion
If you always follow this pattern -- whenever you write a method that takes a DataTable, also create an overload that takes a DataView -- then iterate the DataView, rather than the DataTable, you should be OK.