Introduction
One of my recent projects included simple, and I believe, commonly done tasks. We were implementing a project utilizing the Microsoft Visual Studio Professional edition, with C# 3.0, and the task was to compare two snapshots taken from the same table within a particular time interval. The purpose of this task was to determine if any changes where done on the server side within a given time interval.
The first way that came to my mind was to use the DataSet.Merge()
method. Ideally, this is how it should work:
At
some point of the application initialization, we “grab” the first snapshot of data and load it into the original DataSet
(let’s call it dsOriginal
). Then, on the timer event, when the given interval has passed, we take a second snapshot (let’s call it dsChanged
). The rest should be a fairly routine procedure:
- Create an empty
DataSet
(we will call it dsDifferences
).
DataSet dsDifferences = new Dataset();
- Merge our
dsOriginal
with dsChanged
.
dsOriginal.Merge(dsChanged);
- The last step is to get the modified or added rows using the
DataSet.GetChanges()
method.
dsDifferences = dsOriginal.GetChanges();
This is a very “stylish” and simple solution, but not so easy as in concept. We were unpleasantly surprised when we found out that merging datasets did not change the RowSatus
property of the rows in the tables of the datasets. Just to make sure that we did nothing wrong, I looked at the MSDN library again, and the MSDN materials confirmed that the:
The Merge
method combines rows (equivalent to a SQL Union statement) when no Primary Key is present, or updates rows in the target DataSet
using the source DataSet
when a Primary Key is defined.
To resolve this issue, we contacted the Microsoft Support team, and after a few days of email exchange, we got the following message:
I think the root of the issue is that Merge does not change the row state. If you create a DataSet
, then manually change a row value, then the row state is tweaked and the GetChanges()
behaves as expected.”
But Merge
does a “dumb” merge and does not set other flags, but I am trying to find documentation to support that. If that is the case, the only option is to do is a manual compare. I will keep you posted on what I find…
As you can see from the message, the Microsoft suggestion was to go manually through both DataSet
s, comparing values row by row, and manually updating the settings in the resulting DataSet using the Row.SetAdded()
or Row.SetModified()
methods according to the situation – a solution not very elegant. With the fact that the method provided above is available for developers in Framework 2.0 and later for extra flexibility, it is hard to imagine when these methods can be used in practical terms. After a day or two of additional research and conversations with Microsoft, we received another email from the support team, where we got the clue that the issue cannot be resolved: “I have raised the issue with the appropriate personnel and I am waiting to hear back from them. I will let you know something as soon as I find out more."
The LINQ Solution
That’s when we turned our heads to LINQ. I did not have any extensive experience with this new feature, but in a few hours, found it very easy to learn and apply. There are a few ways using LINQ to achieve the same result. For our purposes, we selected the Union method. Therefore, here is the “moment of Zen”:
- During the initialization of our application, we still create an initial snapshot of the data; in the sample below, we will call it
dsOriginal
. - On the
Elapsed
event of the timer, we create another snapshot of the current data state; we will call it dsChanged
. - To use LINQ, we need to get the tables from both the datasets into an enumerable row collection. This can be done using the
AsEnumerable()
method (see the code sample below), and we are ready to use the LINQ magic.
Below is the code which completes the task:
var orig = dsOriginal.Tables[0].AsEnumerable();
var updated = dsChanged.Tables[0].AsEnumerable();
var newRec = from u in dsChanged
where !(from o in orig
select o.Field<decimal>("PRIMARY_KEY_FIELD"))
.Contains(u.Field<decimal>(" PRIMARY_KEY_FIELD"))
select new
{
prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),
field1 = u.Field<decimal>("FIELD1"),
field2=u.Field<decimal>("FIELD2"),
field3 = u.Field<decimal>("FIELD3"),
field4 = u.Field<decimal>("FIELD4"),
rec_type="A"
};
var updRec = from u in updated
join o in orig
on u.Field<decimal>("PRIMARY_KEY_FIELD")
equals o.Field<decimal>("PRIMARY_KEY_FIELD")
where (u.Field<decimal>("FIELD1") !=
o.Field<decimal>("FIELD1")) ||
(u.Field<decimal>("FIELD2") !=
o.Field<decimal>("FIELD2"))
select new
{
prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),
field1 = u.Field<decimal>("FIELD1"),
field2=u.Field<decimal>("FIELD2"),
field3 = u.Field<decimal>("FIELD3"),
field4 = u.Field<decimal>("FIELD4"),
rec_type = "M"
};
var Union = newRec.Union(updRec);
The code snippet is simple and self-explanatory. In the first select statement, we are getting new records by using the Contains()
method, preceding it with the “!” operator which gives us only the newly added rows. In the second statement, we get the rows with the updated values using comparison with the where clause, and finally, we compare the results using the Union()
method.
You probably noticed the variable rec_type
used in both queries. When you need to know which record was modified and which is new (added), the Union
operator comes in handy because it allows you to create a custom flag on the DataRow
with this information, but if you only need to get the differences without knowing if the row was added or modified, you can get all the data in “one shot”, and the code snippet below demonstrates how:
var AddedAndModif = from u in updated
where !(from o in orig
select o.Field<decimal>("PRIMARY_KEY"))
.Contains(u.Field<decimal>("PRIMARY_KEY"))
|| !(from o in orig
select o.Field<decimal>("FIELD1"))
.Contains(u.Field<decimal>("FIELD1"))
select new
{
prim_key = u.Field<decimal>("PRIMARY_KEY"),
field1 = u.Field<decimal>("FIELD1"),
field2=u.Field<decimal>("FIELD2"),
field3 = u.Field<decimal>("FIELD3"),
field4 = u.Field<decimal>("FIELD4"),
};
As you can see from this example, the Contains()
method used twice solves the task and provides you with a collection of both, modified and newly added rows.
In our opinion, among all the other options, this solution appears to be the fastest. It is easy to read and modify as needed, and showed about 20% better performance when compared with the same result achieved by the method suggested by the MS Support professionals provided at the top of this article (comparing row by row and manually changing RowState
using the Row.SetAdded()
or Row.SetModified()
methods).
Since the situation provided in this discussion is common, we decided to share our solution, thinking that it might make other developers' lives easier.