Introduction
This article describes the implementation of an algorithm that logs DataTable
row and column changes by hooking several events provided by the DataTable
, including the new TableNewRow
event in the .NET 2.0 Framework. The purpose of this work is to provide local undo/redo capability for transactions generated by a UI, and also the ability to log transactions as the application itself manipulates a DataTable
, suitable for serialization and application to a mirror of the table located remotely. In this implementation, each table requires its own transaction log. This will immediately cause problems in cases where the table is part of a data set - a collection of interrelated tables. For that situation, this implementation is insufficient because there is no synchronization of transactions between tables in a data set. Frankly, that problem should only be addressed after properly understanding the issues involved with working with a single DataTable
.
A couple notes about the terminology - I tend to use the word "field" and "column" interchangeably. Also, I talk about "uncommitted rows". I do not, in this context, mean rows not saved to the database. I'm referring specifically to rows that haven't yet been added to the DataTable
's row collection.
Design decisions
The idea of logging DataTable
transactions is much easier said than done. There are several issues that had to be considered and that drove the architecture. In some cases, compromises had to be made that seemed unavoidable. The issues are:
- Deleting a
DataRow
: Once deleted, the DataRow
fields cannot be manipulated (undo/redo issue) nor added back to the owning table.
- A new row can be (and typically is) initialized before being added to the table's row collection, resulting in "change" transactions before the "add" transaction.
- It is impossible to determine whether the application (or a control, like the
DataGrid
) has decided to discard a new row.
What follows is a more detail discussion of each issue and how it affected the architecture.
Deleting a DataRow
One idea with regards to managing a list of DataTable
transactions is to save the DataRow
reference and the type of change. For field changes, this is simple enough - also save the column being changed and the old and new values so that both undo (apply old value) and redo (apply new value) can be used. The problem arises if the DataRow
has been deleted. You can't "undo" a data row that was deleted by adding it back to the table collection, instead, you have to create a new DataRow
instance. Now let's say that you have the following transaction log:
- Add Row
- Change Field
- Change Field
And you undo all transactions. The first transaction, "Add Row" when undone, will delete the row from the DataTable
. If we now attempt to redo all transactions, the deleted row has to be added again to the DataTable
, but as a new instance. Now the DataRow
instances in the Change Field transactions require fixing up. Are all the transactions in the log forward of the add row transaction adjusted? Yes. Originally, I had chosen to decouple the DataRow
instance from the transaction by preserving the primary keys associated with each row. This put a lot of restrictions on when the transactions were logged and when the primary keys were required. I ultimately decided that fixing up the DataRow
references in the transaction log was a better implementation because it decoupled the dependency on primary keys! The primary keys are useful only when serializing the transactions, which really is an autonomous function not related to the undo/redo feature. Also keep in mind that if you undo a deleted row, all the transactions referencing the deleted rows earlier in the transaction log need to be fixed up.
New row field initialization
Once the above design decision was made, it became more obvious that .NET 2.0's TableNewRow
event would be very helpful in logging a "new row" transaction. Note that I'm not calling it "add row", because that is technically incorrect - the row hasn't been added to the DataTable
's row collection yet. This resolved the problem of columns being initialized and changed before the row is added to the collection, as the transaction log will always have a "new row" transaction prior to the field change transaction.
New row initialization
But the story doesn't end here. It's possible for the application to simply discard a new row. For example, using the DataGrid
, click on the row with the "*". This fires a TableNewRow
event. Press the Escape key. The new row is discarded, but I still have a "new row" transaction in the log! Also consider an application that has created a new row and for some reason faults while initializing it. Again, the transaction log is left with a new row that it doesn't know has been discarded.
The work around to this issue has some ramifications. First, I implemented a new row garbage collection method, which deletes all log entries for rows that were never finally added to the DataTable
's row collection. This method should always be called before accepting table changes (more on this later) and before performing undo operations. Essentially, the DataTable
should be stable before accept and undo actions, because neither of these actions make much sense on rows that were not actually added to the DataTable
's row collection. Similarly, this method should be called before serializing the transaction log.
This design decision makes the implementation more complicated because it requires managing whether or not the row has actually been added to the DataTable
. This requires an "uncommitted row list". Also we can use the RowChanged
event and monitor "Add" actions to remove the row from the uncommitted row list. And finally, each entry in this row list should reference each index of the transaction log that references the row. The garbage collection can then walk through the uncommitted row list, add the indices to the transaction log to a sorted buffer, then finally remove the transactions from the log by iterating through the sorted buffer from highest index to lowest (so higher indexes aren't moved when deleting a lower index). All this to prevent having to iterate through the transaction log to find transactions referencing the row being collected!
Other considerations
I was somewhat surprised to note that it is impossible to prevent the ColumnChanged
event to fire, even when setting an error for the column in a ColumnChanging
event handler. In fact, even setting the ProprosedValue back to the column's original value does not prevent the ColumnChanged
event from firing. In the DataGrid
, if you edit a column, typing in exactly the same value, the ColumnChanging
event still fires. Something to note.
My original design
As I mentioned, my initial design used primary key values and was very "smart" (hahaha) about how it handled new rows and field initialization. Basically, it required all the primary keys to be valid before processing a field change. If the application first sets all the required primary keys, it would automatically add a "add row" transaction. It even had an event the application could hook to supply the primary keys. This seemed OK and actually worked quite well in my testing. As I was writing about this design, I got to this sentence: To get the transaction ordering right, a "new row" transaction is logged even though the row hasn't yet been added to the collection. At that point I stopped and realized, that was very, very wrong!
However, it points out the compromise between the two different designs - one having to deal with the new rows never been finally added to the DataTable
and the other anticipating adding the row but possibly ending up with rows that actually were never added. Really, it boils down to the same thing - uncommitted rows need to be removed from the transaction log. However, the additional side effects of requiring a "get the primary keys NOW" made me settle on a different architecture. In fact, the final architecture couldn't care less about whether your DataTable
even has primary keys - the only time primary keys are useful is in serializing the transaction log, and that implementation is outside of the scope of the transaction logger.
Implementation
Event handlers
TableCleared event
This event does not support undo. It clears the internal collections and resets the logging state.
TableNewRow event
In this event handler, we log the new row and add it to our collection of uncommitted rows, initializing the transaction list with the transaction index:
protected void OnTableNewRow(object sender, DataTableNewRowEventArgs e)
{
if (doLogging)
{
int idx;
DataTableTransactionRecord record;
idx = transactions.Count;
record = new DataTableTransactionRecord(idx, e.Row,
DataTableTransactionRecord.TransactionType.NewRow);
OnTransactionAdding(new TransactionEventArgs(record));
transactions.Add(record);
OnTransactionAdded(new TransactionEventArgs(record));
List<int> rowIndices = new List<int>();
rowIndices.Add(idx);
uncomittedRows.Add(e.Row, rowIndices);
}
}
RowChanged event
In this event hander, when Action==DataRowAction.Add
, then the row being added is removed from the uncommitted row collection:
protected void OnRowChanged(object sender, DataRowChangeEventArgs e)
{
if (doLogging)
{
if (e.Action == DataRowAction.Add)
{
if (!uncomittedRows.ContainsKey(e.Row))
{
throw new DataTableTransactionException("Attempting " +
"to commit a row that doesn't exist in the " +
"uncommitted row collection.");
}
uncomittedRows.Remove(e.Row);
}
}
}
ColumnChanging and ColumnChanged events
This event is hooked so that we can get at the current field value and save it in the transaction as the old value. The ColumnChanged
event has already changed the field value to the proposed value by the time we see it there. However, we still need the ColumnChanged
event to get the new row value in case the application changed the proposed value!
if (doLogging)
{
object oldVal = e.Row[e.Column];
int trnIdx;
DataTableTransactionRecord record;
trnIdx = transactions.Count;
record = new DataTableTransactionRecord(
trnIdx, e.Row, e.Column.ColumnName,
oldVal, e.ProposedValue);
OnTransactionAdding(new TransactionEventArgs(record));
transactions.Add(record);
OnTransactionAdded(new TransactionEventArgs(record));
waitingForChangedEventList.Add(record);
if (uncomittedRows.ContainsKey(e.Row))
{
uncomittedRows[e.Row].Add(trnIdx);
}
}
In the following code, the waitForChangedEventList
is reverse searched for the column that is being changed. I decided to buffer the "changing" events that are waiting for completion via the ColumnChanged
event since it is possible (although I certainly wouldn't recommend the practice) that the application changes other fields (possibly in other rows) in a ColumnChanging
event (or in a ColumnChanged
event handler that fired first):
void OnColumnChanged(object sender, DataColumnChangeEventArgs e)
{
if (doLogging)
{
for (int i = 0; i < waitingForChangedEventList.Count; i++)
{
DataTableTransactionRecord r =
waitingForChangedEventList[i];
if ( (r.ColumnName == e.Column.ColumnName) &&
(r.Row == e.Row) )
{
r.NewValue = e.ProposedValue;
waitingForChangedEventList.RemoveAt(i);
break;
}
}
}
}
RowDeleting event
This event handler creates a transaction for deleting a row. You cannot delete a row that isn't associated with a table, so the uncommitted collection won't ever contain this row:
protected void OnRowDeleting(object sender,
DataRowChangeEventArgs e)
{
if (doLogging)
{
DataTableTransactionRecord record;
record = new DataTableTransactionRecord(transactions.Count, e.Row,
DataTableTransactionRecord.TransactionType.DeleteRow);
transactions.Add(record);
SaveRowFields(record, e.Row);
OnTransactionAdded(new TransactionEventArgs(record));
}
}
Undo / Redo
The transaction logger implements Revert (a single transaction rollback) and Apply (per transaction) that can be used to implement undo/redo capability. The transaction logger does not itself provide a complete undo/redo implementation - this is left to the application, and an example follows.
Revert (undo)
The Revert
method performs the following, depending on the transaction type:
- For a field change, the old value is restored.
- For a row that was added, the row is deleted.
- For a row that was deleted, it is re-added, and all previous transactions that reference that row are fixed up to now reference the restored row:
public void Revert(int idx)
{
if ((idx < 0) || (idx >= transactions.Count))
{
throw new ArgumentOutOfRangeException("Idx cannot be " +
"negative or greater than the number of transactions.");
}
SuspendLogging();
DataTableTransactionRecord r = transactions[idx];
DataRow row = r.Row;
switch (r.TransType)
{
case DataTableTransactionRecord.TransactionType.NewRow:
if (!r.WasDeleted)
{
SaveRowFields(r, row);
}
row.Delete();
break;
case DataTableTransactionRecord.TransactionType.DeleteRow:
DataRow newRow = sourceTable.NewRow();
RestoreRowFields(r, newRow);
sourceTable.Rows.Add(newRow);
for (int n = idx; n >= 0; --n)
{
if (transactions[n].Row == row)
{
transactions[n].Row = newRow;
}
}
break;
case DataTableTransactionRecord.TransactionType.ChangeField:
row[r.ColumnName] = r.OldValue;
break;
}
ResumeLogging();
}
Apply
Apply
(redo) performs the following, depending on the transaction type:
- For a field change, the new value is restored.
- For a new row, a new row is added. All later references to that row instance are fixed up to now reference the newly added row.
- For a deleted row, the row is deleted.
The code is very similar to the Revert method
above.
Undo/Redo example
In the demo program, the undo/redo functionality is handled by hooking the logger's TransactionAdding event and the Click events associated with the undo and redo buttons on the UI:
void OnTransactionAdding(object sender, TransactionEventArgs e)
{
btnRedo.Enabled = false;
if (undoRow < tlog.Log.Count - 1)
{
tlog.Log.RemoveRange(undoRow + 1,
tlog.Log.Count - (undoRow + 1));
tlog.AcceptChanges();
}
}
void OnUndo(object sender, EventArgs e)
{
tlog.Revert(undoRow);
--undoRow;
}
void OnRedo(object sender, EventArgs e)
{
++undoRow;
tlog.Apply(undoRow);
}
The above code has had the dgTransactions management and the button state management removed for clarity. The point of interest here is that when a transaction occurs, any redo transactions forward of the current undo transaction index are removed. Also, the AcceptChanges
is called to synchronize the DataTable
with the undo changes. If we don't synchronize at this point, a RejectChanges
call may redo changes that had been undone.
Accept/Reject changes
A DataTable
provides AcceptChanges
and RejectChanges
methods. There are no events associated with these methods, so instead of using them directly, you should use the ones provided by the transaction logger. The logger's methods manage an internal index that is updated in the AcceptChanges
call. If a RejectChanges
call is made, all transactions after this point are removed from the transaction list. This keeps the transaction logger synchronized with the DataTable
's idea of what has and has not been accepted. Important: Calling RejectChanges
while the undo index is not at the top of the undo stack can cause some strange effects.
The implementation illustrates how the transaction log and internal index are managed:
public void AcceptChanges()
{
lastAcceptedChangeIndex = transactions.Count;
sourceTable.AcceptChanges();
}
public void RejectChanges()
{
int numTran = transactions.Count - lastAcceptedChangeIndex;
transactions.RemoveRange(lastAcceptedChangeIndex, numTran);
sourceTable.RejectChanges();
}
The demo program
The demo program is a small application with a MyXaml front end (what did you expect, code???) and an event handler back end that allows you to play with undo/redo functionality, accept/reject table changes, setting column and row errors (which was part of the investigation that I did to see how the DataTable
events were handled) and clearing the table.
The declarative user interface
The UI is straightforward, declaring the control object graph and the checkbox data binding:
="1.0" ="utf-8"
-->
<MyXaml xmlns="System.Windows.Forms, System.Windows.Forms,
Version=2.0.0000.0, Culture=neutral,
PublicKeyToken=b77a5c561934e089"
xmlns:ctd="Clifton.Tools.Data, Clifton.Tools.Data"
xmlns:def="Definition"
xmlns:ref="Reference">
<Form Name="DataTableUndoRedo"
Text="DataTable Undo/Redo Demo"
ClientSize="570, 675"
MinimizeBox="false"
MaximizeBox="false"
StartPosition="CenterScreen">
<Controls>
<Label Text="Edit This Table:" Location="10, 10"
Size="100, 15"/>
<DataGrid def:Name="dgTable" Location="10, 25"
Size="280, 300" DataSource="{view}">
<TableStyles>
<DataGridTableStyle>
<GridColumnStyles>
<DataGridTextBoxColumn MappingName="LastName"
HeaderText="Last Name" Width="120"/>
<DataGridTextBoxColumn MappingName="FirstName"
HeaderText="First Name" Width="120"/>
</GridColumnStyles>
</DataGridTableStyle>
</TableStyles>
</DataGrid>
<Label Text="Transactions:" Location="10, 340" Size="100, 15"/>
<DataGrid def:Name="dgTransactions" Location="10, 355" Size="550, 300"
DataSource="{transactions}" ReadOnly="true"/>
<Button def:Name="btnUndo" Location="300, 25" Size="80, 25" Text="Undo"
Click="{app.OnUndo}"/>
<Button def:Name="btnRedo" Location="300, 55" Size="80, 25" Text="Redo"
Click="{app.OnRedo}"/>
<Button def:Name="btnAccept" Location="400, 25" Size="100, 25"
Text="Accept Changes" Click="{app.OnAcceptChanges}"/>
<Button def:Name="btnReject" Location="400, 55" Size="100, 25"
Text="Reject Changes"
Click="{app.On<CODE>RejectChanges</CODE>}"/>
<CheckBox def:Name="ckColErr" Location="300, 100" Size="200, 20"
Text="Set column error on change"/>
<CheckBox def:Name="ckRevertCol" Location="325, 120" Size="200, 20"
Text="Revert field value" Enabled="{ckColErr.Checked}"/>
<CheckBox def:Name="ckRowErr" Location="300, 140" Size="200, 20"
Text="Set row error on change"/>
<Button Location="300, 170" Size="80, 25" Text="Clear"
Click="{app.OnClear}"/>
<Button Location="300, 200" Size="80, 25" Text="Collect"
Click="{app.OnCollect}"/>
</Controls>
<ctd:BindHelper Source="{ckColErr}" SourceProperty="Checked"
Destination="{app}" DestinationProperty="ColumnErrorOnChange"/>
<ctd:BindHelper Source="{ckRowErr}" SourceProperty="Checked"
Destination="{app}" DestinationProperty="RowErrorOnChange"/>
<ctd:BindHelper Source="{ckRevertCol}" SourceProperty="Enabled"
Destination="{ckColErr}" DestinationProperty="Checked"/>
<ctd:BindHelper Source="{ckRevertCol}" SourceProperty="Checked"
Destination="{app}" DestinationProperty="RevertFieldValue"/>
</Form>
</MyXaml>
Using the demo program, you can see how, as you manipulate the DataGrid
, the transactions are added to the log. For example:
When you click on Undo, the row index in the transaction log is moved up, so that it always shows the last applied transaction. For example, clicking on undo once:
To illustrate the new row not added to the DataTable
's row collection problem, we can cursor down to the row with the "*":
If we cursor back up, the row reverts to the "*" but the NewRow transaction lingers. We can go back and forth doing this any number of times:
If we click on the Collect button, all those uncommitted new row transactions are removed:
Here's an interesting exception that gets thrown when you attempt to undo an uncommitted, empty row (this is thrown by the framework):
The solution, obviously, is to perform a Collect before the Undo. However, I've specifically omitted any "smarts" in the demo code so that you can play with the demo and see how the .NET Framework interacts with the transaction logger, rather than hiding "gotchas" like this one.
Conclusion
The original concept of logging DataTable
transactions seemed like a simple task. However, it was fraught with both incorrect architectural decisions and troublesome implementation. At this point, I think I've achieved the proper balance with regards to the original goal and the complexities of the DataTable
in both a UI and application driven environment. I have decided to keep another aspect of the transaction logger, the ability to serialize the transactions and apply them to separate (mirrored and/or remote) DataTable
, separate from this implementation. And that implementation will require primary key fields as DataRow
instances will be different!