Introduction
Maintaining a version history of SQL Server data has many benefits, but the top three are:
- Undo history
- Auditing
- Soft delete
Record versioning imposes a layer on top of CRUD operations that make them more complex, especially on Update and Delete. No longer can you simply delete a record. Instead, you must flag it as deleted ("soft delete"). No longer can you simply update a record; instead, you must perform a soft delete followed by an insert.
Reporting is also a challenge. Most reporting frameworks do not understand the concept of versioned data. They expect each record to be a distinct data item, not a 'version' of a data item it has already seen.
No versioning solution will circumvent the fundamental challenges to versioned records, but we can greatly improve on the traditional approach to auditing data.
The typical solution
Record versioning is normally accomplished by creating separate audit tables that mirror the schema of the main tables.
Does that look right to you? It has a few bad smells to me.
- I don't like the schema duplication. It adds to the clutter of tables, makes maintenance more difficult, and in general, makes it harder for new developers to digest. I think we can do better.
- It feels unsafe. The fundamental principal of moving data involves deleting from the old destination. I think we can do better.
- It's not friendly with ORM. If you use ORM tools or handle your audit trail with business objects, you are forced to copy each field explicitly, from the old business object to the new 'audit' business object. Again, I think we can do better.
- It doesn't scale. If I want to add a BlogComment table, I have to add another audit table. Now, not only do we have schema duplication, but we have duplicate abstractions of auditing that can grow apart over time. Imagine years later when the BlogComment_Archive table behaves differently than the Blog_Archive table. That makes it even harder to comprehend the schema.
- I don't want to write reports against this schema. I have to point to separate tables when I want historical drill-down, and that seems unnecessary.
The better plan
As it turns out, we indeed can do much, much better! By using some clever entity inheritance, we can solve the audit problem for all tables in the database instead of just one at a time.
The solution involves a base audit table from which all auditable tables inherit. Study the following diagram and then look at the notes below.
- The Audit table contains all the version information. Date stamp, active state, who updated it. Your audit requirements may include other fields here.
- The Audit table contains a PermanentRecordId. Because the PK of Blog records will change as you insert new versions, we need a permanent ID to identify a single blog entry and group versions of the same blog entry.
- The Blog table shares a 1:1 relationship on primary keys. Audit.Id is the PK and Blog.Id is the FK. Notice that Blog.Id is not an identity column.
- The Blog table looks much cleaner without all the audit garbage distraction. When you look at the Blog table, you immediately understand its purpose. Likewise, the Audit table is immediately obvious.
CRUD Operations
Let's try out a few CRUD operations to see how this new approach feels.
Create (Insert) and Read (Select)
Entity inheritance generally requires two more insertions because you must insert into multiple tables for one complete 'record'.
In this case, insertion now involves two operations. First, you must insert a record into the base Audit table. Then, you must get the PK ID of that inserted record for use with the second insertion into the Blog table.
Multiple insertions per operation is one drawback to the entity inheritance strategy, but it can be encapsulated. For example, the following insertion sample could be converted into a Stored Procedure that takes the Blog table values and the value for Audit.Updated_By.
delete from blog
delete from comment
delete from audit
declare @id bigint
insert into Audit(Updated_By) values ('Ben')
select @id = @@identity
insert into Blog (Id, UserId, Title, Body) values ( @id, 1, 'My first entry', 'My blog body')
select * from Blog join Audit on Blog.Id = Audit.Id where IsActive=1
As you can see, the Audit table kicked right in and did its job. We have a PermanentRecordId for this blog entry, and all other information is intact. Notice also that the ID columns are synchronized, and the record is marked as active.
While it's true that you must perform an INNER JOIN
with a WHERE
to select anything using this strategy, the operation is not costly because it is performed on indexed fields.
Update
In versioned recording, an update is really a soft delete followed by an insert.
Soft deletes are performed directly against the audit table. This is one nice feature: to perform a soft delete, you don't even need to know the record type. This means you could make a single Stored Procedure, spSoftDelete(id)
, that accepts the ID of the record to soft delete.
declare @BlogRecordId uniqueidentifier
select @BlogRecordId=PermanentRecordId from Audit where Id = @id
update Audit set IsActive=0 where Id=@id
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body)
values ( @@identity, 1, 'My updated first entry', 'My blog body is different now')
select * from Blog join Audit on Blog.Id = Audit.Id
Now we have two entries with the same PermanentRecordId. That means they are different versions of the same logical record. But notice that the old record is no longer active. We could select MAX(Created)
, but an active flag is faster, and as you will see for Undo operations, necessary.
Delete
Deletes retire all versions of a record. This can be performed on the Audit table alone, making it easy to encapsulate.
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
select * from Blog join Audit on Blog.Id = Audit.Id
Alternately, if you have the exact Blog.Id or Audit.Id, you can set IsActive that way. As long as all your update operations are done correctly, there should be only one record where IsActive=1. You could even set a constraint to check this.
CRUD Extensions
Versioning opens the possibility of extended operations, such as undo/redo. To use these CRUD extensions, add two more versions for a total of four blog entries.
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body)
values ( @@identity, 1, 'My 3rd updated first entry', 'My blog body is different now')
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
insert into Audit(Updated_By, PermanentRecordId) values ('Ben', @BlogRecordId)
insert into Blog (Id, UserId, Title, Body)
values ( @@identity, 1, 'My 4th updated first entry', 'My blog body is different now')
Undo/Rollback/Revert
An undo operation involves a few steps:
- Find the version directly preceding the active version, if there is one.
- Soft-delete the active version.
- Activate the version you found in step 1.
In the sample below, we find the version preceding the active version by relying on primary keys for chronological order. Version #3 will always have a PK ID smaller than version #4 of the same record. Some programmers do not like using indexed primary keys to determine the chronological order. If you are among them, you may want to consider using an alternate indexed column to maintain the chronological order. (Note: the SQL Server 'timestamp' data type will not work because records are updated when their Active status changes, and this changes the timestamp value.)
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
and Id < (select Id from Audit where
PermanentRecordId=@BlogRecordId and IsActive=1) order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
and Id < (select Id from Audit where
PermanentRecordId=@BlogRecordId and IsActive=1) order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
and Id < (select Id from Audit where
PermanentRecordId=@BlogRecordId and IsActive=1) order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
and Id < (select Id from Audit where
PermanentRecordId=@BlogRecordId and IsActive=1) order by Id desc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id
Redo
Redo is a simple reversal of the undo operation. Find the first version following the active version, and activate it.
select top 1 @id=Id from Audit where PermanentRecordId=@BlogRecordId
and Id > (select Id from Audit where
PermanentRecordId=@BlogRecordId and IsActive=1) order by Id asc
update Audit set IsActive=0 where PermanentRecordId=@BlogRecordId
update Audit set IsActive=1 where Id = @id
select * from Blog join Audit on Blog.Id = Audit.Id
Adding New Auditable Tables
Adding new tables under our version control system is easy, and the CRUD code you already saw above is easily adapted, especially if you choose to encapsulate in Stored Procedures.
Let's add a table to support versionable, nested blog comments to demonstrate how similar the CRUD is for a new table. Also, notice that the CRUD demonstrated above requires zero changes when a new table is added. Outstanding.
At first, supporting multiple records from multiple tables sounds impossibly difficult, but it works with almost no added effort.
- Comment.Id is an FK to Audit.Id, just like Blog.Id.
- Comment.PermanentBlogId will store the PermanentId for the blog entry. You could use an FK directly to Blog.Id, but the downside is that you would have to update all FK references when the active version of the Blog record changes. This way, you give up a little referential integrity (that you could add back with constraints if you wanted to), but you gain simplicity through decoupled revision changes.
- Comment.ParentId points to Comment.Id to allow for nested comments.
So this is getting interesting! Now, comments can have versions just like blog entries, but nothing is ever lost. Let's check our CRUD again to be sure everything works.
Create (Insert)
The only difference here is that we need to reference the PermanentBlogId. That is a requirement only because Comments are owned by Blogs. Fortunately, we have that from earlier in the demo script. If you had the Blog.Id, you could use that to get the PermanentId of the Blog entry.
insert into Audit(Updated_By) values ('Commenter')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment)
values ( @id, @BlogRecordId, 2, 'My first comment')
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1
If you study this, you will see that this comment version has a different PermanentRecordId. That is correct because this is a different set of versioned data. Each comment will get its own PermanentId.
Update
Update is identical excepting different values for the insertion.
declare @CommentRecordId uniqueidentifier
select @CommentRecordId=PermanentRecordId from Audit where Id = @id
update Audit set IsActive=0 where Id=@id
insert into Audit(Updated_By, PermanentRecordId) values ('Commenter', @CommentRecordId)
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment)
values ( @@identity, @BlogRecordId, 1, 'My updated first comment')
select * from Comment join Audit on Comment.Id = Audit.Id
Select All Comments for a Blog Entry
Okay, so you're convinced now that the versioning works. You can see that the rollback and restore operations will be performed on the PermanentId values associated with each comment, and you can see each comment gets its own PermanentId. It's confusing to imagine that both Blog entries and Comments have versions!
So, how do you select all the comments for the current version of the blog? Easy.
insert into Audit(Updated_By) values ('Commenter #2')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment)
values ( @id, @BlogRecordId, 2, 'I have a comment too')
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1
declare @parentId bigint
select @parentId = @id
insert into Audit(Updated_By) values ('Commenter #3')
select @id = @@identity
insert into Comment (Id, PermanentBlogId, CommentingUserId, Comment, ParentId)
values ( @id, @BlogRecordId, 3, 'I have a comment about the comment', @parentId)
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1
select * from Comment join Audit on Comment.Id = Audit.Id where IsActive=1 and
PermanentBlogId = @BlogRecordId order by Created asc
Further Study
If you are interested in this approach, I recommend looking into the following advanced topics:
- Using record versioning with your favorite ORM tool
- Using record versioning with code generated DALs
- Entity/table inheritance
- Hierarchical versions (for example, if you wanted a Blog rollback to also roll Comments back)
- Writing views for reports
- Encapsulating and abstracting insert/update operations