Audit Trail – Tracing Data Changes in Database

Petr Kozelek

4.84/5 (17 votes)

30 Aug 2010Public Domain6 min read

177.1K

Logging of data changes in relational databases

Introduction

The common requirement in many enterprise applications is logging of data changes in a database - what data has changed, who changed them and when (audit logging). Bloggers published many articles about this old problem but there are only few general approaches on how to do that in relational databases.

A Shakespearean Hypothetical - Problems for Database Administrators

This is a fiction story that never happened:

Juliet Capulet meets Paul Smith; they fell in love and marry on 25^th August 2009. When Juliet is back at work, she asks the db admin to change her name in a company database. The admin is not happy to have additional work with this stuff but he does it for her.

Juliet's and Paul's marriage is not going well and Juliet divorces Paul after a short period. Soon Juliet feels alone, she meets Romeo Montague and, of course, falls in desperate love with him. Logically, another wedding is arranged and Juliet asks db admin to change her name again on 10^th January 2010.

The tragedy of all Juliet's love relationships finishes when she finds her husband Romeo as a roulette player spending all their money. She divorces again and changes her name to Juliet Singleton on 21st July 2010 with the vision not to marry anybody again. Needless to say, the db admin is close to committing suicide because his database does not allow changing the name more than twice.

What would you suggest the database admin to do better in his next life?

Problem Description

Let us summarize what the db admin was asked to do with Juliet's personal data:

Juliet Capulet gets the job on 2005-03-01
Name Juliet Capulet changes to Juliet Smith on 2009-08-25
Name Juliet Smith changes to Juliet Montague on 2010-01-10
Name Juliet Montague changes to Julie Singleton on 2010-07-21

Solution No. 1: Roll off Out-dated data

The simplest solution for keeping historical data in the database is to create a duplicated record before update. After that, we can update the requested data. In case of tracing Juliet's name, we get these data in the database table:

id	name	created_date	created _by
124	Juliet Capulet	2005-03-01	admin
124	Juliet Smith	2009-08-25	admin
124	Juliet Montague	2010-01-10	admin
124	Juliet Singleton	2010-07-21	admin

In real database schema, id is primary key and has to be unique. So we need an extra column containing information about row id from which data were rolled off. Additionally, it would be helpful to have a column representing status of the record. Because the situation gets more complex, here I show how data evolves in the database:

Juliet Capulet gets the job on 2005-03-01.

id	original_id	status	name	created_date	created _by
124	124	active	Juliet Capulet	2005-03-01	admin

Name Juliet Capulet changes to Juliet Smith on 2009-08-25.

id	original_id	status	name	created_date	created _by
124	124	active	Juliet Smith	2009-08-25	admin
254	124	history	Juliet Capulet	2005-03-01	admin

Name Juliet Smith changes to Juliet Montague on 2010-01-10.

id	original_id	status	name	created_date	created _by
124	124	active	Juliet Montague	2010-01-10	admin
254	124	history	Juliet Capulet	2005-03-01	admin
347	124	history	Juliet Smith	2009-08-25	admin

Name Juliet Montague changes to Julie Singleton on 2010-07-21.

id	original_id	status	name	created_date	created _by
124	124	active	Juliet Singleton	2010-07-21	admin
254	124	history	Juliet Capulet	2005-03-01	admin
347	124	history	Juliet Smith	2009-08-25	admin
489	124	history	Juliet Montague	2010-01-10	admin

You can see that each change to the record with id=124 invokes:

creating duplicated row with status=history and
updating the column name to a requested value with created_date set to current date.

When the active record is going to be "deleted", then only the status flag changes the value from active to history.

Advantages

It is simple to implement audit logging based on rolling out old data. You do not need additional tables. Maintenance is simple, too. If you decide to remove old data, then it is the matter of one SQL.

Disadvantages

But what if you need to implement audit logging in more tables? You need to add columns original_id, status, created_date and created_by. And you have to implement previously introduced roll-off logic, either to database directly (usually with table triggers) or in the application.

The problem can be that when data changes, then the whole record is copied, i.e. also data which does not change. It causes data duplication and, potentially, database disk space can increase markedly if changes occur frequently. For example, if table persons has a column photo with binary data containing the photography then each and every time Juliet's name changes the whole record (including the photo) is copied to rolled-off record.

Another disadvantage is that the complexity of each table supporting audit logging increases. You must have in mind all the time that retrieving the records is not simple. You always have to use the SELECT clause with condition:

SQL

SELECT * FROM persons WHERE status='active'

When you need to roll back historical data for a particular record, you have to copy all columns from the historical record to the active record. It would be easier just to switch status value from "history" to "active", but it would break reference integrity if the record was referenced in other tables.

Solution No. 2: Dedicated Data-Tracing Table

Another approach is based on separate audit log table that is dedicated to logging data changes coming from all tables having audit log feature:

id	table	column	row	changed_date	changed_by	old_value	new_value
1241	persons	name	124	2005-03-01	admin	Juliet Capulet	Juliet Capulet
1654	persons	name	124	2009-08-25	admin	Juliet Capulet	Juliet Smith
2547	persons	name	124	2010-01-10	admin	Juliet Smith	Juliet Montague
3645	persons	name	124	2010-07-21	admin	Juliet Montague	Juliet Singleton

Advantages

This solution separates the concerns much better. There is only one common data store for all historical data. Further, you have to implement audit log feature only once for all tables which should support it.

Compared to the previous solution, when the record changes, only the changed data are logged. Thus it is expected to be a little bit faster but I did not do any tests so I cannot prove it exactly. The real advantage is that you need lower disk space to store data changes. You can improve the table structure when you normalize the log table and you put just integer references to "table" and "column".

You can easily retrieve user activity just by the query

SQL

SELECT * FROM audit_log WHERE changed_by='admin'

Disadvantages

All data changes are logged in a common table so old and new values stored there must be of some generic type. For example, in Microsoft SQL, it can be "sql_variant". Or you can convert the values to "varchar" before storing them in the log table.

Rollback is a rather difficult task to be implemented. If you want to roll data back to 3^rd March 2010, you have to use query:

SQL

SELECT [table], [column], old_value, new_value FROM audit_log WHERE _
  [table]='persons' AND row=124 and changed_date > '2010-03-01' ORDER BY changed_date

and then you iterate through the result set and build the dynamic query because table and column information is in the result. It is more complicated compared to the previous solution. Fortunately, the feature has to be implemented only once and then can be reused across the whole application.

History

30^th August, 2010: Initial post

License

This article, along with any associated source code and files, is licensed under A Public Domain dedication