Data Versioning in SQL Server using Snapshot a.k.a. Row Versions

Wendelius

4.34/5 (23 votes)

7 Feb 2018CPOL5 min read

198

Enable SQL Server internal row versioning to achieve greater concurrency in OLTP systems using Snapshots

This article describes how to enable SQL Server internal row versioning to achieve greater concurrency in OLTP systems. This is done by enabling snapshot isolation.

Download source code - 557 B

Introduction

This article briefly describes how to update the operations block by reading the data being modified. This behaviour can be altered using row versioning. The advantage of using row versioning is greater concurrency, but there are also side effects which are described in this article.

In many cases, I've noticed that applications may use a Read uncommitted isolation level to overcome locking related problems (typically, the amount of time used for an operation). Row versioning significantly reduces the need to use the Read uncommitted isolation level. When the Read committed isolation level can be used without spending too much time on the operation, the correctness of the results is significantly higher.

This behaviour is similar to what is in Oracle databases, although the implementation is somewhat different.

If you want to read more about Read uncommitted and its side effects, have a look at What should be considered when NOLOCK hint is used.

Normal Behaviour

First, we need a simple database for testing, and a table with a row. In the simplest form, this can be created in an existing SQL Server instance using the following command:

SQL

-- Create the database
CREATE DATABASE RVTest;

-- Change the context for the connection
USE RVTest;

-- Create a table for testing
CREATE TABLE Test ( 
Column1 varchar(50)
);

-- Insert a row to the table
INSERT INTO Test (Column1) VALUES ('For testing');

Now, when an application starts modifying the data, when done properly, modifications are wrapped inside a transaction. To simulate this, we modify the test row and leave the transaction pending.

SQL

BEGIN TRANSACTION;

UPDATE Test 
SET Column1 = 'Modified value';

If the situation is observed from the database side by querying sys.dm_tran_locks, we can see that an exclusive lock is acquired on the row being modified. Also, a few intent locks are needed at a higher level, but the main concern is in the row lock. The output would look like this:

type description resource_associated_entity_id request_mode request_status request_session_id
---- ----------- ----------------------------- ------------ -------------- ------------------
PAGE 1:176       72057594043105200              IX          GRANT          53
RID  1:176:0     72057594043105200              X           GRANT          53

So far, no problems. Now, when another connection tries to read the same data, it actually tries to get a shared lock on the row. Because an exclusive lock is already granted to the row, a shared lock cannot be given. In this situation, the connection requesting the shared lock is placed in a queue to wait for the lock.

SQL

SELECT * FROM Test;

When we query sys.dm_tran_locks again to see the situation, the wait state is seen like following:

type description resource_associated_entity_id request_mode request_status request_session_id
---- ----------- ----------------------------- ------------ -------------- ------------------
PAGE 1:176       72057594043105200             IS           GRANT          54
PAGE 1:176       72057594043105200             IX           GRANT          53
RID  1:176:0     72057594043105200             X            GRANT          53
RID  1:176:0     72057594043105200             S            WAIT           54

This situation remains until the transaction from the first session is committed or rolled back. When the transaction ends, exclusive locks are removed, and after that, shared locks can again be granted, and the operation for the second session can continue. In a busy system, the situation in the queues can be like in a gift shop on the day before Christmas.

Using Row Versioning

Row versioning is a database setting which can be modified using the ALTER DATABASE command. To enable row versioning, set READ_COMMITTED_SNAPSHOT to on:

SQL

ALTER DATABASE RVTest SET READ_COMMITTED_SNAPSHOT ON;

When executing the command, the database can be in multi-user mode, but there must be no other connections in the database concurrently. If there is, the command doesn't return until all other connections are closed.

Now, when the behaviour is changed, let's have a look at the previous situation. First, one session modifies the test table again:

SQL

BEGIN TRANSACTION;

UPDATE Test 
SET Column1 = 'Another modified value';

At this point, nothing is changed. The session has the same locks as previously, but when another session executes a SELECT on the same data, we can see the difference:

SQL

SELECT * FROM Test;

Now, the query returns immediately, giving the following result:

Column1
-------
For testing

Most importantly, we see that locks are honored. We don't see the uncommitted data as would happen with the Read uncommitted isolation level. Instead, the result we see is the last committed state of the row.

So, what actually happened? When the modification was done (UPDATE statement), SQL Server took a copy of the data before the data was changed. This copy was placed in tempdb. When the row was queried, the database engine noticed that the row has uncommitted modifications, and based on transaction sequence numbers, it read the original data from tempdb.

Trade-offs

Since there are no free meals, this behaviour has some trade-offs. The most significant of them are:

Tempdb space usage. Because versioned data is stored in tempdb, this database must have enough space for all modified rows in all databases in the same instance which have row versioning on. This may radically increase tempdb usage. Space used in tempdb is not freed immediately when it is not needed anymore. Instead, there is a separate thread which cleans up unneeded data from tempdb once in a while (normally, once in a minute).
Amount of I/O increases. First, the data must be placed in tempdb, and when queried, it must be fetched.
Amount of CPU increases because of management operations for this feature.
Data rows consume more space. Each row must have info about the transaction sequence number along with a pointer to the versioned row.
For LOB-fields, each data fragment has 40 bytes less room because of increased header information. For this reason, the database can grow significantly compared to earlier versions of SQL Server.

Conclusion

Row versioning is easy to set up, and it enables much higher concurrency in an environment where the same data is modified and read simultaneously. It uses more resources, and therefore an existing server setup may be inadequate. When used correctly, it will have a very positive impact on the overall throughput of an application.

If the application is designed based on the earlier fact that reading was impossible while an active transaction was modifying the data, row versioning shouldn't be used, or the application must be redesigned where needed.

References

History

11^th October, 2008: Created
15^th October, 2008: Corrected layout in sp_lock outputs.
3^rd November, 2012: Script added and article checked against SQL Server 2012
7^th February, 2018: Replaced deprecated sp_lock with sys.dm_tran_locks

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)