Introduction
With SQL Server, the NOLOCK hint is very tempting, especially with SELECT statements where at least one of the tables involved is frequently updated. Using NOLOCK removes the blocking, and the SELECT statement runs without waiting, but not without side effects.
Why do we need a hint at all
The first question might be why we need to add this hint at all to get the statement running without additional waits. The answer lies in a basic database management system principle: ACID (atomicity, consistency, isolation, and durability). The isolation property defines that other transactions may not see incomplete modifications. Since SQL Server (under the default locking behaviour) takes exclusive locks for modifications, and exclusive locks prevent acquiring read (shared) locks, all reading operations must wait until the exclusive locks are released. This causes the SELECT statement to wait whenever the data it needs to read is locked exclusively.
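If you want to see this for yourself while the blocking scenario shown later in this article is running, the sys.dm_tran_locks view lists the locks each session holds or is waiting for. A minimal sketch (the column list is trimmed to the essentials):
SELECT request_session_id,
       resource_type,
       request_mode,    -- S (shared), X (exclusive), IX, ...
       request_status   -- GRANT or WAIT
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id;
The writing session shows granted exclusive (X) locks, and the blocked reader shows shared (S) lock requests with the status WAIT.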
Using the NOLOCK hint (which means exactly the same as READUNCOMMITTED) bypasses the check for exclusive locks and does not take shared locks at all. This also means that the query sees data that has not yet been committed to the database, which may result in a dirty read. In other words, the modifications are not isolated, so one of the main database principles is ignored. This is why NOLOCK-like behaviour isn't used by default.
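Because NOLOCK and READUNCOMMITTED mean the same thing, the same behaviour can also be requested for a whole session by changing the transaction isolation level instead of adding the hint to every table reference. A short sketch, using the LockTest.OrderEntry table created in the next section:
-- Every SELECT in this session now reads uncommitted data,
-- exactly as if each table reference had the NOLOCK hint.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT(*) FROM LockTest.OrderEntry;

-- Return to the default behaviour.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;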
What it actually means
Let's take an example. First let's create the necessary objects.
CREATE SCHEMA LockTest;
GO
IF OBJECT_ID ( 'LockTest.OrderEntry', 'U' ) IS NOT NULL
DROP TABLE LockTest.OrderEntry;
GO
CREATE TABLE LockTest.OrderEntry (
   Id int not null identity(1,1) primary key,
   Amount int not null,
   UnitPrice decimal not null, -- decimal without precision/scale defaults to decimal(18,0), so prices are stored as whole numbers
   Area int not null
);
GO
IF OBJECT_ID ( 'LockTest.AddOrders', 'P' ) IS NOT NULL
DROP PROCEDURE LockTest.AddOrders;
GO
CREATE PROCEDURE LockTest.AddOrders @OrderCount int AS
BEGIN
   DECLARE @counter int = 0;
   WHILE (@counter < @OrderCount) BEGIN
      INSERT INTO LockTest.OrderEntry
         (Amount, UnitPrice, Area)
      VALUES
         (ROUND(RAND()*100, 0), ROUND(RAND()*100, 2), ROUND(RAND()*10, 0));
      SET @counter = @counter + 1;
   END;
END;
GO
So now we have a single table and a procedure to create some random data. Let's add something to the table.
BEGIN TRANSACTION;
EXEC LockTest.AddOrders @OrderCount=100;
COMMIT;
Now if we query the sales amounts per area, the result is something like the following. Note that your results will differ because of the random values.
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry
GROUP BY Area
ORDER BY Area;
And the results:
Area TotalSales
---- ----------
0 10454
1 22608
2 22038
Simultaneous transactions, normal behaviour
Now if the database is updated and queried at the same time, what happens in the normal situation? The DML operation takes the necessary exclusive locks and may block the query until they are released. This can be tested as follows; the test requires two simultaneous connections to the database.
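If you want to keep track of which window is which (and to match the session ids shown by the diagnostic queries), you can first check each connection's session id:
-- Run in both windows; note the ids for session 1 and session 2.
SELECT @@SPID AS SessionId;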
In session 1, add new rows in a transaction and leave the transaction open.
BEGIN TRANSACTION
EXEC LockTest.AddOrders @OrderCount=10000;
In session 2, query again for the top 3 areas. This query won't finish; it stays running, waiting for the exclusive locks to be released.
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry
GROUP BY Area
ORDER BY Area;
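While session 2 is waiting, a third connection (or session 1 itself) can confirm what is going on; a sketch using sys.dm_exec_requests, where the blocked SELECT shows the wait type LCK_M_S:
SELECT session_id,
       blocking_session_id,  -- the session holding the exclusive locks
       wait_type,            -- LCK_M_S for the blocked shared-lock request
       wait_time,            -- milliseconds spent waiting so far
       command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;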
Now when session 1 rolls back, the transaction locks are released and the query in session 2 continues.
ROLLBACK;
And immediately the results are displayed for session 2.
Area TotalSales
---- ----------
0 10454
1 22608
2 22038
As you can see, the results are the same as before since all the modifications were rolled back.
Results when NOLOCK is used
Now we use the exact same scenario as previously, but this time with the NOLOCK hint.
In session 1, again add new rows in a transaction, and leave the transaction open.
BEGIN TRANSACTION
EXEC LockTest.AddOrders @OrderCount=10000;
In session 2, query again for the top 3 areas, but this time with the NOLOCK hint.
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry WITH (NOLOCK)
GROUP BY Area
ORDER BY Area;
The results are shown immediately (again, your results may differ because of the random values).
Area TotalSales
---- ----------
0 1321810
1 2417946
2 2539965
Now session 1 rolls back the transaction.
ROLLBACK;
And the exact same query is executed in session 2.
SELECT TOP(3)
Area, SUM(Amount * UnitPrice) AS TotalSales
FROM LockTest.OrderEntry WITH (NOLOCK)
GROUP BY Area
ORDER BY Area;
The results are back to the original values since session 1 rolled back all of its modifications.
Area TotalSales
---- ----------
0 10454
1 22608
2 22038
Conclusions
What we saw was a dirty read. For a moment, we saw data in the database which, in terms of isolation, was never committed. So is this a bad thing? Well, it depends on the business logic. If you're creating a banking application, I'd bet that a dirty read is something you really want to avoid. On the other hand, if rollbacks are rare in your environment, and the incorrect data you might occasionally see has no harmful impact when they do happen, then you can consider using NOLOCK.
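Finally, if you want to remove the test objects after experimenting, the following should do it (assuming nothing else was created in the LockTest schema):
DROP PROCEDURE LockTest.AddOrders;
DROP TABLE LockTest.OrderEntry;
DROP SCHEMA LockTest;
GO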
History
- February 11, 2011: Created.