WITH (NOLOCK) SQL’s go Faster button … but is it worth it?

What is the response usually given when you inquire as to why someone is using the WITH (NOLOCK) hint?

I know I’ve heard it justified a few different ways saying it’s a Turbo button or SQL Rocket Fuel, but what it always comes down to is that the NOLOCK hint is used because it makes SQL go faster.  Now, I’ll give people that and I’m not writing this to fight it because it is true; if you use the NOLOCK hint your queries will more than likely run faster

However, I’m fairly sure that I could probably make my car go faster by tearing off the doors, ripping out the seats, and getting rid of all possible, what I perceive as, excess weight.  I might succeed in making my car faster from 0 – 60 but I’m not going to jump in that car to drive to work in the morning.

But why wouldn’t I drive my modified car to work each morning?  Well, I’m no car guy by any means but even I know that by getting rid of those objects  perceived as ‘excess’ weight I’ve also now gotten rid of some safety features that I’ve become accustomed too.  There will now be side effects because of those modifications when driving, like potentially exiting the car unexpectedly on a hard left turn, that I need to think about before jumping behind the wheel.

Well we have to give similar thought to the use of the NOLOCK hint because we’re ultimately telling SQL by using it that we don’t care about data consistency and it can throw away any safety features it sees fit.  We only care about speed.  And with the throwing away of these safety features we now have some possible nasty side effects that can occur with the data being returned.

You’re probably thinking: “Yup, I know what you’re going to say: Dirty Reads and Phantom Reads.  I’ve heard and read all about it before but for my queries I’m not impacted.”  You might be running a query that only cares about the state of the data 5 minutes ago.  Dirty or Phantoms reads have no impact because you don’t care about the present, you’re perfectly safe to use the NOLOCK hint.

However, are even these queries still completely safe?  Well one possible side effect of the NOLOCK hint that isn’t very well known is that it can make a query return duplicate records or miss records entirely.  And this has nothing to do with locks, no modification needs to be happening to the records for this to occur.

Now you might be thinking: “Ah, that won’t really happen will it?  That side effect is just DBA folklore that’s told to scare developers.  That can’t even happen!”

Well, I’ve pasted three scripts below where we will actually make this happen.

The result is kind of sporadic time wise but it will happen at some point after running.  All that needs to be done is running ‘Script 1 Setup’ first to get things ready.  Then open another connection and run ‘Script 2’ Query which will just run a query in a loop breaking if it returns unexpected results.  Finally opening a third connection to run ‘Script 3 Workload’ will generate a workload that causes a lot of index fragmentation/Page Splits and will ultimately cause Script 2 to break.

What happened here?  How did we miss this data?

This couldn’t have had anything to do with bypassing locking as I made sure the data we reported on wasn’t being modified.

Well in my second post in this series I’ll take a very high level look into some SQL internals and the impact the NOLOCK hint has on them to figure out what happened.

Script 1 Setup

— Create a database to use
USE [master];
GO

IF DATABASEPROPERTYEX (N’NOLOCK_Test’, N’Version’) > 0
BEGIN
ALTER DATABASE [NOLOCK_Test] SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
DROP DATABASE [NOLOCK_Test];
END
GO

— Create the database to use
CREATE DATABASE [NOLOCK_Test] ON PRIMARY (
NAME = N’NOLOCK_Test’,
FILENAME = N’D:\Data\NOLOCK_Test.mdf’)
LOG ON (
NAME = N’NOLOCK_Test_log’,
FILENAME = N’D:\Log\NOLOCK_Test_log.ldf’,
SIZE = 5MB,
FILEGROWTH = 1MB);
GO

USE [NOLOCK_Test]

— Creating a table to hold our data
— Using the UNIQUEIDENTIIFIER to make sure we get some good Index fragmentation
CREATE TABLE dbo.CallDetails

(

CALLID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT ( NEWID() ),
AGENTID INT,
CALLTIME INT,
CALLDETAILS VARCHAR(1000)

)
GO

— Inserting data to create some fragmentation
INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS)
SELECT AGENTID, CALLTIME, CALLDETAILS
FROM
(
SELECT CAST(RAND() * 1000 AS INT) AS AGENTID,
10 AS CALLTIME,
REPLICATE(‘CallGoesOn’, 100) AS CALLDETAILS
) AS DETAILS
WHERE AGENTID <> 123
GO 100

— Checking data
SELECT *
FROM dbo.CallDetails

— Creating the Agent we’ll run our report against.
— We want to make sure they have 500 minutes of CALLTIME
INSERT INTO dbo.CallDetails ( AGENTID, CALLTIME, CALLDETAILS )
VALUES
(123,25, REPLICATE(‘CallGoesOn’, 100))
GO 20

— Checking data again, no tricks up our sleeve
SELECT *
FROM dbo.CallDetails

— Checking our Agent has 500 minutes of CALLTIME for our billing report
SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime
FROM dbo.CallDetails
WHERE AGENTID = 123
GROUP BY AGENTID

Script 2 Query

DECLARE @BillableTime INT

USE [NOLOCK_Test]

WHILE 1 = 1

BEGIN
SET @BillableTime = ( SELECT SUM(CALLTIME) FROM dbo.CallDetails WITH(NOLOCK) WHERE AGENTID = 123)
PRINT ‘Total Billable Time is: ‘ + CAST(@BillableTime AS NVARCHAR(10))
IF @BillableTime <> 500
BEGIN
SELECT @BillableTime
BREAK
END
END

— After this breaks run the query below to see the actual 500 minute value
/*

SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime
FROM dbo.CallDetails
WHERE AGENTID = 123
GROUP BY AGENTID

*/

Script 3 Workload

USE [NOLOCK_Test]

— We’re now going to randomly generate data for other agents
— to cause more index fragmentation
WHILE 1 = 1
BEGIN
INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS)
SELECT AGENTID, CALLTIME, CALLDETAILS
FROM
(
SELECT CAST(RAND() * 1000 AS INT) AS AGENTID,
10 AS CALLTIME,
REPLICATE(‘CallGoesOn’, 100) AS CALLDETAILS
) AS DETAILS
WHERE AGENTID <> 123
END