WITH (NOLOCK) SQL’s go Faster button … but is it worth it? Part 2

What did happen to the example in the last post?  We’ll have to take a very high level look into some SQL internals to see what happened.  A full description is completely out of scope for this entry but we’ll look to get a sense of the workload and what happened.

We’ll be looking into two different aspects that caused this issue to happen.  The first is mentioned in the scripts as the workload created was going to generate a lot of Page Splits.

What is a Page Split?

A page split occurs when an insert/update needs to be performed but there is not enough room on the page for the action to succeed.  The data on the page is then roughly halved, a request for a new page is made to the allocation cache, and the halved data is moved onto the new page.  One thing to note for us is that the allocation of this new page is completely random in where it exists physically on disk.  This comes into play when we look at the next action.

Next, we have to look at how the storage engine will gather the data.  For our query there are two ways that the Storage Engine can gather the data; it can either use an Index Ordered Scan or an Allocation Ordered Scan.  Now, this shouldn’t be confused with the index seek/scan operators that you see in executions plans but these are actions performed by the storage engine to fulfil this operators data requests.

(A very quick description of these two methods is below but a detailed explanation is out of scope for this post.  For more in-depth information on the subject I would suggest looking at Paul Randal’s (Twitter|Blog) blog especially he’s series here: Inside the Storage Engine)

An Index Order Scan is when the storage engine scans the leaf level of the B-Tree following the logical order of the index that is defined by the PrevPage and NextPage fields in the Pages Header.  This is pictured in my crude diagram below, Diagrams are not my specialty.

Index_Strc

An Allocation Order Scan is when the storage engine will use the index IAM pages to scan the index in physical order.  An IAM page contains a bitmap that tracks extents allocated to an index in 4GB chunks.  The below diagram is a very simple contrived example of an Allocation Order Scan.  For more information on IAM pages this blog post on IAM Pages by Paul Randal (Twitter|Blog) will help.

Allocation_Strct

This type of scan can have a substantial performance benefit, however, there are a few rules put in place around the use of this type of scan to ensure data consistency.  One is that the storage engine will only use the scan if it knows that the data underneath won’t change unless, like in our case, it’s been told not to care about data consistency by the use of NOLOCK or the READ_UNCOMMITED isolation level.  SQL server will throw away this security feature and go for performance!   So, with the query in script 2 where we’re performing a Clustered Index Scan, the storage engine is going perform an Allocation Order Scan and start at the first page in the index and continue in physical order until it hits the last one recorded.

So how do these two factors cause this side effect to happen?

Well, the best way I found to explain this is with an example from my little 2 year old.  She loves Legos at the minute and let us suppose I put in front of her 4 buckets full of Legos to match the diagram above.  I’ve made sure that each bucket contains 20 pink pieces.  I now tell her to start at the first bucket and count all the pink Lego pieces in the four buckets going in order like the storage engine will when performing its Allocation Order Scan.  She should come up with 80 pieces.

However, she starts to scan through the buckets for the pink pieces but while she is scanning through the third bucket, I come in like our page split and grab half the contents of bucket two (ten pink piece included) and put that into a bucket just after the fourth bucket.  No when she finishes the forth bucket she continues onto the new fifth bucket and we end up with 90 pink pieces instead of 80.

If we take our query the events in the example would be that when she’s scanning bucket three, I perform our page split on bucket 4 and move half its content (ten pink piece) into a bucket between buckets one and two.  Now when she continues onto bucket 4 she’ll only count ten pink pieces and end up with a count of 70 in total.

Hopefully this will help you put some extra thought into when to use the NOLOCK hint and but with some added weight against the ‘is it worth it?’ part of the question.

WITH (NOLOCK) SQL’s go Faster button … but is it worth it?

What is the response usually given when you inquire as to why someone is using the WITH (NOLOCK) hint?

I know I’ve heard it justified a few different ways saying it’s a Turbo button or SQL Rocket Fuel, but what it always comes down to is that the NOLOCK hint is used because it makes SQL go faster.  Now, I’ll give people that and I’m not writing this to fight it because it is true; if you use the NOLOCK hint your queries will more than likely run faster

However, I’m fairly sure that I could probably make my car go faster by tearing off the doors, ripping out the seats, and getting rid of all possible, what I perceive as, excess weight.  I might succeed in making my car faster from 0 – 60 but I’m not going to jump in that car to drive to work in the morning.

But why wouldn’t I drive my modified car to work each morning?  Well, I’m no car guy by any means but even I know that by getting rid of those objects  perceived as ‘excess’ weight I’ve also now gotten rid of some safety features that I’ve become accustomed too.  There will now be side effects because of those modifications when driving, like potentially exiting the car unexpectedly on a hard left turn, that I need to think about before jumping behind the wheel.

Well we have to give similar thought to the use of the NOLOCK hint because we’re ultimately telling SQL by using it that we don’t care about data consistency and it can throw away any safety features it sees fit.  We only care about speed.  And with the throwing away of these safety features we now have some possible nasty side effects that can occur with the data being returned.

You’re probably thinking: “Yup, I know what you’re going to say: Dirty Reads and Phantom Reads.  I’ve heard and read all about it before but for my queries I’m not impacted.”  You might be running a query that only cares about the state of the data 5 minutes ago.  Dirty or Phantoms reads have no impact because you don’t care about the present, you’re perfectly safe to use the NOLOCK hint.

However, are even these queries still completely safe?  Well one possible side effect of the NOLOCK hint that isn’t very well known is that it can make a query return duplicate records or miss records entirely.  And this has nothing to do with locks, no modification needs to be happening to the records for this to occur.

Now you might be thinking: “Ah, that won’t really happen will it?  That side effect is just DBA folklore that’s told to scare developers.  That can’t even happen!”

Well, I’ve pasted three scripts below where we will actually make this happen.

The result is kind of sporadic time wise but it will happen at some point after running.  All that needs to be done is running ‘Script 1 Setup’ first to get things ready.  Then open another connection and run ‘Script 2’ Query which will just run a query in a loop breaking if it returns unexpected results.  Finally opening a third connection to run ‘Script 3 Workload’ will generate a workload that causes a lot of index fragmentation/Page Splits and will ultimately cause Script 2 to break.

What happened here?  How did we miss this data?

This couldn’t have had anything to do with bypassing locking as I made sure the data we reported on wasn’t being modified.

Well in my second post in this series I’ll take a very high level look into some SQL internals and the impact the NOLOCK hint has on them to figure out what happened.

Script 1 Setup

— Create a database to use
USE [master];
GO

IF DATABASEPROPERTYEX (N’NOLOCK_Test’, N’Version’) > 0
BEGIN
ALTER DATABASE [NOLOCK_Test] SET SINGLE_USER
WITH ROLLBACK IMMEDIATE;
DROP DATABASE [NOLOCK_Test];
END
GO

— Create the database to use
CREATE DATABASE [NOLOCK_Test] ON PRIMARY (
NAME = N’NOLOCK_Test’,
FILENAME = N’D:\Data\NOLOCK_Test.mdf’)
LOG ON (
NAME = N’NOLOCK_Test_log’,
FILENAME = N’D:\Log\NOLOCK_Test_log.ldf’,
SIZE = 5MB,
FILEGROWTH = 1MB);
GO

USE [NOLOCK_Test]

— Creating a table to hold our data
— Using the UNIQUEIDENTIIFIER to make sure we get some good Index fragmentation
CREATE TABLE dbo.CallDetails

(

CALLID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT ( NEWID() ),
AGENTID INT,
CALLTIME INT,
CALLDETAILS VARCHAR(1000)

)
GO

— Inserting data to create some fragmentation
INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS)
SELECT AGENTID, CALLTIME, CALLDETAILS
FROM
(
SELECT CAST(RAND() * 1000 AS INT) AS AGENTID,
10 AS CALLTIME,
REPLICATE(‘CallGoesOn’, 100) AS CALLDETAILS
) AS DETAILS
WHERE AGENTID <> 123
GO 100

— Checking data
SELECT *
FROM dbo.CallDetails

— Creating the Agent we’ll run our report against.
— We want to make sure they have 500 minutes of CALLTIME
INSERT INTO dbo.CallDetails ( AGENTID, CALLTIME, CALLDETAILS )
VALUES
(123,25, REPLICATE(‘CallGoesOn’, 100))
GO 20

— Checking data again, no tricks up our sleeve
SELECT *
FROM dbo.CallDetails

— Checking our Agent has 500 minutes of CALLTIME for our billing report
SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime
FROM dbo.CallDetails
WHERE AGENTID = 123
GROUP BY AGENTID

Script 2 Query

DECLARE @BillableTime INT

USE [NOLOCK_Test]

WHILE 1 = 1

BEGIN
SET @BillableTime = ( SELECT SUM(CALLTIME) FROM dbo.CallDetails WITH(NOLOCK) WHERE AGENTID = 123)
PRINT ‘Total Billable Time is: ‘ + CAST(@BillableTime AS NVARCHAR(10))
IF @BillableTime <> 500
BEGIN
SELECT @BillableTime
BREAK
END
END

— After this breaks run the query below to see the actual 500 minute value
/*

SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime
FROM dbo.CallDetails
WHERE AGENTID = 123
GROUP BY AGENTID

*/

Script 3 Workload

USE [NOLOCK_Test]

— We’re now going to randomly generate data for other agents
— to cause more index fragmentation
WHILE 1 = 1
BEGIN
INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS)
SELECT AGENTID, CALLTIME, CALLDETAILS
FROM
(
SELECT CAST(RAND() * 1000 AS INT) AS AGENTID,
10 AS CALLTIME,
REPLICATE(‘CallGoesOn’, 100) AS CALLDETAILS
) AS DETAILS
WHERE AGENTID <> 123
END