Back in May I kicked off a survey about prevalent latches on SQL Server instances across the world (see here). It’s taken me quite a while to get around to collating and blogging about the results, but here they are at last!
I got information back for almost 600 servers, and if you remember, I gave you some code to run that would output the top non-page latches that are being waited on during LATCH_XX waits. Non-page latches are those that are neither PAGELATCH_XX (waiting for access to an in-memory copy of a data-file page) nor PAGEIOLATCH_XX (waiting for a data-file page to be read into memory from disk).
Every non-page data structure in SQL Server that must be thread-safe (i.e. can be accessed by multiple threads) must have a synchronization mechanism associated with it – either a latch or a spinlock. I blogged about spinlocks last year; they’re used for some data structures that are accessed so frequently, and for such a short time, that even the expense of acquiring a latch is too high. Access to all other data structures is controlled through a latch.
Using wait statistics analysis (see here), if you’re seeing LATCH_EX or LATCH_SH as one of the top 3-4 prevalent wait types, contention for a latch might be contributing to performance issues. If that is the case, you’ll need to use sys.dm_os_latch_stats to figure out what the most prevalent latch is – using code similar to that below (with some example output):
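As a minimal sketch of the kind of query meant here (not the survey script itself), the following ranks non-page latch classes by accumulated wait time. It assumes that excluding the BUFFER latch class is enough to filter out page latches; the column aliases and the TOP (10) cutoff are illustrative choices.

```sql
-- Sketch only: rank non-page latch classes by total wait time.
-- Aliases (WaitCount, WaitS, Percentage) and TOP (10) are illustrative.
SELECT TOP (10)
    [latch_class],
    [waiting_requests_count] AS [WaitCount],
    [wait_time_ms] / 1000.0 AS [WaitS],
    -- Share of all non-page latch wait time; the window SUM is
    -- evaluated over every filtered row, before TOP is applied.
    CAST (100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER ()
        AS DECIMAL (5,2)) AS [Percentage]
FROM sys.dm_os_latch_stats
WHERE [latch_class] <> N'BUFFER'   -- assumption: BUFFER covers the page latches
    AND [wait_time_ms] > 0         -- skip classes that were never waited on
ORDER BY [wait_time_ms] DESC;
```

The values in this DMV are cumulative since the instance started or since the statistics were last cleared, so a snapshot taken over a known interval is usually more telling than the raw totals.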
You can also see the latch class of the latches being waited for in the resource_description column of the sys.dm_os_waiting_tasks DMV when the wait_type column is LATCH_XX.
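A rough sketch of that check, assuming you only want the tasks currently waiting on a non-page latch (the column selection and ordering are my own choices):

```sql
-- Sketch only: tasks currently waiting on a non-page latch.
-- The LIKE pattern matches LATCH_SH, LATCH_EX, etc., but not
-- PAGELATCH_XX or PAGEIOLATCH_XX, which do not start with 'LATCH_'.
SELECT
    [session_id],
    [wait_type],
    [wait_duration_ms],
    [resource_description]   -- includes the latch class for LATCH_XX waits
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] LIKE N'LATCH[_]%'
ORDER BY [wait_duration_ms] DESC;
```

Unlike sys.dm_os_latch_stats, this shows only what is waiting at the moment the query runs, so an empty result does not mean latch contention never occurs.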
The survey results are based on running this code, for latch classes that are the top latch class on 10 or more servers.
Just like the wait statistics survey results from 2010, I’m not surprised by these results as we see these over and over on client systems.
For the remainder of this post I’m going to list the 10 latch classes represented in the data above in descending order and explain what they mean. Some of this information is from my knowledge of the internals and some of it is from using Extended Events and call-stack dumping to catch the points at which latches are acquired.
Note that when I’m describing a latch class, there may be many instances of that particular latch class. For instance, an FGCB_ADD_REMOVE latch exists for each filegroup in each database attached to the instance.
I hope you found this interesting and it helps you out with your performance troubleshooting!