Wait Statistics……What to worry for?

A while back, I once blogged about how to troubleshoot a SQL Server, and one of the first things that people complain regarding servers is that the server is slow.  Now there are a lot of tools and mechanisms within SQL Server for gathering metrics that can help us point towards the issue but the best place to start is by looking at the wait statistics.

There are a lot of ways to obtain information regarding the wait statistics, but the most useful and quick way is to run a query against the sys.dm_os_wait_stats DMV. Again, you can build your own query to pull the information that can help you out find the issues in the server, but if you are new to this, I’d suggest using a query developed by Paul Randal which I have been using since quite a while and it certainly gives a clear picture in the most convenient manner and is quite fast as well. The query is provided below and you can even check the link to the query here

WITH[Waits]AS
    (SELECT
        [wait_type],
        [wait_time_ms]/1000.0 AS[WaitS],
        ([wait_time_ms]-[signal_wait_time_ms])/1000.0 AS[ResourceS],
        [signal_wait_time_ms]/1000.0 AS[SignalS],
        [waiting_tasks_count]AS[WaitCount],
        100.0 *[wait_time_ms]/SUM([wait_time_ms])OVER()AS[Percentage],
        ROW_NUMBER()OVER(ORDERBY[wait_time_ms]DESC)AS[RowNum]
    FROMsys.dm_os_wait_stats
    WHERE[wait_type]NOTIN(
        N'BROKER_EVENTHANDLER',N'BROKER_RECEIVE_WAITFOR',
        N'BROKER_TASK_STOP',N'BROKER_TO_FLUSH',
        N'BROKER_TRANSMITTER',N'CHECKPOINT_QUEUE',
        N'CHKPT',N'CLR_AUTO_EVENT',
        N'CLR_MANUAL_EVENT',N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT',N'DBMIRROR_EVENTS_QUEUE',
        N'DBMIRROR_WORKER_QUEUE',N'DBMIRRORING_CMD',
        N'DIRTY_PAGE_POLL',N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC',N'FSAGENT',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT',N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL',N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'HADR_LOGCAPTURE_WAIT',N'HADR_NOTIFICATION_DEQUEUE',
        N'HADR_TIMER_TASK',N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP',N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE',N'ONDEMAND_TASK_QUEUE',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED',
        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_SHUTDOWN_QUEUE',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
        N'REQUEST_FOR_DEADLOCK_SEARCH',N'RESOURCE_QUEUE',
        N'SERVER_IDLE_CHECK',N'SLEEP_BPOOL_FLUSH',
        N'SLEEP_DBSTARTUP',N'SLEEP_DCOMSTARTUP',
        N'SLEEP_MASTERDBREADY',N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED',N'SLEEP_MSDBSTARTUP',
        N'SLEEP_SYSTEMTASK',N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP',N'SNI_HTTP_ACCEPT',
        N'SP_SERVER_DIAGNOSTICS_SLEEP',N'SQLTRACE_BUFFER_FLUSH',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'SQLTRACE_WAIT_ENTRIES',N'WAIT_FOR_RESULTS',
        N'WAITFOR',N'WAITFOR_TASKSHUTDOWN',
        N'WAIT_XTP_HOST_WAIT',N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
        N'WAIT_XTP_CKPT_CLOSE',N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT',N'XE_TIMER_EVENT')
    AND[waiting_tasks_count]>0
 )
SELECT
    MAX([W1].[wait_type])AS[WaitType],
    CAST(MAX([W1].[WaitS])ASDECIMAL(16,2))AS[Wait_S],
    CAST(MAX([W1].[ResourceS])ASDECIMAL(16,2))AS[Resource_S],
    CAST(MAX([W1].[SignalS])ASDECIMAL(16,2))AS[Signal_S],
    MAX([W1].[WaitCount])AS[WaitCount],
    CAST(MAX([W1].[Percentage])ASDECIMAL(5,2))AS[Percentage],
    CAST((MAX([W1].[WaitS])/MAX([W1].[WaitCount]))ASDECIMAL(16,4))AS[AvgWait_S],
    CAST((MAX([W1].[ResourceS])/MAX([W1].[WaitCount]))ASDECIMAL(16,4))AS[AvgRes_S],
    CAST((MAX([W1].[SignalS])/MAX([W1].[WaitCount]))ASDECIMAL(16,4))AS[AvgSig_S]
FROM[Waits]AS[W1]
INNERJOIN[Waits]AS[W2]
    ON[W2].[RowNum]<=[W1].[RowNum]
GROUPBY[W1].[RowNum]
HAVINGSUM([W2].[Percentage])-MAX([W1].[Percentage])<95;-- percentage threshold
GO
Now, the information returned from this query can be mind boggling if you do not know where to look at and what to look at. This is where today’s blog post comes in help. During my professional career as a SQL Server DBA, I have come across many wait types that causes a server to go slow. But, we should not be worrying about all of those wait types. In-fact, several DBA’s across the globe including me worry about only a handful of those wait types based on their occurrence, severity, etc. Let us look at what those wait types are:
  • ASYNC_NETWORK_IO: The predominant cause of this wait type is row by row processing of results in a client, instead of caching the results client-side and asking SQL Server to send more. A common myth is that this wait type is usually caused by network problems which is rarely the case.
  • CXPACKET: This wait type always occurs when there is parallelism, as the parent thread in a parallel operation waits until all the child thread have completed. However, when parallel threads are given unbalanced workload, the threads that finish early also fall victim to this wait type and adding to the wait type and thus becoming most prevalent. So this wait type could be good with lesser statistics, as the workload has lots of good parallelism, or it could be bad if there’s unwanted parallelism or problems causing skewed distribution of work among parallel threads.
  • LCK_M_X: This wait type commonly occurs when there is lock escalation being seen. It could also be caused by using a restrictive isolation level like REPEATABLE_READ or SERIALIZABLE that requires S and IS locks to be held until the end of a transaction.
  • PAGEIOLATCH_SH: This wait type occurs when a thread is waiting for a data file page to be read into memory. Common causes of this are when the workload doesn’t fit in memory and the buffer pool has to keep evicting pages and reading others in from disk, or when query plans are using table scans instead of index seeks, or when the buffer pool is under memory pressure which reduces the amount of space available for data.
  • WRITELOG: This wait type is common to see in the first few top waits on servers as the transaction log is often one of the chief bottlenecks on a busy server. This could be caused by the I/O subsystem not being able to keep up with the rate of log flushing combined with lots of tiny transactions forcing frequent flushes of minimal-sized log blocks.
  • PAGELATCH_EX: The main reasons for this wait type are TEMPDB allocation bitmap contention and an insert hotspot. Basically meaning, when lots of concurrent threads are creating and dropping temp table combines with a small number of TEMPDB files and not having TF118 enabled, or from lots of concurrent threads inserting small rows into a clustered index with an identity value, leading to contention on the index lead-level pages.
  • LCK_M_IX: This wait type occurs when a thread is waiting for a table or page IX lock so that a row insert or update can occur. It could be from lock escalation to a table X or S lock causing all other threads to wait to be able to insert/update.
  • SOS_SCHEDULER_YIELD: The most common cause of this wait type is that the workload is memory resident and there is no contention for resources, so threads are able to repeatedly exhaust their scheduling quanta, registering SOS_SCHEDULER_YIELD when they voluntarily yield the processor.

Now, even among these wait types mentioned above, which ones are a critical if they seem to be the most prevalent as per the results of the query provided earlier. Personally for me, it would be the LCK_M_IX wait type as most of the other waits are likely to be very small in duration, and although they’d make the overall system slower, the locking waits can easily cause individual queries to time out and fail, leading to a very poor user experience.

It goes without saying that this in no way means that you can neglect the other wait types as every individual system design and requirement can make any of the above wait types to be of concern in a particular server. And so, now that you know how to find why the server is slow by running the query, you even have the answer to what exactly could be the issue in the server.

There are lots of other wait types but are not included in today’s blog. If there is any particular wait type that you would want me to write about or discuss feel free to post a comment. Also, I would love to hear from you if you agree or disagree to the comprehensive wait types that I have discussed today. I would be more that happy to reply in my upcoming blog post.

Advertisements
This entry was posted in ms sql, Performance Tuning, Troubleshooting, Uncategorized and tagged , , , , , , , , . Bookmark the permalink.

One Response to Wait Statistics……What to worry for?

  1. Pingback: Wait Statistics……What to worry for? — My DBA Lounge – Programmer's blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s