Knee-Jerk Performance Tuning

Paul Randal on Knee-jerk performance tuning

LCK_M_XX (blocking)
Can be due to blocking (e.g. a missing nonclustered index forcing table scans), transactions taking a long time to complete, or mirroring preventing commits from completing. Generally, look for the thread at the head of the blocking chain.
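One way to find the head of the blocking chain is to look for sessions that appear as a blocker but are not themselves blocked. A minimal sketch (column choices are illustrative):

```sql
-- Sketch: sessions at the head of a blocking chain, i.e. sessions that
-- block others but are not blocked themselves.
SELECT s.session_id, s.login_name, s.host_name, s.program_name
FROM sys.dm_exec_sessions AS s
WHERE s.session_id IN
        (SELECT blocking_session_id FROM sys.dm_exec_requests
         WHERE blocking_session_id <> 0)
  AND s.session_id NOT IN
        (SELECT session_id FROM sys.dm_exec_requests
         WHERE blocking_session_id <> 0);
```

Querying sys.dm_exec_sessions rather than sys.dm_exec_requests catches the common case where the head blocker is idle with an open transaction and so has no active request.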

ASYNC_NETWORK_IO
Usually NOT network related. PR suggests reading it as "WAITING_FOR_APP_ACK": SQL Server has sent data to a client and is waiting for the acknowledgement. For example, SSMS is a client, and if it consumes results RBAR (row by agonizing row), it will generate a lot of ASYNC_NETWORK_IO waits.

OLEDB
Could be linked servers, OR the post-2005 DMVs (implemented as OLE DB rowsets). DBCC CHECKDB also uses an OLE DB rowset mechanism.

SOS_SCHEDULER_YIELD
Usually accompanied by high CPU usage, but not necessarily CPU pressure or spinlock contention.
Thread scheduling is non-preemptive (done by SQLOS, not the Windows OS).
A thread is Running, Suspended, or Runnable. A thread that needs a resource is Suspended and goes on the Waiter List (resource wait time), which is unordered. Once signaled, it moves to the Runnable queue (signal wait time), which is FIFO, until it is Running again.
A running thread that exhausts its 4 ms quantum moves to the bottom of the Runnable queue, and that transition is logged as SOS_SCHEDULER_YIELD. This is not CPU pressure, just sustained activity by a long-running thread.
A thread touching lots of different resources will soon incur a wait, either for the resource itself or because something else has it locked, so it accrues other wait types rather than SOS_SCHEDULER_YIELD. But a big table scan with the whole table already in memory can't finish inside 4 ms, so it burns quantum after quantum. Look for a dropped NCI, out-of-date stats, or a bad query plan.
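The Running/Runnable split above can be seen in the wait stats: signal wait time is the time spent Runnable (waiting for CPU), the remainder is resource wait. A sketch of checking this for SOS_SCHEDULER_YIELD:

```sql
-- Sketch: SOS_SCHEDULER_YIELD accumulates almost entirely as signal wait
-- (time in the Runnable queue), so resource wait should be near zero.
-- Lots of these waits with otherwise-low signal waits across the server
-- suggests long-running scans, not CPU pressure.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,
       wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'SOS_SCHEDULER_YIELD';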

PAGEIOLATCH_XX
_SH (shared): a data page needs to come from disk into the buffer pool to be read (most common). _EX/_UP (exclusive/update): waiting for a page from disk to be in the buffer pool so it can be changed.
This is not always an I/O problem. If the wait count matches the baseline but durations are increasing, it could be I/O: a) misconfiguration/malfunction in the I/O subsystem, b) a slow network, c) contention from other I/O activity, d) replication/mirroring.
PR: typically both number and duration increase from the baseline. Often the cause is too many I/Os: a SQL problem, not an I/O problem. A logical I/O is a request to the buffer pool for data. If the page is in the buffer pool, the Access Methods get a pointer and read it; otherwise a physical I/O is issued and the thread's wait appears as PAGEIOLATCH_SH.
In the ideal case, physical I/O brings everything into memory once; after that there are no more PAGEIOLATCH_SH waits, just writes flushing dirty pages.
A sudden increase in reads means a change in workload and/or available buffer pool. Might be: the Windows OS putting demands on memory, a bloated plan cache, query plans doing table/clustered-index scans (not seeks) because of increased workload, parameter sniffing, a dropped or changed NCI, or an implicit conversion.
Look for CXPACKET and PAGEIOLATCH_SH waits together: large parallel table/clustered-index scans.
Use sys.dm_os_waiting_tasks to see which query and query plan are executing when the waits occur.
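A minimal sketch of that, joining the waiting tasks to their statement text and plan (the join through sys.dm_exec_requests and the column choices are illustrative):

```sql
-- Sketch: tasks currently waiting on PAGEIOLATCH_XX, with the query
-- text and cached plan responsible.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       st.text      AS query_text,
       qp.query_plan
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_requests   AS r  ON r.session_id = wt.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle)    AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE wt.wait_type LIKE N'PAGEIOLATCH%';
```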

Page Life Expectancy (PLE)
When we need data that is not in memory, we read it from disk. PLE is how long a page is expected to stay in memory, in seconds; equivalently, it reflects the pressure on the buffer pool to free space for new data.
A PLE of 300 means that, on average, the entire buffer pool turns over every 5 minutes. That may be acceptable with little memory; the threshold should be higher with more RAM.
Jonathan Kehayias's suggestion: (buffer pool memory in GB / 4) x 300
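The rule of thumb is just arithmetic, shown here as a small sketch (the function name is mine):

```python
def ple_target_seconds(buffer_pool_gb: float) -> float:
    """Jonathan Kehayias's rule of thumb: every 4 GB of buffer pool
    should sustain a PLE of about 300 seconds."""
    return (buffer_pool_gb / 4) * 300

# A server with a 64 GB buffer pool:
print(ple_target_seconds(64))  # 4800.0 seconds, i.e. 80 minutes
```

So on a 64 GB buffer pool, a PLE of 300 (the old generic threshold) would already signal heavy churn; ~4800 is a more realistic target.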
PR: measure PLE when the system is running correctly and treat that as the baseline.
DBCC CHECKDB and index rebuilds drop PLE; CHECKDB's memory grant is stolen from the buffer pool and can be excessive.
Monitoring the Buffer Manager:Page life expectancy counter alone is not good enough on NUMA hardware. NUMA creates a separate buffer pool partition per node (so use the per-node Buffer Node:Page life expectancy counters); the Buffer Manager value is just an average across all the nodes.
A large (new) query will cause a drop in PLE.
Changes in PLE are commonly due to the causes above; if the cause is not known:
1) check sys.dm_os_waiting_tasks for tasks waiting on PAGEIOLATCH_SH
2) check sys.dm_exec_query_stats for queries with lots of physical reads
3) check timing vs. scheduled tasks/SQL Agent jobs
4) check sys.dm_exec_query_memory_grants for queries with large execution memory grants.
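Step 2 above can be sketched as follows (the statement-offset arithmetic is the usual idiom for extracting the individual statement from the batch text; this only sees plans still in cache):

```sql
-- Sketch: cached queries with the most physical reads.
SELECT TOP (10)
       qs.total_physical_reads,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC;
```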

Just Add an SSD
SSDs do not have the I/O latency of spinning disks (move head, wait for platter rotation).
1) "Always put logs and tempdb on SSDs": only IFF they are actually the bottleneck.
If your environment is read-heavy, the transaction log is not the bottleneck, and tempdb might or might not be heavily used;
in those cases the additional speed does not speed up processing.
Use the sys.dm_io_virtual_file_stats DMV to find where the I/O bottlenecks really are. There are numerous possible reasons for slow data-file reads and log-file write latencies.
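A sketch of turning that DMV's cumulative counters into per-file average latencies (note these are averages since the last restart, not a point-in-time view):

```sql
-- Sketch: average read/write latency per database file, in ms,
-- from cumulative I/O stall counters.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```

High write latency on the log file, or high read latency on data files, points at the files an SSD would actually help; everything else is money spent without benefit.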
2) "SSDs are fast, so don't sweat index fragmentation." But a) getting to a fragmented index faster does not mitigate the badness of the index problem*, b) fragmented indexes contain lots of reclaimable free space that is being wasted on expensive media, and c) Jonathan Kehayias found an observable performance penalty.
3) SSDs should still be RAID-1 (pairs). Putting two 'volumes' on one physical SSD is not a solution.
4) SSDs do fail: consumer-grade SSDs have an MTBF of about 2 million hours, vs. about 1.5 million hours for consumer spinning disks.
5) Comments note that SSDs are fast for sequential writes, about 1/10 as fast for sequential reads (still ~10x faster than hard disks).
6) Comment: you can get MORE deadlocks on SSDs because reads complete closer together in time, especially under load.

* Fragmentation is caused by "page splits from pages needing free space for a random insert or row size increase. A page split generates the same amount of transaction log, resource usage, and potential thread waits regardless of where the data/log files are stored." (PR)