6 SQL Surprises

Brent Ozar wrote a Redgate whitepaper on six DBA problems and what to do about them. The problems themselves are not trivial, but the more important concept is how to approach diagnosing them.

First, a simple database with two indexes slows way down. Fine, run the appropriate query with the execution plan turned on. Despite the indexes, the plan shows a table scan, not an index seek. Apparently, for a bulk load or an ETL job, someone removed or disabled the index and did not turn it back on afterwards. There are a couple of approaches: check for ANY changes in the schema, or count the number of disabled indexes and alert if that count changes.
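A minimal sketch of the second approach, assuming SQL Server 2005 or later, using the is_disabled flag in sys.indexes; trend the count and alert when it changes:

-- Count disabled indexes in the current database.
SELECT COUNT(*) AS disabled_index_count
FROM sys.indexes
WHERE is_disabled = 1;

-- Or list them, to see exactly what was left turned off.
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS table_name,
       name                          AS index_name
FROM sys.indexes
WHERE is_disabled = 1;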

The server is configured to use a certain amount of memory for SQL Server, but it is using less. This can happen if OS activity overflows the "4GB or 10% of RAM" reservation rule. The obvious culprit is someone using Remote Desktop to run SSMS or to copy files or download a service pack; the copied file gets cached in RAM. Eventually it will be purged, we can clear it with an undocumented SysInternals utility, or we can restart the server. Better yet, do the copy from another machine via a share.
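A rough check, assuming SQL Server 2008 or later, comparing the configured max server memory against what the SQL Server process is actually holding:

-- Configured ceiling vs. memory actually in use by the SQL Server process.
SELECT c.value_in_use                      AS max_server_memory_mb,
       pm.physical_memory_in_use_kb / 1024 AS sql_memory_in_use_mb
FROM sys.configurations AS c
CROSS JOIN sys.dm_os_process_memory AS pm
WHERE c.name = 'max server memory (MB)';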

Initially, we get great performance from our SSD (Fusion-io), but over time performance degrades. Performance is best on a clean partition; as the drive becomes partially full, writes may first have to overwrite deleted data, which is SLOW. The drive also reserves a lot of space out of the box (over-provisioning). Monitor the speed, and when it starts getting slower, copy from one area to a freshly initialized disk so that all the writable space is clean again.

Tempdb should be split into multiple files. With a single file, all allocation activity is coordinated by the Shared Global Allocation Map (SGAM), and we can see major contention on it when tempdb is being written to heavily. More files mean more SGAM pages and less contention. We need the right number of files, and they need to be the same size.
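A quick way to check the current layout, plus a hedged example of adding a file (the path and sizes are placeholders, not a recommendation):

-- How many tempdb data files exist, and are they the same size?
SELECT name, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- Hypothetical: add a second, equally sized data file.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdev2.ndf',
          SIZE = 4096MB, FILEGROWTH = 512MB);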

Page splits occur when a page, in the clustered index or in a nonclustered index, is full. Page splits are I/O intensive and lead to fragmentation. More free space per page means less data in memory per page and more pages read to get the same data. If a table or an index is going to change a lot, we might want more free space when the page is created (or when it is rebuilt). The fill factor does not tell you anything about how full a page will become, only how full it is when it is created, rebuilt, or reorganized, and it can be set on the clustered index or on individual nonclustered indexes.
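As a hedged example, a rebuild that leaves 10% free space on each leaf page of a heavily updated index (the table and index names are placeholders):

-- Rebuild with a fill factor of 90, i.e. pages are created 90% full.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);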

As other users or programs modify databases (think application upgrades), the number of alerts and warnings that get logged might change. You can monitor the count of alerts and warnings over time.
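If those warnings land in the SQL Server error log, one rough way to trend them is the undocumented sp_readerrorlog procedure (parameters: log number, log type where 1 = SQL Server, search string); treat this as a sketch rather than a supported interface:

-- Search the current error log for entries containing 'Warning'.
EXEC sp_readerrorlog 0, 1, N'Warning';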

log_reuse_wait_desc

There are database files (.mdf) and log files (.ldf), and both may grow. Database files grow because more data is being stored in them; when full, they grow either by a fixed percentage or by a constant amount. While a file is growing there will be a temporary period during which the database does not respond well, since the .mdf file is being heavily used during growth. The same is true of transaction log files. Immediately after growth, there will be unused space in both file types.

The size of a file is like a high-water mark: at some point in the past, the file needed to be that size. For the database file, unless data was removed, the file NEEDS to be that large; more data will be added and eventually the file will fill and need to grow again. There are ways to mitigate this, but more data means bigger (or more) .mdf files.
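A simple way to see the current high-water marks and growth settings for every file on the instance:

-- File sizes and growth settings across all databases.
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_name,
       type_desc,
       size * 8 / 1024      AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(12)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS varchar(12)) + ' MB'
       END                  AS growth_setting
FROM sys.master_files
ORDER BY database_name, type_desc;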

This is not true of log files. Logs contain information about transactions. After a transaction completes, either the log file space can be reclaimed (truncated, cleared) or the log data waits until conditions are right, at which point the log entries can be discarded. In the full recovery model, data in the transaction log becomes available to clear after a log backup; in the simple recovery model, after a checkpoint writes the data, which is in cache, to disk. A growing log file means either that transaction data is coming in faster than it can be handled, or that the condition allowing reclamation has not occurred recently, or has not occurred yet.

In a steady-state condition, the log file will grow just large enough that log records can be removed (truncated, cleared) and new records written without needing to grow the log, or the container. Making the container arbitrarily smaller would simply require the file to grow again in the future, with a temporary but possibly significant performance impact. Ideally, the log file will never grow: when the database is set up, the log is pre-grown to the correct size.

Some “usual” conditions can cause the log file to accumulate an “unusual” number of log entries. Data deletion, bulk imports, unusual use of tempdb, and the like generate log records rapidly. If we know these types of events have occurred and are confident they will not happen again, we can safely shrink the log back to a “normal” level. We should shrink to that normal level, rather than shrink the log as much as possible.
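A hedged example of shrinking back to a chosen “normal” size rather than to the minimum (MyDB and MyDB_log are placeholder names, and 4096 MB is an arbitrary target):

USE MyDB;
-- Shrink the log file to roughly 4 GB, not as small as possible.
DBCC SHRINKFILE (MyDB_log, 4096);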

The possible descriptions are:
NOTHING
CHECKPOINT
LOG_BACKUP
ACTIVE_BACKUP_OR_RESTORE
ACTIVE_TRANSACTION
DATABASE_MIRRORING
REPLICATION
DATABASE_SNAPSHOT_CREATION
LOG_SCAN
AVAILABILITY_REPLICA

If the log files are increasing, the query:

select @@servername, name, log_reuse_wait_desc from sys.databases

will tell us why the log could not be cleared the last time clearing was attempted. This gives a clue as to what might have been running so that the truncation did not occur as expected. If the log is not growing, the expected value for log_reuse_wait_desc would be NOTHING, although that value can also appear while the log is growing, if nothing in particular has prevented it from being cleared.

 

Paul Randal, SQLskills article; 2009 TechNet article on logging by Paul Randal.

Transaction logs (t-logs)

All changes to data are first made to data pages held in memory in the buffer cache.

As soon as possible, the changes are written sequentially to the transaction log, as implicit transactions or as a series of entries within an explicit transaction.

Next the data pages are written to disk, possibly with random access, following a checkpoint or via the lazy writer if there is memory pressure.

The transaction log, which is typically a single file per database, is divided internally into a number of virtual log files (VLFs) to which the transactions are written. The order is effectively round-robin, as long as the next VLF is marked inactive and therefore available. The active portion of the log runs from the first to the last VLF containing at least one active log record, that is, a record for a transaction that has not yet been committed and therefore might need to be rolled back.
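On SQL Server 2016 SP2 and later, sys.dm_db_log_info exposes the VLFs directly; older versions can use the undocumented DBCC LOGINFO. A sketch for the current database:

-- One row per VLF: sequence, whether it is active, and its status (2 = active).
SELECT vlf_sequence_number, vlf_active, vlf_status, vlf_size_mb
FROM sys.dm_db_log_info(DB_ID());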

In simple recovery, all inactive VLFs are truncated immediately after a checkpoint.

In the full or bulk-logged recovery model, a checkpoint writing data to disk does not permit VLFs to be truncated; the VLFs are preserved so that the chain of LSNs stays intact, allowing point-in-time recovery from the last full database backup. When the transaction log is backed up, VLFs can be truncated if there are no active log records in them.
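In full or bulk-logged recovery, the log backup is what makes those inactive VLFs reusable; a minimal, hedged example (database name and path are placeholders):

-- Back up the transaction log so inactive VLFs can be cleared and reused.
BACKUP LOG MyDB TO DISK = N'E:\Backups\MyDB_log.trn';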

If all existing VLFs are in use, the transaction log file might be forced to grow. Best practice is to create the log file at the proper size so that the frequency of log backups does not force the log to grow. Second best practice is to grow the log file by a constant amount (one recommendation is to start testing with ½ GB, 512MB) rather than by a percentage of the file. If the increment is small, many small VLFs are created; if the increment is larger, larger VLFs are created. There is an impact on the database during log growth, and the new log space has to be zero-initialized, whereas data file growth can use instant file initialization.
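A hedged example of switching a log file from percentage growth to a fixed 512MB increment (MyDB and MyDB_log are placeholders):

-- Grow the log in constant 512 MB steps instead of by a percentage.
ALTER DATABASE MyDB
MODIFY FILE (NAME = MyDB_log, FILEGROWTH = 512MB);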

Worst case is to cause or allow the transaction log to grow larger than necessary, then shrink it to smaller than required and force it to grow again. One recommendation for full recovery is to create the log at 1.5x the size of the largest data file; this allows room for normal transactions as well as for index reorganizations or rebuilds, which are heavily logged and would otherwise force log growth.

If the volume holding the transaction log fills up, or transaction log growth is too slow, a 9002 error is generated and the database effectively becomes read-only.

disk I/O problem with Redgate

This is a DBA Team ‘episode’, very clever but way too long for the useful content. After we get past “don’t make Grant, the Scary DBA, angry”, we learn:

The RDBMS is slowing down, which seems to follow an application install

  • Physical Disk\Avg. Disk sec/Read and Physical Disk\Avg. Disk sec/Write are slow, over 50ms
  • sys.dm_os_wait_stats shows lots of PAGEIOLATCH_SH waits: buffer pages are not being latched quickly because the disk is slow to provide the data
  • fact tables, Entity-Attribute-Value (EAV) designs and ORMs are evidence that nobody was minding the DBA
  • sys.dm_os_waiting_tasks will show which tasks are waiting, being blocked or blocking. The resource_description field shows the ID of the page being waited on and therefore the table to which that data belongs (see the sketch after this list)
  • virtual file statistics show how reads and writes are distributed across files and databases
  • even if the problem looks like I/O, a bad application can overwhelm a good filesystem and make I/O the bottleneck
  • application developers hate RDBMSs and love NoSQL, which does not need a DBA
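A sketch of the two DMV checks called out above, assuming any reasonably recent SQL Server version:

-- Top waits since the last restart (or the last clear of the stats).
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Tasks currently waiting on page I/O latches, with the page in resource_description.
SELECT session_id, wait_type, wait_duration_ms, blocking_session_id, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGEIOLATCH%';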