tempdb location, size and management

There is a very nice three-part post (123) from several writers, including Glenn Berry, excerpted from a John Wiley book  posted on the Confio website: LogicalRead.

The first part of the post reminds us of the value of separating transction logs, data and tempdb for SQL and then that if we are using a SAN, we probably don’t know if the logical and physical separation will make any difference.  The housekeeping of having a separate volume for logs, making it easier to monitor and manage, still might apply.  If we have a physical server, we might choose to put logs on a local disk, but if we have a cluster, we have to put all the data files on the shared storage.

However, a cluster failover is effectively a restart of the server, including any playforward of logs to update the new server so sharing  tempdb was not logical, since on restart, tempdb is recreated. In MS SQL 2008 and R2,  shared was still the supported configuration.  On MS SQL 2012, we can put tempdb on a local (SSD?) disk to both increase I/O for tempdb and to offload some of that work from the SAN.

The physical paths for the tempdb file(s) and log(s) will have to be identical between machines – same drive ID and path.

the default install of SQL will create an 8MB tempdb file, 1MG log with 10% autogrowth.  Typically not the best sizes, especially the percentage  growth.  A tempdb that is in autogrowth is not usable – your system is not updating, and there can be fragmentation.  If we don’t know the ‘right’ size but doubt is is 8/1, we can change the settings in SSMS.

  1. If tempdb has it own disk, make it very large – almost filling the disk. There will be no contention with other processes
  2. Ideally we should never have to autogrow our database or logs, but as a safety matter, we should leave it on.  Autogrowh should be to a filesize, not to a percentage.  Or else the last time it autogrows will be the very slowest and most painful event.  If it is smaller and fixed, we can  monitor it and see something different then “your disk is full’.
  3.  Instant File Initialization (IFI)  should be on to make tempdb growth faster.  This works if the SQL account is admin or if you have set gpo for the working account to have the right permissions (Manage Volume Maintenance).  It doesn’t help with log files.  Consider autogrowth of 50-500 as a start for tempdg and autogrowth by 10MB for the transaction logs.

Since each database has its own tempdb and since each computer (probably) has mutiple processors and cores, we can be working on different tables at the same time and the problem of one process locking/blocking another process also needing tempdb can easily occur.

We can create new tempdbs and logs from the SSMS GUI or using t-SQL

Multiple tempdb files, as many as one per core, will increase I/O efficiency. One file per core is probably too many – there are formula to get the right number.  The should be the same size, since MS SQL will do a proportional fill algorithm to pick the file to write to.  If they are the same, we get the efficient “round robin” approach.  On a big server, 8 is a good place to start – you can always add more if there is contention.

——————

SQLTips author and blogber Paul Randall also speaks to this issue in the Accidental DBA series posting 27/30.   The word according to Paul is that tempdb contention is not a volume I/O issue nor a general blocking issue, it has to do with multiple threads attempting to access allocation pages in memory. The tempdb database has lots of temporary tables – and each time a new table is created (or an old one destroyed) the PFS page must be accessed to record this.   Only one thread at a time can use the table – therefore it is probable that something is always waiting.  He has a query to monitor this using the sys.dm_os_waiting_tasks DMV.

He talks about (1.) not using tempdb or the effects (all good?) of turning on (2.)  trace flag 1118.  You also can reduce contention by creating (3.) multiple tempdb databases and therefore multiple PFS pages to write to.  Create 8 tempdb, possibly reduce contention to 1/8 of the former.  Bob Ward says make as many pages as you have  logical cores (or up to 8) and then only 8 for bigger servers until you see evidence of contention.

He has a screen shot of increased throughput (transactions/sec)  gained by increasing the number of tempdb databases – one would assume that the dmv would also show fewer pagelatch_xx wait states.

——————-

tempdb is used for temporary user objects (temp tables, table variaables, cursors), work tables (sorts, index rebuilds, hash joins, spills, temp LOB storage, CTEs and Table valued expressions) and version storages for modifications  (read-committed snapshots, snapshot isolation, triggers, MARS).  Idera (ref) has a query to sys.dm_db_file_space_usage to get how much in each category.

When a page is needed for a new tempdb table, the Page Free Space (PFS) table needs to be updated (and updated on removal), One PFS page for each 500MB of data file.

Task waiting for PageIOLatch or PageLatch are experiencing contention – the resource descriptor points to the page.  If

page ID = 1 or (Page ID%8088) = 1, contention allocation page (PFS)
page ID = 2 or (Page ID%511232) = 2, contention for allocation page (GAM)
page ID = 3 or ((Page ID -1)%511232= 3, contention for allocation page (SGAM).

Avoid allocation page contention by proper configuration of tempdb table:  create 8 and monitor.  The maximum number of tempdb files than can be used is the number of concurrent processes that need tempdb files which can’t be mroe than one per logical CPU.

Writing to multiple tempdb files involves the size and free space in the files.  All should be the same size and have the same free space.  SQL will not control this – still need to monitor and presize to reduce growth events (autogrowth will occur on the largest file)

per Allan Hirt

it is:  AlwaysOn (one word) Availability Groups (AGs) and AlwaysOn Failureover clustering Instances (FCI).  AlwaysOn (one word) is a marketing term for AGs and FCIs.

and it is:  A two node Microsoft Failover cluster.  One node is always active in the instance.  You can have a two node, two instance cluster, with each node having an active instance running on it.

Or since you can have several SQL instances on one machine, you can have a three instance Windows Failover cluster running on two nodes.  Exactly where the instances are running at any time is a configuration issue, not a description of the environment.

SQLHA.

 

SQLSaturday NYC

i’m surely HOPE i successfully signed up for this (and Providence).  Aug 17 2013 at Microsoft Office, 1290 Avenue of the Americas, 6th Flr, New York, NY 10104.

5:45 from New Haven to GCT arriving at 7:43 AM.  Leaving at 5:30, 6:07 or 6:34 getting back from 7-8pm $20.50 rt.

Start Time
Track 1 – Room: Room A
Track 2 – Room: Room B
Track 3 – Room: Room C
Track 4 – Room: Room D
Track 5 – Room: Room E
Track 6 – Room: Room F
Track 7 – Room: Room G
Track 8 – Room: Room H
08:30 AM
SQLSaturday 235
Kickoff / Keynote
Non-Technical
SQLSaturday 235
Kickoff / Keynote
Non-Technical
           
09:00 AM
Thomas LaRock
Database Design: Size DOES Matter
Intermediate
Jeffrey Garbus
Indexing for performance
Intermediate
David Klee
Database Health and Performance
Intermediate
Michael Corey
Virtualizing SQL 2012: Doing It Right
Intermediate
Ted Krueger
Upgrading to SQL Server 2012 with limited downtime
Beginner
David Dye
Dealing With Errors in SSIS 2012
Advanced
Carl Berglund
How to Create a Self-Managed SQL Server Utility without SQL Server Enterprise Edition
Intermediate
Timothy McAliley
A Walk Around the SQL Server 2012 Audit Feature
Intermediate
10:15 AM
Kevin Kline
SQL Server Internals and Architecture
Intermediate
Ben DeBow
Surviving Your Peak Database Load
Intermediate
Robert Pearl
HealthySQL
Beginner
Stacia Misner
Using Power View & Hadoop to Unlock Hidden Markets
Beginner
Allan Hirt
Patch Management Strategies for SQL Server
Beginner
Paul Rizza
ETL not ELT! Common mistakes and misconceptions about SSIS
Advanced
Virginia Mushkatblat
Data Flow Architectures in Software Development Life-Cycle
Intermediate
Kevin Feit
SQL Server Sudoku Solver using Transact-SQL
Beginner
11:30 AM
SQLSaturday 235
Lunch / Vendor Presentations
Non-Technical
SQLSaturday 235
Lunch / Vendor Presentations
Non-Technical
Melissa Demsak / Kathi Kellenberger
WIT Panel / Lunch
Non-Technical
         
12:30 PM
Steve Simon
Alright,WHY is the server REALLY running so slow!
Intermediate
Alex Grinberg
Advanced T-SQL Development with latest features.
Intermediate
George Walters
SQL Server 2012 AlwaysOn Availability Groups
Beginner
Wayne Sheffield
Minimally Invasive – Tools to Doctor Up Your Code
Intermediate
Edwin Sarmiento
The Transaction Log and How It Affects HA/DR
Advanced
Martin Schoombee
The 10 Commandments of ETL
Beginner
John Morehouse
SQL Server Databaseology 201
Intermediate
John Miner
How to audit and prevent unwanted user actions.
Intermediate
01:45 PM
Andrew Brust
Big Data-BI Fusion: Microsoft HDInsight & MS BI
Intermediate
Joseph D’Antoni
Into the Blue: Extending AlwaysOn Availability Groups
Intermediate
Troy Gallant
Indexing your Data Warehouse
Intermediate
Ashish Sharma
SQL Server 2014 – Staying Ahead of the curve
Intermediate
Karen Lopez
Windows Azure SQL Database Design: Silver Lining and Dark Side
Intermediate
Steve Hughes
Building BI Solutions with Excel 2013
Beginner
John Abrams
How to Build a Centralized Database Monitoring Solution – Never Miss a Backup
Beginner
SB Chatterjee
New version: Get-PowerShell | Get-SQLServer
Beginner
03:00 PM
Kevin Goff
SSAS Tabular 2012 vs SSAS Multidimensional OLAP
Intermediate
Simon Facer
Ask the Microsoft PFE
Lightning Talk
Kathi Kellenberger
Indexes: The Basics
Beginner
Matt Velic
Automating in the Cloud
Intermediate
Thomas Grohser
SQL Server Storage Engine under the hood
Intermediate
Daniel Bowlin
SSIS templates, configurations and variables, oh my
Intermediate
Mike Hillwig
Recovery and Backup for Beginners
Beginner
Don Gabor
Networking and Building Connections for IT Professionals
Non-Technical
04:15 PM
Edward Pollack
Query Optimization Crash Course
Beginner
Kevin Boles
Common TSQL Mistakes
Intermediate
David Bermingham
Deploying Highly Available SQL Servers in the Amazon EC2 Cloud
Intermediate
Melissa Demsak
Data Career Paths and Decision Making Strategies
Non-Technical
Leonard Lobel
Database Development with SQL Server Data Tools
Intermediate
Samuel Vanga
Biml for Fun and Profit with SSIS
Intermediate
Chris Bell
“Indexing” Encrypted Data
Intermediate
Hilary Cotter
Merge Replicaton Internals
Advanced
05:30 PM
SQLSaturday 235
Closing / Prizes
Non-Technical
SQLSaturday 235
Closing / Prizes
Non-Technical
           

Actors = nouns = tables

Ovid on How to fake DB design.

Database based contain information which is stored in tables.  If you tell yourself a story about the database (“we have orders and users and items” or “we have courses and users and (because of our patents) users of various types in courses”) we have a start on the tables we SHOULD use in the database.  This is intended to make at least an attempt to create a normalized database design.

suspect pages

Pages in an SQL database that are torn or suspect are accomulated in a table and can be queried with

SELECT * FROM [msdb].[dbo].[suspect_pages]

Columns are database_id,  file_id,  page_id,  event_type,  error_count, and  last_update_date

Consider Select count(*) …  to get an overview.  

When SQL Server tries to read a page into memory (during a query, following DBCC checkDB operation or a backup) and can’t, it issues a 823 or 824 error.  It probably gets logged (?) but should also be posted to the suspect_pages table in MSDB.  

This is well-documented in http://technet.microsoft.com/en-us/library/ms191301(v=sql.105).aspx and https://www.simple-talk.com/blogs/2013/07/15/keeping-an-eye-on-your-storage/

simpletable and index

If i had a boat, i’d sail it on the ocean.  And if i had a pony, i’d heap it on that boat

If you have a table, and data therein, it is a heap.  If you have a non-clustered index on said table, it is a heap – with an index.  The index is a table, with one column(s) the index on the table, and the second column a pointer to the location of the data in the table.

If you have a clustered index, you no longer have a heap.  You have a table that is maintained in order of the index, which can be unique and which can be the primary key.  The CI (clustered index) is the table and the table, having a clustered index, is the clustered index.

If you have a query and use the Clustered index (or rather, the Query optimizer decides to create a query plan that uses the clustered index), you get the results of the query back as data from the table.  Assuming the query is on the column that the CI was created on, the QA will (certainly) use the CI and the records that satisfy the query will be quickly returned.  This will happen if we have select (*) or select (index column).  If we query with predicates that are NOT in the CI, there might not be any value in using the clustered index, and QA might well decide on a table scan to find the matching data.

Basically the same thing is true for a heap or CI table if the query does not involve the indexed column.  Use the index if it improves performance, but if the index is on Last Name and you are searching for everyone in a town, the CI which will maintain the order of the records, doesn’t help the present query.  Table Scan Time.  Query on last name, and the CI will very efficiently find the records with the last name, and having located them (hopefully in the data cache), return the data requested.

Assume a non-clustered index on last name.  Query on something other than last name and the index is of no value.  Query on last name, and we quickly locate the last name in the index, quickly find those locations in the table where the required records are found, find them and return the data.

What happens with a non-clustered index, a query on last name, and the only data wanted is the last name (or count lastname).  Once the index is consulted and the records identified, we don’t actually need to access the records since we have enough information to satisfy the query.

For this reason, it is useful to create non-clustered indexes on multiple columns – if we have a index that will satisfy common queries, we can get quick results without having to go to the data – the index informaiton is enough.  There is a cost in index maintenance which increases for multiple column indexes.  As well as more space used for storage.

We can also create covering indexes, which have non-indexed fields associated with the non-clustered index.  (Can’t be the case in a Clustered index, since ALL COLUMNS/DATA already are a part of the CI).  So we have a non-clustered index on firstname and lastname (does the order matter?) and maybe middle initial.  We cover the “dob” column in the index.  This means that we can rapidly find records based on names, and if we frequently are interested in the “dob”, this index will be useful for queries that involve details about the names and also names and date of birth.  If we want information on the other columns, the ‘dob’ column in the index will not help, and we need to pull the records after using the index to locate them.