Pivot function

You have rows and rows and wish you had more columns.  Say you have 10 clients, 5 years, and a list of activities/orders/contacts.  Lots of rows.  You would rather have 5 rows and a count column for each of the clients.  That is what the PIVOT function does.  You could write code to do this yourself, but that is reinventing the wheel, so why work so hard?

The following is the annotated syntax for PIVOT, from Microsoft:

SELECT <non-pivoted column>,
    [first pivoted column] AS <column name>,
    [second pivoted column] AS <column name>,
    ...
    [last pivoted column] AS <column name>
FROM
    (<SELECT query that produces the data>)
    AS source_alias
PIVOT
(
    <aggregation function>(<column being aggregated>)
    FOR
    [<column that contains the values that will become column headers>]
    IN ( [first pivoted column], [second pivoted column],
    ... [last pivoted column])
) AS PivotTable_alias
<optional ORDER BY clause>;

Let’s break this down.  What do you want in each row of the first column?  In our example, that would be years; years is the <non-pivoted column>.  The next column in the first row would be the appropriate count for client 1, as [client 1], then client 2 as [client 2], and so on.  So all you need are the years, the clients, and the count aggregate.  If you have other data, one recommendation is to create a CTE or derived table to simplify the data and the syntax.  Otherwise, you could end up grouping on years, countries, states, etc. as the non-pivoted columns, and you would have 5 (years) x number of countries x number of states rows, with the client aggregates remaining the pivoted columns.  If you don’t want the extra columns and want everything rolled up into years, don’t select the extra columns in the inner SELECT query.
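
For instance, a CTE can pare the source down to just the columns the pivot needs.  A minimal sketch, with table and column names assumed to match the running example:

with activity_slim as
(
    select year, client_id, activity_id   -- activity_id is the thing we will count
    from activities                       -- leave countries, states, etc. behind here
)
select year, client_id, activity_id
from activity_slim;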

Next, the FROM query; give this an alias.

FROM (select year, client_id, activity_id from activities) as source

Now the magic: immediately following the source alias, continue with (a new line is OK):

PIVOT (aggregation FOR … IN … ) as PT.  Run that and you get your desired output.

PIVOT (count(activity_id) FOR client_id IN ([client_1], [client_2], [client_3], … [client_10])) as PT

Note that the column you aggregate cannot be the same column you name after FOR, which is why the source query also carries activity_id for the count to chew on.

Let’s try that again.  Diagram your desired output: years are the rows, there is a column for each client, and the number to display is the count for each client, for each year.  The fiddly bit is defining the column headers and putting them back into the IN clause.
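
Putting the pieces together, here is a minimal end-to-end sketch.  The activities table, its columns, and integer client_id values 1 through 10 are assumptions for illustration (only three of the ten client columns are spelled out):

select year,
       [1] as client_1, [2] as client_2, [3] as client_3   -- ... through [10]
from
    (select year, client_id, activity_id
     from activities) as source
pivot
(
    count(activity_id)                 -- the count that lands in each year/client cell
    for client_id in ([1], [2], [3])   -- ... through [10]
) as PT
order by year;

Each bracketed value in the IN list must be an actual value found in client_id; it becomes a column header, which you can then rename with AS.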

update table with Join

It’s easy to update an existing SQL table.  In T-SQL:

update mytable set column1 = 'something'

This updates all of mytable's column1 values to 'something'.

More typically, you want to update only SOME rows, so you would use a WHERE predicate.

update states set ST = 'NY' where ST = 'New York'

or

update sales set discount = discount + .1 where client_type = 'super saver'

What I need to do is update table 1 based on a matching value in table 2.  This is where the syntax goes all to hell.

update T1 set T1.column1 = T2.column1
from theFirstTable as T1
join theSecondTable as T2 on T1.ID = T2.ID
where ...
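
A concrete, hypothetical usage, turning the earlier states example into a lookup-driven update (here states.ST holds full names and state_lookup maps names to abbreviations):

update s
set s.ST = lu.abbreviation
from states as s
join state_lookup as lu on lu.state_name = s.ST   -- converts every full name in one pass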

I have this big old table that takes nearly an hour to build.  I needed to add three columns to the schema, which actually involved three separate joins on the same second table (three joins because I needed three different join conditions based on the id).  With the additional joins, we got timeout and resource errors.  It might not have been the complexity, but having activity supply disorder, I wanted to get it done ASAP.

I created the three new columns as “,null ColumnName”.  After running the creation step, I had my original table with three new columns, each with the proper name and each filled with NULLs.  Next, rather than the three joins to the Name table, I ran an update on the original table three times: update the big table's Name1 column with the matching value from the second table, then update the big table's Name2 column with the matching value from the second table, and so on.  Rather than dying from resource errors after ten minutes on several separate runs, I got my big table with NULLs, and each of the three updates took 5 minutes for 5 million rows.
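
A minimal sketch of that null-placeholder trick, assuming the big table is built with SELECT ... INTO; all names here are hypothetical, and the NULLs are cast so the new columns get real types:

select a.*,
       cast(null as int) as Referring_dr_id,   -- placeholders, filled by later updates
       cast(null as int) as Billing_dr_id,
       cast(null as int) as Ordering_dr_id
into dbo.big_table
from dbo.activities as a;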

My actual code, in dynamic SQL, looks like this:

set @cmd = 'update pay
set pay.Referring_dr_id = pm.pm_provider_id 
from ' + @dbname + '.dbo.qt_service_billing_payments3 pay
join ' + @dbname + '.dbo.pm_providers pm 
on pm.abbreviation = pay.Referring_Dr_Abbr
where pm.abbreviation = pay.billing_dr_abbr'
-- exec(@cmd)   -- @cmd and @dbname are declared earlier; uncomment the exec to run

In hindsight, you could do it in one pass with CASE logic, as sketched below.
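
One reading of that, as a sketch: a single UPDATE with three left joins (one per join condition), where CASE keeps the existing value whenever a join finds no match.  Names follow the snippet above; Billing_dr_id, Ordering_dr_id, and Ordering_Dr_Abbr are assumed, as is a unique abbreviation in pm_providers:

update pay
set pay.Referring_dr_id = case when ref.pm_provider_id is not null
                               then ref.pm_provider_id else pay.Referring_dr_id end,
    pay.Billing_dr_id   = case when bil.pm_provider_id is not null
                               then bil.pm_provider_id else pay.Billing_dr_id end,
    pay.Ordering_dr_id  = case when ord.pm_provider_id is not null
                               then ord.pm_provider_id else pay.Ordering_dr_id end
from dbo.qt_service_billing_payments3 as pay
left join dbo.pm_providers as ref on ref.abbreviation = pay.Referring_Dr_Abbr
left join dbo.pm_providers as bil on bil.abbreviation = pay.Billing_Dr_Abbr
left join dbo.pm_providers as ord on ord.abbreviation = pay.Ordering_Dr_Abbr;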

load testing

David Klee has a couple of articles about load testing, that is, running software utilities to simulate various kinds of load on SQL Server and the horse it rode in on: the server.  Since we can have SQL in VMware/Hyper-V, and VMware Player can run on a Linux box, the server as hardware or as the OS is not strictly defined.  Whatever.

David Klee has an article on his blog about using SQLIO Batch, HammerDB, DiskSpd, and automating DiskSpd with PowerShell.

Now, as you run these things (and collect the typical output), you can also monitor Extended Events to look at what SQL Server is doing as I/O or stress levels change.
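
A minimal sketch of such a session, watching file I/O completions while a load test runs; the session name and target file name are arbitrary:

create event session io_under_load on server
add event sqlserver.file_read_completed,
add event sqlserver.file_write_completed
add target package0.event_file (set filename = N'io_under_load.xel');

alter event session io_under_load on server state = start;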

Mike Fal has a blog post on putting “lines in the sand” (my capitalization, not his) working with HammerDB.

Brent Ozar champions Adam Machanic’s SQLQueryStress application, which runs multiple sessions of a random selection of stored procedures.