can’t find the database if you are standing in it

I must have spent about two hours on this one.  I had a stored procedure that I was restructuring.  I was moving it from one database to another, both on the same server.  I got it to work in Data_1, but not Data_2.  I THOUGHT it was the same code, but on Data_2 I kept getting an error that sys.servers could not find Data_2.  I could rebuild the process from the beginning, but as the logic got more complex, I would lose Data_2 again.  Since I could make it work, it was not a blocking problem, except that I wanted to find the cause.  I tried to compare the two stored procedures in Notepad++ with ‘move to other view’ on the second file tab, but I was not finding the differences.  And with a usp, you need to both EXECUTE the CREATE/ALTER to save it, and then EXEC it to run it.  Eventually it would stop working, and I was never confident of when it failed.

OK, so the answer is in the error.  I can reference [myTable] or [dbo].[myTable] or [databaseName].[dbo].[myTable] and they all should work.  What I cannot do is this:

[myserver].[databaseName].[dbo].[myTable]

for a database on the same server; however, I can do this:

[linkedServer].[databaseName].[dbo].[myTable]

Can you see where I am going?  The logic was complex enough that I had resorted to some limited copy\paste at times, and I had accidentally moved code like this

[databaseName].[dbo].[myTable].[myField]

which looks like a four-part name to the parser, so it complained that it could not find [databaseName] as a SERVER.
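A minimal repro (table and database names are placeholders, and the exact message text may vary):

-- Fine: three-part name, resolved on the local server
SELECT [myField]
FROM [databaseName].[dbo].[myTable];

-- The paste error: a column name tacked onto the table reference.
-- The parser reads it as [server].[database].[schema].[table] and fails
-- with something like: Could not find server 'databaseName' in sys.servers.
SELECT [myField]
FROM [databaseName].[dbo].[myTable].[myField];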

I hope it doesn’t take me two hours to remember this next time.

replace and convert

There are a couple of points in the stuff below.  One is the value of having sensible names.  Perfectly nice people put spaces into file names or URLs.  Or they mix numbers and letters (like putting a ‘z’ in front of something to indicate not to use that number), just because they can.  So there is a need to deal with such data, or to make EVERYTHING varchar(max).  There is also a quick but dirty(?) way to create a numbers table.  That one I can even remember.

From Simon Sabin – he wants to turn data into URLs, but the data contains spaces and other naughty bits, which means URL encoding (space becomes %20, et al).  What he does is select the data (which are session titles) and replace spaces with an underscore character:  ‘big cat’ -> ‘big_cat’.

select cast(cast((
    select case when substring(title, n, 1) like '[a-z0-9/-]'
                then substring(title, n, 1) else '_' end
    from num
    where n <= len(title)
    for xml path('')) as xml) as varchar(max))

So take a substring of one character, check to see if it is an allowed character (note the pattern assumes the title is already lowercase), and if so select it; if not, replace it with ‘_’.  This assumes a numbers table/tally table of your choice.
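As a self-contained sketch of the same technique (the variable, the lower() call, and the inline tally are my additions, not Simon’s code):

-- Hypothetical input, lower-cased first because the pattern is [a-z0-9/-]
declare @title varchar(100) = lower('Big Cat & Friends');

select (
    select case when substring(@title, n, 1) like '[a-z0-9/-]'
                then substring(@title, n, 1) else '_' end
    from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                 (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) as num(n)
    where n <= len(@title)
    for xml path('')
) as slug;   -- returns big_cat___friends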

From Derek Dieter – he has strings that might contain non-numeric characters (123A456), and he wants to get rid of the ‘A’ and convert 123456 into an integer.  Similar to the above, he does this

SELECT CAST((SELECT CASE
                      WHEN SUBSTRING(field, Num, 1) LIKE '[0-9]'
                      THEN SUBSTRING(field, Num, 1)
                      ELSE '' END
             FROM Numbers
             WHERE Num <= LEN(field)
             FOR XML PATH('')) AS INT)
FROM inputTable
OPTION(MAXRECURSION 32767)

Whether MAXRECURSION is needed depends on how long the field can be – the default limit is 100, and it applies here because the Numbers table is built with a recursive CTE.

The rest of the code is generating the numbers table.  He does that like this

DECLARE @MaxNumber INT = 5000
;WITH Numbers AS
(
    SELECT 1 AS Num 
    UNION ALL
    SELECT Num+1 
    FROM Numbers 
    WHERE Num <= @MaxNumber
)

Select 1, union all, and do it again with select Num + 1.  Repeat until cooked; actually, repeat for at most 32767 numbers.  As above, with the default recursion limit of 100 this will not work with @MaxNumber = 500.  It will work with 100, or you can set OPTION(MAXRECURSION 5000).  Or leave it at 32767 and don’t worry until it gives you an error:

The statement terminated. The maximum recursion 32767 has been exhausted before statement completion.
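Putting the two pieces together, here is a runnable sketch (the SELECT … INTO, the temp table name, and the < tweak are mine):

DECLARE @MaxNumber INT = 5000;

WITH Numbers AS
(
    SELECT 1 AS Num
    UNION ALL
    SELECT Num + 1
    FROM Numbers
    WHERE Num < @MaxNumber     -- < rather than <= stops exactly at @MaxNumber
)
SELECT Num
INTO #Numbers                  -- materialize once, reuse everywhere
FROM Numbers
OPTION (MAXRECURSION 5000);    -- the default of 100 would fail here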

identity column

I am instructed to add an identity column to rows of data in a table.  A simple 1, 2, 3 integer would be fine, and of course, when information is inserted, you don’t specify the identity and it will be automatically added to the table.  Something like

create table myTable (
    id int identity(1,1) not null
    , myData varchar(10)
)
insert into myTable (myData) values ('abc')

This results in 1, “abc” in the table.

But, you say, “I want to create the table from a SELECT INTO, not an INSERT.”  Now what?

I got this from sqlserverplanet.com

SELECT id = IDENTITY(INT,1,1)
 ,FirstName = pc.FirstName
 ,LastName = pc.LastName
INTO #person_contact  FROM Person.Contact pc

You are getting the first and last name into the (in this case) temp table and adding an identity column on the fly.  Notice that subsequent inserts of first and last name will automatically fill in the id entry, which is of type int, starts at 1, and increments by 1 for each row.  You can also use row_number() over…  like this

SELECT   id  = ROW_NUMBER() OVER (ORDER BY pc.LastName)
    ,FirstName  = pc.FirstName
    ,LastName   = pc.LastName
INTO #person_contact2 FROM Person.Contact pc
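One difference between the two worth noting (the INSERT below is a sketch with made-up names): the IDENTITY version keeps numbering on later inserts, while the ROW_NUMBER version leaves behind an ordinary non-identity column that you must fill yourself.

-- #person_contact was built with IDENTITY(1,1), so this row numbers itself
INSERT INTO #person_contact (FirstName, LastName)
VALUES ('Ada', 'Lovelace');

-- #person_contact2 got a plain (non-identity) bigint from ROW_NUMBER(),
-- so an insert there would have to supply id explicitly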

65535, but who’s counting?

If you want to look at the definition of a view (definition is the field name where the SELECT code is stored), you can select the view, open DESIGN, and see the code that selects the columns, with all the aliases and the joins and froms, oh my.  Maybe.

We have some views where that just doesn’t work the way you want.  You can see the code, but when I try select all\copy\paste into a text editor, not all of it comes over.  OK, so I put the cursor in front of the first word and use the down arrow to select the text, REBAR style.  At some point the selection stops extending, and you can’t get the rest of the text.  A couple of times I have been able to copy half and then select the rest, but there are times when that doesn’t work either.

So – how about this:

EXEC sp_helptext 'dbo.myView';

Yup, that works, but in my case I get 1049 rows of text of varying lengths.  REALLY ugly.

How about this:

USE [Ntier_MIPU]
GO
SELECT definition, *
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('PM.vwGenPatInfo');

You get the object_id and the definition field, which contains ‘create view myView as select …’, but in my case the definition is truncated.  There is a character limit for how long the field can be in grid view.  I had no better (in fact, worse) luck with text view, and exactly the same data if I output it to file.

In Management Studio, under Tools, Options, Query Results, SQL Server, Results To Grid, there is a setting for Maximum Characters Retrieved – Non XML Data, which cannot be set to more than 65535.  You can of course select more data into a table, but the display is limited to the aforementioned 65535 characters.

And FOR XML RAW output was similarly ugly and not terribly readable.  Maybe I could have opened it in MS Internet Explorer or suchlike with a style sheet that would not show the elements and attributes, but that was WAY too much work, given the ease of my workaround.
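For completeness, one more trick that I believe works: the Results To Grid limit for XML data, unlike the non-XML one, can be set to unlimited, so you can smuggle the definition out wrapped as XML.  A sketch, not what I ended up doing:

-- Clicking the XML cell in grid view opens the full text in the XML editor.
-- Caveat: this breaks if the definition itself contains ']]>'.
SELECT CONVERT(xml, '<definition><![CDATA[' + definition + ']]></definition>')
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('PM.vwGenPatInfo');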

If you want to build, or duplicate, a big table (I mean a table with a complex schema), you can right-click on the table in SSMS, Script Table as, CREATE To … whatever (file, agent, query window).  Yes, query window.  So find the view, script the CREATE to a query window, trim off the create-view front matter, and you have it.

column names

We have data that comes in from, possibly, multiple data sources.  This data is processed and chopped into our derived tables, from which we are going to work.  The process to combine and manage the data is done by several different user stored procedures, which are maintained by multiple people.  In order to automate what needs to be done with new data refreshes, there is a single table that holds information about which tables are officially maintained.  On refresh, another usp runs against this table and uses the stored names to run the necessary usp to create each table.

Things can change, and versions of the usps or even the structure of the table can be modified and not announced.  Of course, this problem is eased by making the central program responsible only for coordinating the sequential running of each of the usps that actually create the tables, but still there can be confusion.

Our approach is to also make each usp (and its author) responsible for updating a central table of metadata with the name of the table, the names of the columns, and the client database from which each was derived.  This CAN be done manually.  Sure.  If we are not certain where a column comes from, we can get it from the usp code and put it into the table.  We can also update the metadataTable each time the usp runs to make a new derived table.  Code like this will work: first clean up the table to remove old entries, then insert the new values:

delete from [dbo].[metadataTables] where tableName = 'table1'

insert into [dbo].[metadataTables] (tableName, fieldName, systemName)
values
 ('table1','EMR_Provider_id','S1')
,('table1','suffix','S1')
,('table1','first_name','S1')
,('table1','middle_name','S1')
,('table1','last_name','S1')
,('table1','pm_provider_id','S2')

Of course, this can be made better in cases (well, all cases, but most usefully) when most of the columns come from one source.  Then we can put them all in and change the ones that are not correct:

delete from metadataTables where tableName = 'table1'

insert into metadataTables
select 'table1', column_name, 'S1'
from information_schema.columns
where table_name = 'table1' order by ordinal_position

update metadataTables set systemName = 'S2'
where tableName = 'table1' and fieldName in ('pm_provider_id')

Delete old entries, get all the column names for table1, put them into metadataTables as coming from system S1, then, from your own knowledge, update just ‘pm_provider_id’ to the correct systemName.  Repeat step three until everything is lined up.
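A companion query can check the registry for drift – columns that exist in the real table but were never registered (a sketch, using the same names as above):

select c.column_name
from information_schema.columns as c
left join dbo.metadataTables as m
       on  m.tableName = 'table1'
       and m.fieldName = c.column_name
where c.table_name = 'table1'
  and m.fieldName is null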

Working with XML

XML is a universal standard for storing human-readable, self-documenting data.  It is useful for configuration files, data interchange files, and non-relational data in database servers.  Non-relational in that an XML document might have different-sized clusters of data entries, and within a cluster, each entity (think row here) doesn’t have to have the same attributes or values.  The document could be about a business, with part of it referring to the physical plant and a second part to the employees, only one of whom has a known favorite color.  The XML structure can support that.

XML data is text and can be stored in a varchar or nvarchar field.  This might be sufficient if all we want to do is retrieve the data, but MS SQL has an XML datatype that allows additional functions – there is a built-in XML editor to show the structure of each XML item, and there are other methods that allow querying, updating and inserting XML data.

One approach to working with XML data is to ‘flatten’ it into normal rows and columns.  This is straightforward when you are dealing with non-nested data: groups of users, each with one first name, one last name, one collection of single elements.  It becomes more difficult when you have a Skills element with zero to many Skill elements nested inside it.  That puts you in the situation of one skill column for the first user and, for the second user, two columns for skill one and skill two.

CROSS APPLY

By using CROSS APPLY, we can get around this.  A great article on blogs.msdn.com by Simon Ince documents this well.

First the setup:

Save this XML file as c:\users\rbeck\desktop\new.xml

<?xml version="1.0" ?>
<Root>
  <Person>
    <Name>Simon</Name>
    <Age>20</Age>
    <Skills>
      <Skill>Cooking</Skill>
      <Skill>Cleaning</Skill>
    </Skills>
  </Person>
  <Person>
    <Name>Peter</Name>
    <Age>21</Age>
    <Skills>
      <Skill>Ironing</Skill>
    </Skills>
  </Person>
</Root>

Create a table and import the data

Use MIU
CREATE TABLE XmlSourceTable
(
 RecordId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
 XmlData XML NOT NULL
)

Notice the XML data type.

Import the data and display it

INSERT INTO XmlSourceTable(XmlData)
SELECT *
FROM OPENROWSET(
 BULK 'C:\users\rbeck\desktop\new.xml', SINGLE_BLOB)
AS ImportSource
GO
SELECT * FROM XmlSourceTable

RecordId    XmlData
--------    -------
1           <Root><Person><Name>Simon</Name><Age>20</Age… (snip)

(1 row(s) affected)

One row for each file/XML document.  If we import this twice, we get the same XML data with RecordIds of 1 and 2; the rest of the output below is just duplicated.  If the second import were different data (but the same schema), it would just look like more (and not duplicated) data.

Query the data

SELECT
 pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,
 pref.value('(Age/text())[1]', 'int') as PersonAge,
 pref.query('Skills') as PersonSkills
FROM 
 XmlSourceTable 
 CROSS APPLY
 XmlData.nodes('/Root/Person') AS People(pref)

PersonName    PersonAge    PersonSkills
----------    ---------    ------------
Simon         20           <Skills><Skill>Cooking</Skill><Skill>Clean… (snip)
Peter         21           <Skills><Skill>Ironing</Ski… (snip)

Notice that Simon has more skills than Peter.  To get one row per person-skill pair, try this:

SELECT
 pref.value('(Name/text())[1]', 'varchar(50)') as PersonName,
 pref.value('(Age/text())[1]', 'int') as PersonAge,
 sref.value('(text())[1]', 'varchar(50)') as PersonSkill
FROM 
 XmlSourceTable 
 CROSS APPLY
 XmlData.nodes('//Person') AS People(pref) 
 CROSS APPLY
 pref.nodes('Skills/Skill') AS Skill(sref)

PersonName    PersonAge    PersonSkill
----------    ---------    -----------
Simon         20           Cooking
Simon         20           Cleaning
Peter         21           Ironing

The magic is done by CROSS APPLY and by the XMLData.nodes function.

From BOL: “The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.”  I am going to simplify that to “the nodes table-valued function is applied to each row – each pref.value and the one pref.query – and the output is concatenated left to right.”

When we use xxx.query, we get back XML, even if it is not enclosed in an element.  If we use xxx.value, we get back the value – we need to specify the type of data (int, varchar()) and the explicit path:

“Although there is only one PersonName or PersonAge attribute in the XML instance, the static typing rules require you to explicitly specify that the path expression returns a singleton. Therefore, the additional [1] is specified at the end of the path expression” (emphasis from rcb)

If we apply value to the Skills element with a path of [1] and a varchar datatype, we get back “CookingCleaning” for Simon:

pref.value('Skills[1]', 'varchar(50)') as PersonSkills

The nodes method returns an XML fragment for each match of the supplied path.  It puts the result into a table with the supplied name and into the column named in the parentheses:

XmlData.nodes('/Root/Person') AS People(pref)

which returns the XML found below Root for each Person element.

Another example of using CROSS APPLY is from The Cheater’s Guide to Mastering XQuery on SQL Server – Part 1.  Unfortunately, there is no Part 2.

His data looks like this

<?xml version="1.0" encoding="UTF-8"?>
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
 <gesmes:subject>Reference rates</gesmes:subject>
 <gesmes:Sender>
 <gesmes:name>European Central Bank</gesmes:name>
 </gesmes:Sender>
 <Cube>
 <Cube time="2011-01-14">
 <Cube currency="USD" rate="1.3349" />
 <Cube currency="JPY" rate="110.71" />
 <Cube currency="GBP" rate="0.8422" />
 </Cube>
 <Cube time="2011-01-13">
 <Cube currency="USD" rate="1.3199" />
 <Cube currency="JPY" rate="109.5" />
 <Cube currency="GBP" rate="0.83575" />
 </Cube>
 <Cube time="2011-01-12">
 <Cube currency="USD" rate="1.2973" />
 <Cube currency="JPY" rate="108.17" />
 <Cube currency="GBP" rate="0.83155" />
 </Cube>
 </Cube>
</gesmes:Envelope>

Notice the namespace – there is a problem coming up.

His first select statement looks like this

SELECT
T.rows.value('local-name(.)','nvarchar(100)') as [TagName]
FROM
 ReferenceRates
CROSS APPLY
 xmldata.nodes('//*') T(rows)

This lists the tag name of every element: from the ReferenceRates table we created and populated with the data, select the local name of each node, cross applying the nodes method over the entire document (‘//*’).

Notice also the irregular tag name Cube, which is used at several levels.  The CROSS APPLY nodes method can be started at some place other than the root (//).  For example, we can push it down to the second level of Cube with

CROSS APPLY xmldata.nodes('/gesmes:Envelope/Cube//*') T(rows)

However, in his example he gets an error – by starting the search at Cube/Cube, we lose the namespace reference, which is declared on the Envelope element.  He gets around that with the WITH XMLNAMESPACES prefix.  (You can also specify the namespace in the query or nodes methods of XQuery.)

WITH XMLNAMESPACES(
'http://www.gesmes.org/xml/2002-08-01' as gesmes,
DEFAULT 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'
)
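Put back together, the namespace-qualified query would look something like this (my reassembly of his pieces, not his verbatim code):

;WITH XMLNAMESPACES(
'http://www.gesmes.org/xml/2002-08-01' as gesmes,
DEFAULT 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'
)
SELECT
 T.rows.value('local-name(.)','nvarchar(100)') as [TagName]
FROM
 ReferenceRates
CROSS APPLY
 xmldata.nodes('/gesmes:Envelope/Cube//*') T(rows)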

He also has an example of cross apply … cross apply that looks like a pivot table – it puts each Cube entry’s values (the time, currency and rate) into columns.

 
A neat grid of all currency rates, by date:

id  TagName  Time        Currency  Rate     xmlfragment
1   Cube     2011-01-14  USD       1.3349   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-14  JPY       110.71   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-14  GBP       0.8422   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-13  USD       1.3199   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-13  JPY       109.5    <p1:Cube xmlns:p1="h…
1   Cube     2011-01-13  GBP       0.83575  <p1:Cube xmlns:p1="h…
1   Cube     2011-01-12  USD       1.2973   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-12  JPY       108.17   <p1:Cube xmlns:p1="h…
1   Cube     2011-01-12  GBP       0.83155  <p1:Cube xmlns:p1="h…

Here is the XQUERY standard reference from Microsoft

He also has a link to an IBM developerWorks article talking about XML namespaces.

OPENXML

Another approach is to use OPENXML – a rowset provider that makes XML stored in SQL Server’s xml datatype viewable as if it were a relational table, or a relational view of the XML data.

sp_xml_preparedocument puts the XML into memory and returns a handle to the location, which OPENXML then uses.  sp_xml_removedocument then trashes the in-memory entity.

http://thinkingcog.com/2013/10/default.aspx has an example

DECLARE @XML xml = 
'<Students>
 <Student id="1">
 <FName>Parakh</FName>
 <LName>Singhal</LName>
 <Age>30</Age>
 <Courses>
 <Course id="1">Fundamentals of Databases</Course>
 <Course id="10">Fundamentals of Networking</Course>
 <Course id="15">Fundamentals of Security</Course>
 </Courses>
 </Student>
 <Student id="2">
 <FName>Glen</FName>
 <LName>Bennet</LName>
 <Age>31</Age>
 <Courses>
 <Course id="12">Fundamentals of Data Warehousing</Course>
 <Course id="15">Fundamentals of Security</Course>
 </Courses>
 </Student> 
</Students>';
declare @pointer int
EXEC sp_XML_preparedocument @pointer OUTPUT, @XML;
SELECT
'columns'
FROM OPENXML(@pointer,'/Students/Student',2)
WITH
(StudentID int '@id', 
StudentFirstName varchar(50) 'FName', 
StudentLastName varchar(50) 'LName',
StudentAge int 'Age',
EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
...
EnrolledCourseN varchar(50) '(Courses/Course)[N]'
);
EXEC sp_xml_removedocument @pointer;

In case you are wondering: if a student does not have N courses, a null is returned for the missing courses.  In the example, @id is an attribute of the Student element.  There is also an id attribute on each Course, but this code does not get it.

I have modified the code to support the Course id attribute.  First, put the item(s) into the select list:

SELECT
StudentID,
StudentFirstName,
..
EnrolledCourse1,
CourseID1, <-- added
EnrolledCourse2,
CourseID2, <-- added
EnrolledCourse3,
CourseID3 <-- added

Notice that with FROM OPENXML(@pointer,'/Students/Student',2) you have provided the starting point for querying the Student element.  Courses are nested inside Student, so you need to provide the extended path to the Course element in order to query the course title, which is the entire content of the Course element.  Since we can have 1 to N entries (in our example 3), we need the exact path to the expected element: [1], [2] or [3].

EnrolledCourse1 varchar(50) '(Courses/Course)[1]'

When we go after the attribute for the first course, we query for @id but still need the path

EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
CourseID1 int '(Courses/Course[1])/@id',

and likewise for all three course elements.  The second student has only two courses, so the third ID and Name are returned as null.
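Spelled out for all three course slots, it looks like this (just his pattern repeated; it assumes @pointer from the sp_xml_preparedocument call above):

SELECT *
FROM OPENXML(@pointer,'/Students/Student',2)
WITH
(StudentID int '@id',
StudentFirstName varchar(50) 'FName',
StudentLastName varchar(50) 'LName',
StudentAge int 'Age',
EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
CourseID1 int '(Courses/Course[1])/@id',
EnrolledCourse2 varchar(50) '(Courses/Course)[2]',
CourseID2 int '(Courses/Course[2])/@id',
EnrolledCourse3 varchar(50) '(Courses/Course)[3]',
CourseID3 int '(Courses/Course[3])/@id'
);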

nodes method

You can also use the nodes method rather than OPENXML.  You don’t need anything like EXEC sp_xml_preparedocument and EXEC sp_xml_removedocument, since the engine extracts the data from the XML directly with XQuery.

select 
columns
FROM @XML.nodes('/Students/Student')
as StudentTable(Student)

We get the structure as a rowset, starting at the /Students/Student path, stored in StudentTable as the column Student.  From the rowset we select the individual values (in the very first example we also selected XML with query) using code like this (I edited the original paper to include the Course id attribute):

SELECT
Student.value('@id','int') as StudentID,
Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName,
Student.value('(LName/text())[1]','varchar(50)') as StudentLastName,
Student.value('(Age/text())[1]','int') as StudentAge,
Student.value('(Courses/Course/text())[1]','varchar(50)') as EnrolledCourse1,
Student.value('(Courses/Course/@id)[1]','int') as EnrolledCourse1ID, <-- added
Student.value('(Courses/Course/text())[2]','varchar(50)') as EnrolledCourse2,
Student.value('(Courses/Course/@id)[2]','int') as EnrolledCourse2ID, <-- added
Student.value('(Courses/Course/text())[3]','varchar(50)') as EnrolledCourse3
FROM @XML.nodes('/Students/Student')
as StudentTable(Student)

This blog has a stored procedure to examine the schema of an XML table

AD HOC XML File Queries

An interesting idea is suggested by Seth Delconte in SimpleTalk.  Store the XML in the file system (c:\myXML.xml).  Use OPENROWSET to read the XML into a CTE with an XML datatype.  Or perhaps create a function to return an XML variable that can hold up to 2GB of content.

Now that we have XML data, or at least an XML fragment, we can work with XQuery FLWOR expressions, or the XQuery nodes and value methods.
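A sketch of that pattern as I read it (the file path and element names are placeholders):

-- Read the file into an XML-typed CTE, then query it in place
WITH XmlFile (Contents) AS
(
    SELECT CAST(BulkColumn AS XML)
    FROM OPENROWSET(BULK 'C:\myXML.xml', SINGLE_BLOB) AS ImportSource
)
SELECT p.node.value('(Name/text())[1]', 'varchar(50)') AS PersonName
FROM XmlFile
CROSS APPLY Contents.nodes('/Root/Person') AS p(node);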

XML INDEXES

Still another approach is to leave the XML alone and put a primary and perhaps a secondary index on the XML column.  This can make XQuery value, nodes and other methods run twice or more as fast.  Information on “Getting Started With XML Indexes” is available, and more on Selective XML Indexes here.
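For the record, creating the indexes looks roughly like this (the index names are mine, and the table needs a clustered primary key first – XmlSourceTable from the earlier example has one):

-- Primary XML index: shreds and persists the node table
CREATE PRIMARY XML INDEX PXML_XmlData
    ON XmlSourceTable (XmlData);

-- Secondary PATH index: helps path-based exist()/value() queries
CREATE XML INDEX IXML_XmlData_Path
    ON XmlSourceTable (XmlData)
    USING XML INDEX PXML_XmlData
    FOR PATH;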

FOR XML RAW

If we want to get XML format out of SSMS, we can use the FOR XML syntax, as explained in a SimpleTalk article.  Options are “for XML raw”, “for XML auto”, “for XML explicit” or “for XML path”.

Just as a side note, FOR XML is an excellent way to document the format of a table if the rows and columns are running off the screen – you can return the header as XML and the first line of results also as XML, and use this to describe the table.

FOR XML RAW returns each row as an element, with each column value expressed as an attribute named after the column.  Using the SimpleTalk example:

select * from stores for xml raw

<row StoreID="292">
  <Survey_untyped>
    <StoreSurvey>
      <AnnualSales>800000</AnnualSales>
      <AnnualRevenue>80000</AnnualRevenue>
      <BankName>United Security</BankName>
      <BusinessType>BM</BusinessType>
      <YearOpened>1996</YearOpened>
      <Specialty>Mountain</Specialty>
      <SquareFeet>21000</SquareFeet>
      <Brands>2</Brands>
      <Internet>ISDN</Internet>
      <NumberEmployees>13</NumberEmployees>
      <Products Type="Bikes">
        <Product>Mountain</Product>
        <Product>Road</Product>
        <Product>Racing</Product>
      </Products>
      <Products Type="Clothes">
        <Product>Jerseys</Product>
        <Product>Jackets</Product>
        <Product>Shorts</Product>
      </Products>
    </StoreSurvey>
  </Survey_untyped>
</row>

SELECT * FROM Stores for XML raw ('root')

just renames each row element (by default ‘row’) – in this case to ‘root’.  You probably want to go the other way and name it something more useful.  If you are comparing versions, you can use ‘for XML raw (‘v1’)’ et al.

And this is returning the attributes as attributes.  You can force them into elements, for possible ease of querying, by adding ELEMENTS to the FOR XML RAW clause:

 SELECT * FROM Stores for XML raw ('root'), ELEMENTS;

Further, you can wrap the entire output inside an XML element, which makes it valid XML:

SELECT * FROM Stores for XML raw ('survey'), ROOT('VERSION1')

One use for this is to take row and column data, which you can query by column name or alias, and convert it into valid XML.

FOR XML PATH

FOR XML PATH returns differently formatted XML – you get to indicate what the path is and name the root.  If you finish the select with

FOR XML PATH('customer'), ROOT('customers')

you would get a root of customers and a top element of customer.  The individual elements are named as you select them, with aliases or with the field name.  Or you can put in nothing at all.  There is a clever way to use FOR XML PATH to string together comma-separated values in a query, to get one field with several values.

SELECT CAT.Name AS [Category],
 STUFF(( SELECT ',' + SUB.Name AS [text()]
 FROM Subcategory SUB
 where SUB.ID = CAT.ID
 FOR XML PATH('') 
 ), 1, 1, '' )
 AS [Sub Categories]
FROM Category CAT

this will give you

cat1    subcata, subcatb, subcatc
cat2    sub2a, sub2b, sub2c

so that you no longer have to tell people that ‘it’s too difficult’ if this is the format they want.

Examples

A nice brief tutorial on XML is from Joel Lipman

He postulates an xml datatype column ‘Event_XML’ in a table ‘XML_EVENTS’.  The root element is STAFF.

Either of these queries will return a column that will contain ‘male’ for each staff entry – not the XML data:

SELECT 
 CAST(CAST(Event_XML AS XML).query('data(/STAFF/GENDER)') 
 AS VARCHAR(10)) AS Gender 
FROM [dbo].[XML_EVENTS]
SELECT 
   CAST(Event_XML AS XML).value('data(/STAFF/GENDER)[1]',
   'VARCHAR(10)') AS Gender 
FROM [dbo].[XML_EVENTS]

This will return the financialyear attribute of the CURRENT_EMPLOYEE element in STAFF.  (Note that the query() result is xml, which cannot be cast directly to int, so it goes through varchar first.)

SELECT CAST(CAST(
   CAST(Event_XML AS XML)
   .query('data(/STAFF/CURRENT_EMPLOYEE/@financialyear)')
   AS VARCHAR(10)) AS INT) AS FinancialYear
FROM [dbo].[XML_EVENTS]

If you look at the query, understand this is a normal T-SQL query that also has support for XML via query or value (or nodes) with XPath.  So it is not surprising we can use a where clause:

SELECT * FROM [dbo].[XML_EVENTS] WHERE 
    CAST(Event_XML AS XML)
    .value('(/STAFF/GENDER="MALE")[1]','VARCHAR(10)') = 'true'

Note that this will return all the columns, including an XML datatype which you can access with the XML editor

If we want to return XML, use the query method as below:

-- to get all below nodes from root node - STAFF 
 SELECT Event_XML.query('/STAFF') 
 FROM [dbo].[XML_EVENTS]
 -- to get all element nodes named 'EMPLOYEE_NUMBER' 
 SELECT Event_XML.query('//EMPLOYEE_NUMBER') 
 FROM [dbo].[XML_EVENTS]
 -- to get employee node with ID = 1 
 SELECT Event_XML.query('//EMPLOYEE_NUMBER[@ID = 1]') 
 FROM [dbo].[XML_EVENTS]
 -- to get employee node at position = 1 from root node 
 SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[position()=1]') 
 FROM [dbo].[XML_EVENTS]
 -- to get employee node at last position from root node 
 SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[last()]') 
 FROM [dbo].[XML_EVENTS]
 -- to return unknown nodes from root node - STAFF 
 SELECT Event_XML.query('/STAFF/*') 
 FROM [dbo].[XML_EVENTS]
 -- to return unknown nodes from XML column - Event_XML 
 SELECT Event_XML.query('//* ') 
 FROM [dbo].[XML_EVENTS]

If we want to look for strings in the elements:

-- exist() returns 1 or 0 if a node name exists 
-- containing "NAME" or not respectively 
 SELECT Event_XML.exist('(/*/*[1][contains(local-name(.),"NAME")])')
 FROM [dbo].[XML_EVENTS]
-- query() returns XML branch if the node name 
-- contains string "NAME" -- returns <SURNAME>LIPMAN</SURNAME> 
 SELECT Event_XML.query('(/*/*[1][contains(local-name(.),"NAME")])')
 FROM [dbo].[XML_EVENTS]
-- query() returns XML branch if the node value contains
-- string "MAN" -- returns <SURNAME>LIPMAN</SURNAME> 
 SELECT Event_XML.query('(/*/*[1][contains(.,"MAN")])') 
 FROM [dbo].[XML_EVENTS]
-- value() returns value of node if the node name 
-- contains string "NAME" -- returns LIPMAN 
SELECT Event_XML
.value('(/*/*[1][contains(local-name(.),"NAME")])[1]','NVARCHAR(50)') 
FROM [dbo].[XML_EVENTS]
 -- value() returns value of node if the node value 
-- contains string "MAN" -- returns LIPMAN 
SELECT Event_XML
.value('(/*/*[1][contains(.,"MAN")])[1]','NVARCHAR(50)') 
 FROM [dbo].[XML_EVENTS]

how old is that usp in Windows?

We have a bunch of usps (user stored procedures), some of which are perfectly fine, and others no longer used.  If you want to see when each was created, when it was last modified, and how long SINCE it was last modified, here is some code.

SELECT name
, create_date
, modify_date
, datediff(d, modify_date, getdate()) as [since_last_modification]
FROM sys.objects
WHERE type = 'P'
order by [since_last_modification] desc

row_number, rank, dense rank and ntile

ROW_NUMBER

A T-SQL function, row_number(), will put a counter on the output of a query.  You supply the table via a select statement, include row_number() over (order by id) as Row as a column, and you get a column named Row with a sequence number for each row.  Why do you need the order by (id) part of the function?  Do you want (I don’t care, you can’t have it) the order of the output different every time?  The function needs a definite order in which to number the rows, and you just pick one.  ID is safe; how about [Last Name], where you have ties (several last names the same)?  You would still get output – with row_number() over (order by last_name), you get an increasing number for each row, ties or not.

What if you don’t want to see all the rows, just 50 through 60?  This is why you are using row_number to get the row count.  A window function can’t be referenced straight from a WHERE clause, so wrap the query and then filter with where row between 50 and 60:

with numbered as (
  select row_number() over (order by last_name) as row
  , first_name
  , last_name
  , dob
  , gender
  from myTable
)
select * from numbered where row between 50 and 60

So you have thousands in the list, and you just want old women, say with a dob in 1947?  You can get them with the where clause and just row_number them.  But say you are a little more lenient with your age category and don’t want to mess with where dob = ‘1947’ or dob = ‘1948’ or …

You can still select the females in the where clause, and row_number has a partition by parameter in addition to order by.

select row_number() over (partition by dob order by last_name) as row
, first_name, last_name, dob, gender
from myTable
where gender = 'F'
order by dob desc

You will get back females, with each dob group having its own row_number that starts over from 1 as the dob changes.  This is an awful lot like a rank, except it is based on last_name within the partition of the dob.

RANK

This works very similarly to row_number, with the same over (partition … order …) arguments, except that the rank of the last name Anderson will be the same for all five Andersons (all rank 1), and THEN Baker starts at 6 because it is sixth after the top 5.

DENSE RANK

If you think that unfair and that Baker should be 2, then use DENSE RANK. There are still 5 #1, but then Baker is #2 and Carlson is #3.

NTILE

breaks the ranking up into a number of chunks.  If you invoke NTILE(5), you have five groups of 4 out of a row count of 20.  If you had 21, you would have one group of 5 and four groups of 4 – rows 1-5 in group 1.  22 would put 5 in group 1, 5 in group 2, and 4 in groups 3 through 5.

SUMMARY:  Row_number orders the table and assigns a line number to each row – this counts the output.  If you include a partition by … argument, each partition is counted separately, and the count restarts in each new partition.  The sum of the MAX row_number over all partitions should be the count of the table.

If you want a rank, use rank() over (partition …, order …).  Rows that tie on the ordering value within a partition are ranked the same.  After a tie, RANK continues from the overall row number, while DENSE RANK continues with the next integer, so dense ranks are consecutive and monotonic.  Rank can have non-consecutive numbers – wherever two or more are ranked together, the ranks that follow skip numbers.
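A throwaway comparison on made-up names shows all four side by side:

SELECT last_name,
       ROW_NUMBER() OVER (ORDER BY last_name) AS row_num,
       RANK()       OVER (ORDER BY last_name) AS rnk,
       DENSE_RANK() OVER (ORDER BY last_name) AS dense_rnk,
       NTILE(2)     OVER (ORDER BY last_name) AS bucket
FROM (VALUES ('Anderson'),('Anderson'),('Baker'),('Carlson')) AS t(last_name);

-- Anderson  1  1  1  1
-- Anderson  2  1  1  1
-- Baker     3  3  2  2
-- Carlson   4  4  3  2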

Oh, this should be perfectly obvious, but since I just did it, I will mention that if you partition by a field that does not repeat, every ROW_NUMBER will be one.  I mean “1”.

sp_rename

Our client wants to see the type of test being examined in the output.  That is, instead of a column heading of “Results”, they want to see “Results for LAB TEST.”  Could have been worse – they might have wanted “Results for LAB TEST in UNITS.”  But since we already return a column labeled “units”, I guess this is not a real issue.

So the lab data is queried by test name, and it comes back without that name in the result set.  I guess we could add a column for it – which would then be the same for all 100 or more rows – but instead I create a table with a column ‘result’, populate the table with data, then rename the result column to include the lab name.  Looks like this:

declare @name varchar(40) = 'result for ''' + @lab + ''''

Execute tempdb..sp_rename 
'[tempdb].[dbo].[##intermediateResults].[result]', 
@name, 'column';

Declare a variable for the new name and construct it from the string and the name of the lab.  Pass it to an instance of tempdb..sp_rename with the fully qualified table-plus-column name, the new column name, and the hint that we want to rename a ‘column’.

count rows in each table and compare

In MSSQLTips, Dattatrey Sindol wrote an article entitled “SQL Server Row Count for all Tables in a Database” with several ways to get the list of tables.  One uses the sys.partitions catalog view, another the sys.dm_db_partition_stats DMV, the third the sp_MSForeachtable stored procedure, and the last a COALESCE function.

I used the first one (the first that I came to).  It does not reference the database name – you have to select the database when you open the query window or put a USE dbName statement in it.  This means you can’t put that code into a usp.

We are going to get periodic updates of a database.  If we store a copy of the older database, we can use something like this to count and list the tables in the older and newer databases, along with their row counts.  This will tell us if a table was dropped from the new database, or added, and whether the row counts are reasonable – typically increasing; a reference table might remain the same; and if there is an error, the counts might fall.  Or the table is a log or audit table which was cleared during maintenance before the time you run the code.

Here is Datta’s code

SELECT
 QUOTENAME(SCHEMA_NAME(sOBJ.schema_id)) + '.' + QUOTENAME(sOBJ.name) AS [TableName]
 , SUM(sPTN.Rows) AS [RowCount]
FROM sys.objects AS sOBJ
 INNER JOIN sys.partitions AS sPTN
 ON sOBJ.object_id = sPTN.object_id
WHERE sOBJ.type = 'U'
 AND sOBJ.is_ms_shipped = 0x0
 AND index_id < 2 -- 0:Heap, 1:Clustered
GROUP BY sOBJ.schema_id, sOBJ.name
ORDER BY [TableName]
GO

What I do is put the results for each database into a table, union the two result sets, then order by table name.  I put both table names into a coalesce function, and select both row counts and the difference from newTable while left joining to oldTable, then union that to newTable RIGHT joined to oldTable.  This gets me both new tables (in newTable, not in oldTable) and deprecated tables (in old, not in new).  Either case shows up as a null in one table’s name and a null in its rowcount.

select coalesce(newTable.tableName, oldTable.tableName) as TableName
, newTable.[rowcount] as newCount
, oldTable.[rowcount] as oldCount
, (newTable.[rowcount]) - (oldTable.[rowcount]) as difference
from [Ntier_MIPU].dbo.newTable 
left join [Ntier_MIPU_2014].dbo.oldTable 
on newTable.tableName = oldTable.TableName
UNION
select coalesce(newTable.tableName, oldTable.tableName) as TableName
, newTable.[rowcount] as newCount
, oldTable.[rowcount] as oldCount
, (newTable.[rowcount]) - (oldTable.[rowcount]) as difference
from [Ntier_MIPU].dbo.newTable 
right join [Ntier_MIPU_2014].dbo.oldTable 
on oldTable.TableName = newTable.tableName
where oldTable.TableName != '[dbo].[oldTable]' 
and newTable.TableName != '[dbo].[newTable]'
order by tableName