nested replacements

I guess i should check and see if you could use regular expressions here, but we have a column whose values have spaces in them.  We want them out.  In fact, we want them replaced with an underscore.  Easy:

select replace(bad_column_value, ' ','_') into ...

The logic is obvious – take the bad column, get its value, and IF there is a space in the value, replace it with the underscore.  “My Name” becomes “My_Name”.  If there is no problem with the value, no replacement is done and the select continues with the value in the row.

We have another case where the value is like “My Name / My Last Name”.  We can handle the replacement, but what to do if value one has the spaces and value two has the “/”?  One replacement won’t work.  Want to try some smart-ass case statement?

In powershell or c# you can just string replacements one after the other on a variable to first replace A and then replace B.  T-sql doesn’t make it that easy.

What you can do is nest the replacements like this:

select replace(replace(bad_column_value, ' ','_'), '/','') into ...

But wait – what is our expected output?  We want My_Name/My_Last_Name, and this won’t do that.  What we will get is My_Name__My_Last_Name.  What we want to find and replace is ”  /  ” and replace it with “/”.  Right?

But wait – if we run the “replace spaces with _” first, THEN the “find and replace space/space”, we will not find “space/space” since it has become “_/_”  So now we have to replace “_/_” with ???

The issue is that the final clean up, replacing spaces is easy – and ‘destructive’ – we can’t simply reverse the procedure and get back to our starting point.  Our difficult selection, in this case, is finding ‘  /  ‘ and replacing that.  THEN we can find it easy to just replace additional spaces.  So we need to reverse the order of the replacements to do the ‘/’ on the inside (first) like so

select replace(replace(bad_column_value, ' / ','/'),' ','_') into ...

This will work.  It should be obvious that we can continue to string these things deeper than two.  But much more and we ought to write a function to do this.  Of course, that would probably be slower.

One neat approach is to create a temp table or table valued function listing all the bad characters, and use that table to determine which characters to remove:

declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro>
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings

Don’t know that i would have reassigned @newstring and @teststring, but anyway this would work fine to REPLACE the badstrings with blanks, but not to replace blanks with, for examle, underscores.  That can be done by adding another column to our list of bad strings for the replacement values:

create  TABLE #ReplaceStrings (symb VARCHAR(5),replace_char varchar(5))
INSERT INTO #ReplaceStrings (symb,replace_char) VALUES ('/','|'),(',','|')
SELECT @OMG = Replace(@OMG, symb, replace_char)
FROM   #ReplaceStrings
select @OMG

Next, the author goes on to … write a function.


i have grown in my t-sql stature:  i used cross-apply twice in the last week.

There are a number of relevant articles about apply\cross apply\outer apply.  I particularly like what Rob Farley says about apply: “APPLY, in both its inner (CROSS APPLY) and outer (OUTER APPLY) forms, allows a correlated sub-query or table-valued function to be part of the FROM clause.”  This means that you can have a subquery that returns values keyed to the same value in the outer query.  It is commonly said that you run the apply function for each row of the outer function.

I have used it twice:  one when i had to fix the first value (max) for a selection to add to a table.  Patient A, diagnosis 1, diagnosis 2, diagnosis 3, diagnosis 4 (actually both the idc9 codes and the name or description.  Create the outer table with a select into statement the columns for patient and all the other details.  Include a select column for diagnosis1.code and, and four times.  Obviously, we don’t know what diagnosis1 is, so the query will fail, or rather, work if we comment out the four\eight columns.

For each of the diagnoses, create a outer-apply code block immediately after the joins (and or groups) but before any final where predicates.  In my case, it looks like:

outer apply
(select sd.diagnosis_code_id as code, dc.description as name
FROM Service_Diagnoses sd 
join Diagnosis_codes dc 
on dc.Diagnosis_Code_ID = sd.Diagnosis_Code_ID
where ser.Service_ID = service_id order by service_diagnosis_id
offset 0 rows fetch first 1 rows only
) as diagnosis1

Notice what is happening.  For each diagnosis_code_id, i get the diagnosis code and the name for that service – which is passed in from the main table.  Next – they are ordered and only the first one is returned to the outside table as diagnosis1.code and

Next, the same code for diagnosis2 -except that i return the SECOND value in the list with

offset 1 rows fetch first 1 rows only

and similarly with three and four.

The second case was somewhat similar – vouchers are paid by a carrier (or two), and we wanted to know the number of services on that voucher.  Shouldn’t be an issue, but as i built out the voucher and carrier component, each time i tried to add count(services) into the join, the count of records exploded.  And the counts were all 1 instead of the counts for each voucher, which could be numerous.  I truly think that i was doing something wrong with the group-bys that i had to do to put the count function into the big table, but i never got it to work.

What DID work was to write the table as before, include a column in the initial section that was “a.acount”.  The cross apply select was created “as a”  and i selected the count of services for each voucher as acount.  It didn’t work at first – i kept getting 5 million instead of 12 until i realized that in the cross apply, i joined vouchers to services as a JOIN and not just on the join conditions.  This is

cross apply 
(select count(ser.service_id) as acount 
from ser
join  pm.vouchers vou on 
vou.voucher_id = ser.voucher_id) as a

when it SHOULD BE

cross apply 
(select count(ser.service_id) as acount 
from ser
where vou.voucher_id = ser.voucher_id) as a

Voucher (vou) isn’t even defined in the cross-apply – it gets passed in and a.acount gets passed out when the process is run.