nested replacements

I guess i should check and see if you could use regular expressions here, but we have a column whose values have spaces in them.  We want them out.  In fact, we want them replaced with an underscore.  Easy:

select replace(bad_column_value, ' ','_') into ...

The logic is obvious – take the bad column, get its value, and IF there is a space in the value, replace it with the underscore.  “My Name” becomes “My_Name”.  If there is no problem with the value, no replacement is done and the select continues with the value in the row.

We have another case where the value is like “My Name / My Last Name”.  We can handle the replacement, but what to do if value one has the spaces and value two has the “/”?  One replacement won’t work.  Want to try some smart-ass case statement?

In powershell or c# you can just string replacements one after the other on a variable to first replace A and then replace B.  T-sql doesn’t make it that easy.

What you can do is nest the replacements like this:

select replace(replace(bad_column_value, ' ','_'), '/','') into ...

But wait – what is our expected output?  We want My_Name/My_Last_Name, and this won’t do that.  What we will get is My_Name__My_Last_Name.  What we want to find and replace is ”  /  ” and replace it with “/”.  Right?

But wait – if we run the “replace spaces with _” first, THEN the “find and replace space/space”, we will not find “space/space” since it has become “_/_”  So now we have to replace “_/_” with ???

The issue is that the final clean up, replacing spaces is easy – and ‘destructive’ – we can’t simply reverse the procedure and get back to our starting point.  Our difficult selection, in this case, is finding ‘  /  ‘ and replacing that.  THEN we can find it easy to just replace additional spaces.  So we need to reverse the order of the replacements to do the ‘/’ on the inside (first) like so

select replace(replace(bad_column_value, ' / ','/'),' ','_') into ...

This will work.  It should be obvious that we can continue to string these things deeper than two.  But much more and we ought to write a function to do this.  Of course, that would probably be slower.

One neat approach is to create a temp table or table valued function listing all the bad characters, and use that table to determine which characters to remove:

declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro>
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings

Don’t know that i would have reassigned @newstring and @teststring, but anyway this would work fine to REPLACE the badstrings with blanks, but not to replace blanks with, for examle, underscores.  That can be done by adding another column to our list of bad strings for the replacement values:

create  TABLE #ReplaceStrings (symb VARCHAR(5),replace_char varchar(5))
INSERT INTO #ReplaceStrings (symb,replace_char) VALUES ('/','|'),(',','|')
DECLARE @OMG VARCHAR(200)
SET @OMG = 'ABC,DE/F'
SELECT @OMG = Replace(@OMG, symb, replace_char)
FROM   #ReplaceStrings
select @OMG

Next, the author goes on to … write a function.

Advertisements