Introduction
This is a short article on how to perform URLDecoding in T-SQL. It's something I recently had to do and I struggled to find an adequate solution in the public domain, so I decided to write my own. The way it is done is somewhat unusual in that the hexadecimal numbers that denote encoded characters are not treated as numbers, but rather as strings, most of the way through the process. I'll start by laying out the code in its entirety, as it's not very long, and then I'll go through it step by step, explaining as I go.
The Code
CREATE FUNCTION fnURLDecode
(
@input nvarchar(4000)
)
RETURNS nvarchar(4000)
AS
BEGIN
declare @char nvarchar(2)
declare @asc nvarchar(2)
declare @asc2 nvarchar(2)
while (charindex('%', @input) > 0)
begin
set @char=(select substring(@input, charindex('%',
@input) +1, 2))
if (isnumeric(substring(@char, 1, 1)))>0
begin
set @asc=(select cast(substring(@char, 1, 1)
as int))*16
end
else
begin
set @asc=(select ascii(cast(substring(@char, 1, 1)
as char)))-55
set @asc=(select @asc*16)
end
if (isnumeric(substring(@char, 2, 1)))>0
set @asc=(select cast(@asc as int) +
(select cast(substring(@char, 2, 1) as int)))
else
begin
set @asc2=(select ascii(cast(substring(@char, 2, 1)
as char)))-55
set @asc=(select cast(@asc as int) +
(select cast(@asc2 as int)))
end
set @input=
(select substring(@input, 0, charindex('%', @input)))
+ char(@asc) + (select substring(@input, charindex('%',
@input)+3, len(@input)))
end
return @input
END
As you can see, it's not the most lengthy piece of code ever written, but some parts are quite complex and require some explanation. I could have simplified some lines, but beyond a certain point you end up creating variables for the sake of it. I think it's mostly intelligible enough as it is.
Explanations
Let's begin at the beginning. We start with the function declaration. It's a SQL Server user defined function, and it takes a string as an argument. It also returns a string, once it's been completely decoded. Then we define three variables. It can probably be done with less, but three was what I ended up with. These will store 2 character combinations. In the first instance these will be the hexadecimal parts of the character code, so the "20" in "%20" (space). During the process these will be converted to the characters they represent.
From here we plunge into a while loop. The loop condition is based on a charindex function, specifically, we're looking for the next instance of the "%" character. In a URL string this character identifies the hexadecimal character codes we're looking for. The function adjusts the input string as it goes along, so the first code is processed first, the string is updated and when the loop starts again, we're looking for the second code. Eventually there will be no more instances of "%" in the string and the loop will exit (when the result of the charindex function is 0).
Now we set @char equal to the 2 characters immediately following the current instance of "%". This is the code we're after. We deal with each character of the code separately. A 2-digit hexadecimal number can be converted to decimal by simply multiplying the first digit by 16 (as long as it's numeric to start with) and adding the second digit to it (again as long as it's numeric). This is what we do next. Firstly, is the first digit numeric? If so, we apply the formula above, if not we do some conversions.
This part is a bit more interesting. Instead of trying to convince SQL Server that "A" could be a number as well as a character, I cheated. The ASCII code for "A" is 65. "A" in hexadecimal represents the number 10. Therefore, the decimal value of the hexadecimal digits A-F is simply their ASCII code minus 55. So, if the digit we're currently looking at is not numeric, we apply this rule to get its actual value.
To illustrate, say for example we've been given the following character code: "%3F", which is the code for "?". Our routine looks for the "%", it's right at the start. We get the next two characters "3F". We look at the first character, it's numeric, so we multiply it by 16. This leaves us with 48 so far, and "F" to add on. "F" is not numeric, so we apply the other rules. The ASCII code for F is 70. We take 55 away from that to end up with 15 (the correct value for F). We add 15 to our 48 and we get 63. We use the chr function to convert this number to its ASCII equivalent and we get "?", just as we wanted.
The function works its way through the input string, applying the reasoning above to each encoded character. When it gets to the end of the string it exits the loop and returns the decoded string.
Conclusions
I'm not convinced that this is the best way to convert hexadecimal codes to ASCII characters, but it does work and in the absence of a better way I'm offering it up for use. Because of the way the routine works, it would also be possible to strip bits out of it and use it as a general purpose hex->dec converter. Reversing the process should also allow you to encode characters, although there will be a bit of code to write to stipulate which characters require conversion.
This code is the product of lack of sleep and some inexplicable lateral thinking. I hope it helps someone.