Introduction
It is often needed to remove all HTML tags for a column(s) from a database table. One way to do it is regular expression pattern matching or using a custom function that will hunt down HTML tags and remove them. In this small post, I will demonstrate this second method.
T-Sql Function
Here is a very simple function that replaces/removes HTML tags and returns clean texts without using any patterns. Additionally, you can pass a delimiter that will replace the HTML tags in-case you do not want to mix everything in one big chunk of text.
CREATE FUNCTION [dbo].[CleanHTMLTags] (@HTMLText VARCHAR(MAX),@ReplaceChar char(1))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX(‘<’,@HTMLText) SET @End = CHARINDEX(‘>’,@HTMLText,
CHARINDEX(‘<’,@HTMLText)) SET @Length = (@End – @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
IF (UPPER(SUBSTRING(@HTMLText, @Start, 4)) <> ‘
‘) AND (UPPER(SUBSTRING(@HTMLText, @Start, 5)) <> ")
begin
SET @HTMLText = RTRIM(LTRIM(STUFF(@HTMLText,@Start,@Length,@ReplaceChar)));
end
ELSE
SET @Length = 0;
SET @Start = CHARINDEX(‘<’,@HTMLText, @End-@Length)
SET @End = CHARINDEX(‘>’,@HTMLText,CHARINDEX(‘<’,@HTMLText, @Start))
SET @Length = (@End – @Start) + 1
END
RETURN isnull(RTRIM(LTRIM(@HTMLText)) ,")
END
And that is it. Say you have a table named “PRODUCTS
” and that contains a column “ProductDetails
” that contains HTML tags. In our scenario, we have to remove all HTML tags and replace them by comma(,). Utilizing the above function, we can make the call like this:
SELECT [dbo].[CleanHTMLTags](ProductDetails,’,') FROM PRODUCTS
Resultant data should be a big text respect to each records in “PRODUCTS
” table with HTML tags removed.