question

kdoyle avatar image
kdoyle asked

suppressing html tags from sql results

Can anyone tell me how to remove/suppress html tags from sql results.
sqlhtml
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

You'd need to be a bit more descriptive here, but first thought would be to use the REPLACE function with your results. It would be pretty ugly as you'd have to exclude for each and every tag. The other thought is that you could create a regex CLR function that would look for and remove those along with everything in between.
1 Like 1 ·
@Katie Doyle As @SirSQL has already mentioned, the CLR would be the best choice. Rather than trying to reinvent the wheel, it could be a good idea to read the article by @Pavel Pawlowski titled [SQL Server Regular Expressions – Replacements][1] where he provides a complete solution to his **SQLRegEx** CLR assembly and related T-SQL functions. [1]: http://www.pawlowski.cz/2011/05/sql-server-regular-expressions-replacements/
0 Likes 0 ·
Thank you for the reference.
0 Likes 0 ·
@kdoyle My comment which is below your answer has some sample script you can use in case if there is only a limited set of html tags in the description field. Optionally, you can post the list of tags used and maybe a part of your query and I can tweak the script from my comment to just do only what it needs to do so you can then use it.
0 Likes 0 ·
Oleg avatar image
Oleg answered
Where do you want to strip the tags? If you have a query which returns some data to the front code, and would like to use some front end code to perform the cleaning then you can use a regular expression to do so. In this case, you will need an appropriate using statement (if you program in C#) using System.Text.RegularExpressions; // and then somewhere in you code Regex.Replace(input, @" ", " ").Trim() For example, string input = "

Welcome to our site

" + "Some text
"; string stripped = Regex.Replace(input, @" ", " ").Trim(); Console.WriteLine(stripped.ToString()); The code above prints **Welcome to our site Some text** after html tags are stripped. This pattern finds and strips 3 types of tags: opening tags, closing tags and self-closing tags which might have some spaces between the name of the tag and its self-closing part, such as in <br /> Oleg
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

kdoyle avatar image
kdoyle answered
Thank you Oleg and SirSQL for responding. I am not a programmer so don't clearly understand your replies. To clarify, I am using SQL to pull data via Query Analyzer then move it to Excel.....an example of the data: <html><body>Clarification is needed among Messaging, Billing, and Self Service teams regarding the handling of bounced email messages. Different teams may be making different assumptions.</body></html> Do you know of any SQL function that would accomplish this? Thank you.,Thank you Oleg and SirSQL for your replies. I am not a programmer so I don't clearly understand your responses. I am running a SQL statement in Query Analyzer to extract data then moving it into excel. One of the fields is a description field, which has these tags as shown below. I was hoping for a way to suppress them in my sql stmt. <html><body>Clarification is needed among Messaging, Billing, and Self Service teams regarding the handling of bounced email messages. Different teams may be making different assumptions.</body></html>
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Without the use of regular expressions, it would be quite cumbersome to handle it using replace function because you have to use 2 calls to it per tag to remove it. Every call to this function will remove all occurences of the specified tag, but you still have to handle all different tags. Only and only if the html values are limited to few tags you can use the replace function (or else the script will become hard to maintain). For example, suppose you have your script like this: select col1, col2, col3 from sometable and you know that the values in col3 column have html tags which you need to remove and the only tags that you might have a limited to, say, the following: **html, body, p, b, i**, and **br** then you can use something like this in your script: select col1, col2, replace(replace(replace(replace(replace(replace(replace( replace(replace(replace(replace(replace(replace(replace( col3, '', ''), '', ''), '
', ' '), '
', ' '), '

', ' '),'', ' '), '', ' '), '

', ' '), ' ', ' '),'', ' '),'
', ' '), '', ''),'', ''), ' ', ' ') as col3 from sometable Even with few tags the script becomes hard to read :(
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.