x

suppressing html tags from sql results

Can anyone tell me how to remove/suppress html tags from sql results.

more ▼

asked Mar 15, 2012 at 07:54 PM in Default

avatar image

kdoyle
1 1 1 2

You'd need to be a bit more descriptive here, but first thought would be to use the REPLACE function with your results. It would be pretty ugly as you'd have to exclude for each and every tag. The other thought is that you could create a regex CLR function that would look for < and > and remove those along with everything in between.

Mar 15, 2012 at 08:29 PM SirSQL

@Katie Doyle As @SirSQL has already mentioned, the CLR would be the best choice. Rather than trying to reinvent the wheel, it could be a good idea to read the article by @Pavel Pawlowski titled SQL Server Regular Expressions – Replacements where he provides a complete solution to his SQLRegEx CLR assembly and related T-SQL functions.

Mar 16, 2012 at 02:11 PM Oleg

Thank you for the reference.

Mar 16, 2012 at 02:48 PM kdoyle

@kdoyle My comment which is below your answer has some sample script you can use in case if there is only a limited set of html tags in the description field. Optionally, you can post the list of tags used and maybe a part of your query and I can tweak the script from my comment to just do only what it needs to do so you can then use it.

Mar 16, 2012 at 04:24 PM Oleg
(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

Where do you want to strip the tags? If you have a query which returns some data to the front code, and would like to use some front end code to perform the cleaning then you can use a regular expression to do so. In this case, you will need an appropriate using statement (if you program in C#)

 using System.Text.RegularExpressions;

 // and then somewhere in you code
 Regex.Replace(input, @"</?\w+\s*/?>", " ").Trim()

For example,

 string input = "<html><body><h2>Welcome to our site</h2>" +
     "Some text<br /></body></html>";
 
 string stripped = Regex.Replace(input, @"</?\w+\s*/?>", " ").Trim();
 
 Console.WriteLine(stripped.ToString());

The code above prints Welcome to our site Some text after html tags are stripped. This pattern finds and strips 3 types of tags: opening tags, closing tags and self-closing tags which might have some spaces between the name of the tag and its self-closing part, such as in <br />

Oleg

more ▼

answered Mar 15, 2012 at 08:32 PM

avatar image

Oleg
18.4k 3 7 28

(comments are locked)
10|1200 characters needed characters left

Thank you Oleg and SirSQL for responding. I am not a programmer so don't clearly understand your replies. To clarify, I am using SQL to pull data via Query Analyzer then move it to Excel.....an example of the data:

<html><body>Clarification is needed among Messaging, Billing, and Self Service teams regarding the handling of bounced email messages. Different teams may be making different assumptions.</body></html>

Do you know of any SQL function that would accomplish this?

Thank you.,Thank you Oleg and SirSQL for your replies. I am not a programmer so I don't clearly understand your responses.

I am running a SQL statement in Query Analyzer to extract data then moving it into excel. One of the fields is a description field, which has these tags as shown below. I was hoping for a way to suppress them in my sql stmt.

<html><body>Clarification is needed among Messaging, Billing, and Self Service teams regarding the handling of bounced email messages. Different teams may be making different assumptions.</body></html>

more ▼

answered Mar 16, 2012 at 02:04 PM

avatar image

kdoyle
1 1 1 2

Without the use of regular expressions, it would be quite cumbersome to handle it using replace function because you have to use 2 calls to it per tag to remove it. Every call to this function will remove all occurences of the specified tag, but you still have to handle all different tags. Only and only if the html values are limited to few tags you can use the replace function (or else the script will become hard to maintain). For example, suppose you have your script like this:

 select col1, col2, col3 from sometable

and you know that the values in col3 column have html tags which you need to remove and the only tags that you might have a limited to, say, the following: html, body, p, b, i, and br then you can use something like this in your script:

 select col1, col2,
     replace(replace(replace(replace(replace(replace(replace(
     replace(replace(replace(replace(replace(replace(replace(
     col3, '<html>', ''), '<body>', ''), '<br>', ' '), 
     '<br/>', ' '), '<p>', ' '),'<b>', ' '), '<i>', ' '),
     '</p>', ' '), '</b>', ' '),'</i>', ' '),'<br />', ' '),
     '</body>', ''),'</html>', ''), '  ', ' ') as col3
     from sometable

Even with few tags the script becomes hard to read :(

Mar 16, 2012 at 03:13 PM Oleg
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1126
x16

asked: Mar 15, 2012 at 07:54 PM

Seen: 3853 times

Last Updated: Mar 16, 2012 at 04:24 PM

Copyright 2017 Redgate Software. Privacy Policy