
Processing XML from the web with SSIS 2005

Hi, I have an assignment to set up an SSIS package that downloads data from the web. The data is financial statistics organized in tables, each with a "Download to Spreadsheet" option. This used to be done manually, but now the process needs to be automated. Using an HTTP connection manager and a Script task, I can download the whole XML content of a web page into a file, but I'm not sure what to do next. I hoped to use this file as an XML source, but the XSD schema cannot be generated because the file is not well-formed XML. What I came up with: treating the file's content as a string, I truncate it before and after the table that contains the needed data, and then parse the remaining piece of XML. To do this I actually bulk-load the files into SQL Server and do all the work there… This seems like an artificial and awkward solution to me.
So, my question is: are there smarter ways to download and process a piece of XML from a web page? Thank you.

asked May 02 '11 at 06:56 AM in Default

Tatyana
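The truncate-then-parse idea from the question can be sketched without a round trip through SQL Server. This is an illustration in Python (an SSIS Script task would use .NET instead), not the poster's actual code. The function name `extract_table_rows` and the marker string `'<table id="stats"'` are hypothetical; the sketch assumes the wanted table has a unique opening tag, contains no nested `<table>`, and is itself well-formed XML even though the page as a whole is not.

```python
import xml.etree.ElementTree as ET


def extract_table_rows(page_text, start_marker, end_tag="</table>"):
    """Cut a page down to one table and parse only that fragment as XML.

    start_marker: hypothetical string that uniquely identifies the opening
    tag of the wanted table, e.g. '<table id="stats"'.
    Assumes the table contains no nested <table> and that the fragment
    itself is well-formed XML, even if the page as a whole is not.
    """
    start = page_text.find(start_marker)
    if start == -1:
        raise ValueError("start marker not found")
    end = page_text.find(end_tag, start)
    if end == -1:
        raise ValueError("closing tag not found")
    fragment = page_text[start:end + len(end_tag)]

    table = ET.fromstring(fragment)  # raises ET.ParseError if still malformed
    rows = []
    for tr in table.iter("tr"):  # also finds rows inside <tbody>
        cells = [c for c in tr if c.tag in ("td", "th")]
        rows.append(["".join(c.itertext()).strip() for c in cells])
    return rows
```

The weak point is the same one the question hints at: if the table fragment itself is not well-formed, `ET.fromstring` still fails, which is why a tolerant HTML parser is usually the sturdier choice.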


If only all websites were XHTML compliant... Take a look at the HTML Agility Pack (http://htmlagilitypack.codeplex.com/). It can turn the HTML into properly formed markup, so you can read it into XDocuments and manipulate it with LINQ/XPath/XSLT.

answered May 09 '11 at 09:02 PM

Scot Hauder
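The HTML Agility Pack is a .NET library. As a rough analogue of the same idea (a parser that tolerates HTML that is not well-formed XML), here is a sketch using Python's standard `html.parser` module; it is not the Agility Pack's API. The class name `TableCollector` is hypothetical, and the sketch assumes tables are not nested. It copes with omitted `</td>` and `</tr>` tags by closing the open cell or row when the next one starts.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row, from every <table>.

    html.parser tolerates markup an XML parser rejects (unclosed <p> or <br>,
    unquoted attributes, omitted </td>). Assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []  # each table is a list of rows; each row a list of str
        self._row = None  # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._finish_row()  # an unclosed previous row ends here
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # an unclosed previous cell ends here
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None
```

Feed it the downloaded page with `parser.feed(text)` followed by `parser.close()`, then read `parser.tables`; the rows can be written out as CSV or loaded into a staging table.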

Thank you, Scot. I'll try your suggestion and be back with the results in a couple of days...
May 10 '11 at 10:38 AM Tatyana
Let us know how it goes. You could also use a WebBrowser control to grab the elements you are interested in, e.g. elements = WebBrowser1.Document.GetElementsByTagName("td"), and then enumerate through them... and then there is always regex.
May 10 '11 at 10:48 AM Scot Hauder
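The "there is always regex" fallback could look like the Python sketch below, which pulls the text of every `<td>` cell in document order, roughly what enumerating the result of `GetElementsByTagName("td")` gives you. It is brittle: it assumes every cell has an explicit `</td>` and that tables are not nested. `td_texts` is a hypothetical helper name.

```python
import html
import re

# One <td ...>...</td> cell; non-greedy so adjacent cells match separately,
# DOTALL so a cell may span several lines, IGNORECASE for <TD>.
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")  # any tag inside a cell, e.g. <b> or <span>


def td_texts(page):
    """Return the text of every <td> cell in document order (regex approach)."""
    cells = []
    for inner in CELL_RE.findall(page):
        text = TAG_RE.sub("", inner)  # drop markup nested inside the cell
        cells.append(html.unescape(text).strip())  # decode &amp; and similar
    return cells
```

This is fine for a quick one-off, but a page with omitted `</td>` tags will silently produce wrong cells, which is where a real parser earns its keep.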
@Scot, thank you so much for pointing me to the Agility Pack. It's working! I would not have been able to find this resource on my own. Let me also publish my thanks here to the article at http://www.4guysfromrolla.com/articles/011211-1.aspx, because codeplex.com doesn't give much explanation of how to use the pack...
May 19 '11 at 12:33 PM Tatyana

Seen: 1435 times

Last Updated: May 19 '11 at 12:49 AM