Hi, I have an assignment to set up an SSIS package that downloads data from the web. Basically, it's financial statistics organized in tables, each with a "Download to Spreadsheet" option. They used to do this manually, but apparently there is a need to automate the process.

With an HTTP connection manager and a Script task I am able to download the whole XML content of a web page into a file, but then I'm not sure what to do. I hoped to use this file as an XML source, but the XSD schema cannot be generated because the page is not well-formed XML.

What I came up with: treating the file's content as a string, I truncate it before and after the table that contains the data I need, and then parse the remaining piece of XML. To do this I actually bulk-load the files into SQL Server and do all the work there... This feels like an artificial and awkward solution to me. So, my question is: are there smarter ways to download and process a piece of XML from a web page? Thank you.
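To make the workaround concrete, here is a minimal sketch of the "cut out the table and parse just that piece" idea, done in-process instead of via a SQL Server round-trip. This is Python rather than the C#/VB you'd write inside an SSIS Script task, and the page text and the `id="stats"` marker are made-up examples; it only works when the extracted table fragment happens to be well-formed on its own:

```python
# Sketch of the workaround from the question: the page as a whole is not
# well-formed XML, but the one table we need often is, so slice it out as a
# substring and parse only that fragment.
import xml.etree.ElementTree as ET

def extract_table(page: str, marker: str = '<table id="stats">') -> ET.Element:
    # Hypothetical marker: in practice, use some text that uniquely
    # precedes the target table on the real page.
    start = page.index(marker)
    end = page.index("</table>", start) + len("</table>")
    return ET.fromstring(page[start:end])

# The rest of this fake page is deliberately NOT well-formed XML
# (unquoted attribute, unclosed <br>), so parsing the whole thing fails.
page = """<html><head><meta charset=utf-8></head><body>
<br>
<table id="stats"><tr><td>Q1</td><td>1.5</td></tr>
<tr><td>Q2</td><td>2.25</td></tr></table>
</body></html>"""

table = extract_table(page)
rows = [[td.text for td in tr] for tr in table]
print(rows)  # [['Q1', '1.5'], ['Q2', '2.25']]
```

The fragility is obvious: one stray `<br>` or unclosed tag inside the table breaks the parse, which is why a lenient HTML parser is the more robust route.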
If only all web sites were XHTML-compliant... Take a look at the HTML Agility Pack: http://htmlagilitypack.codeplex.com/ It can properly form the HTML so you can read it into XDocuments and manipulate it with LINQ/XPath/XSLT.
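HTML Agility Pack is a .NET library, so it would live in the C# code of the SSIS Script task. As a rough, language-neutral stand-in for the same idea, this Python sketch uses the standard library's lenient `HTMLParser` to pull cell text out of markup that is not well-formed XML (note the unclosed `<td>` and `<tr>` tags, which an XML parser would reject):

```python
# A tolerant HTML parser doesn't need the page to be well-formed XML:
# it fires start-tag/end-tag/data events even when tags are left open.
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_cell = True    # collect text until the cell ends

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.rows[-1].append(data.strip())

# Unclosed <td>/<tr> tags: invalid XML, but valid-enough HTML.
html = "<table><tr><td>Q1<td>1.5<tr><td>Q2<td>2.25</table>"
scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)  # [['Q1', '1.5'], ['Q2', '2.25']]
```

In C# with HtmlAgilityPack the equivalent is loading the page into an `HtmlDocument` and selecting the rows with an XPath query, then handing the cleaned-up rows to the rest of the SSIS data flow.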