XML query very, very slow

hi,

I am being given an XML doc to import into a database. My method is to receive the XML doc as an xml-typed parameter, select the contents out into a user-defined table type, and then merge the UDT into a physical table. I seem to be having an issue inserting the results into the UDT. Selecting all the contents out of the XML works fine and fast for around 1000 or so records, but if I try an insert @udt select ... from the XML document, it basically takes forever and a half. In fact it never finishes; I have to stop it. The query looks like this:

select
    o.value('itmID[1]', 'nchar(18)')    as itmID,
    o.value('vin[1]',   'nvarchar(18)') as vin,
    o.value('engNo[1]', 'nvarchar(20)') as engNo,
    o.value('regNo[1]', 'nvarchar(9)')  as regNo
    -- more fields here...
from @ItemsXml.nodes('/Items/item') x(o)

Can anyone tell me why the select on its own works fine but an insert @var select ... from the XML is so bad? The XML document is not malformed, and looking at it after I've selected everything out I can't find anything wrong with it.
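For reference, a minimal sketch of the pattern described above (a table variable stands in for the user-defined table type; the sample document and column list are illustrative, not the real schema):

declare @ItemsXml xml = N'<Items>
    <item><itmID>ABC123</itmID><vin>VIN0001</vin></item>
    <item><itmID>DEF456</itmID><vin>VIN0002</vin></item>
</Items>';

-- stand-in for the user-defined table type
declare @udt table (itmID nchar(18), vin nvarchar(18));

-- this is the shape of the statement that never finishes on large documents
insert into @udt (itmID, vin)
select o.value('itmID[1]', 'nchar(18)'),
       o.value('vin[1]',   'nvarchar(18)')
from @ItemsXml.nodes('/Items/item') x(o);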

asked Oct 20, 2010 at 03:32 PM by DavidD
If I insert the XML into a temp table field and then insert into the UDT via a select from that table with cross apply, it works. Why would that be?
Oct 20, 2010 at 03:55 PM DavidD
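A minimal sketch of that workaround, reusing the illustrative names from the question:

declare @ItemsXml xml = N'<Items><item><itmID>ABC123</itmID><vin>VIN0001</vin></item></Items>';
declare @udt table (itmID nchar(18), vin nvarchar(18));

-- stage the whole document in a temp table first...
create table #stage (doc xml);
insert into #stage (doc) values (@ItemsXml);

-- ...then shred it from the staged column via cross apply
insert into @udt (itmID, vin)
select o.value('itmID[1]', 'nchar(18)'),
       o.value('vin[1]',   'nvarchar(18)')
from #stage s
cross apply s.doc.nodes('/Items/item') x(o);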
The plan has the answer... :) Without looking at the plan it's pretty much anyone's guess...
Oct 20, 2010 at 04:03 PM Matt Whitfield ♦♦

3 answers

Looks like Matt's suggestion is correct. I had the exact same problem with large XML inputs, and a temp table was the solution. Here is my scenario: a huge XML (hundreds of MBs) is fed into the proc as a parameter. In fact, the input is so big that there is no way to even open it in most editors (only SlickEdit opens it, and still very, very slowly).

Now, the proc has logic like this: the XML has deltas for a day or so; some data is new, some exists already but was updated. For new data:

-- pseudo code
insert into dbo.DestTable 
select * from xml left join dbo.DestTable
    on some_join_condition
    where dbo.DestTable.FirstColumn is null;

showed extremely bad performance. Changing it to:

select
    item.value('SomeNode[1]', 'varchar(10)') as ColName -- etc.
into #temp
from @xml.nodes('//some_path') items(item);

-- and then doing the normal

insert into dbo.DestTable
select * from #temp left join dbo.DestTable
    on some_join_condition
    where dbo.DestTable.FirstColumn is null;

made it incomparably faster. Further attempts showed that it can be made even faster than that by simply changing the way the XML is passed into the proc. Instead of feeding it an xml parameter, I started passing just a path to the file as the parameter and having the proc read the XML contents into a local xml variable first, before inserting the contents into the temp table. One small problem to overcome is that openrowset does not take a path as a parameter (the file name must be a literal), so dynamic SQL is needed:

declare @sql nvarchar(500);
declare @xml xml;

-- openrowset(bulk ...) requires a literal file name, so the statement
-- is built dynamically and the document returned via an output parameter
select @sql = N'
    select
        @xmlOut = t.BulkColumn
        from openrowset(bulk ''' + @path + ''', single_blob) t;';

exec sp_executesql @sql, N'@xmlOut xml out', @xmlOut = @xml out;

Once the above is done, the @xml variable has been read from disk and is ready to be interrogated. Please note that single_blob is faster for openrowset than single_clob; I read somewhere in BOL that binary-represented XML data is the easiest for SQL Server to handle.
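As a usage sketch (node and column names are placeholders, as above), once @xml is populated the shredding step is the same as before:

-- @xml now holds the document loaded via openrowset above
select
    item.value('SomeNode[1]', 'varchar(10)') as ColName -- etc.
into #temp
from @xml.nodes('//some_path') items(item);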

Hope this helps,

Oleg
answered Oct 20, 2010 at 05:09 PM by Oleg

Great post. Yes, that's pretty similar to what I do: I receive all the XML by param, then insert it into the table var, then merge the table with the physical table. It doesn't explain why I get bad performance when I insert into a table var or temp table, though; they are both slow. Interestingly, if I write the xml param into a temp table first, then select the data out of the single xml field in that temp table into another temp table with all the correct columns, it's fine. I'd post my exec plans if I knew how to; they are extremely big though.
Oct 20, 2010 at 07:17 PM DavidD

Execution plans - post these by exporting the plan (to a .sqlplan file), then opening it in your favourite text editor and copying and pasting (it's just XML under the hood). If it's too big for that, look at posting it as a Google Document.

If you've got the screen real estate to make the whole plan visible, consider taking a screenshot and posting that.
Oct 21, 2010 at 12:26 AM ThomasRushton ♦

Have you tried the same into a temporary table instead of a table variable, and compared the execution plans?

Inserting into a table variable can cause parallelism to be disabled, and this might be biting you here.
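For comparison, a sketch of the temp-table variant of the question's insert (names illustrative); running both versions with actual execution plans enabled should show the difference:

declare @ItemsXml xml = N'<Items><item><itmID>ABC123</itmID><vin>VIN0001</vin></item></Items>';

-- temp table instead of a table variable: the insert is eligible
-- for a parallel plan, and the optimizer gets real statistics
create table #items (itmID nchar(18), vin nvarchar(18));

insert into #items (itmID, vin)
select o.value('itmID[1]', 'nchar(18)'),
       o.value('vin[1]',   'nvarchar(18)')
from @ItemsXml.nodes('/Items/item') x(o);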
answered Oct 20, 2010 at 03:36 PM by Matt Whitfield ♦♦

I have; it was the first thing I tried. It's exactly the same; I eventually have to kill it.
Oct 20, 2010 at 03:49 PM DavidD
Awesome. So what do the plans look like? Can you post some short-ish SQL that replicates it?
Oct 20, 2010 at 03:51 PM Matt Whitfield ♦♦
There are about a thousand nested loops. How do I post it?
Oct 20, 2010 at 04:43 PM DavidD
@David - with great difficulty, in that case... Can you replicate it with some simpler SQL?
Oct 21, 2010 at 12:19 AM Matt Whitfield ♦♦
Nice response Oleg. Two observations. First, //pizza is different from /pizza: the former searches the whole hierarchy for elements named pizza, while the latter selects only a root pizza element (/pizzas/pizza selects all pizza elements inside the pizzas root element). Maybe you know this, but I am clarifying because it can have a great impact on your query. Second, while working through this slowness issue, I broke my big XML down into parts: instead of one select over /pizzas/pizza/toppings/topping, I selected /pizzas/pizza into a temp table and then, in a second step, processed each pizza from the temp table. This reduced the processing from 25 seconds to 2 seconds, and that was for creating only 220 rows!
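A sketch of both points with hypothetical pizza XML (all names are made up for illustration):

declare @xml xml = N'<pizzas>
    <pizza><name>Margherita</name>
        <toppings><topping>basil</topping><topping>mozzarella</topping></toppings>
    </pizza>
</pizzas>';

-- step 1: one row per pizza; ''/pizzas/pizza'' reads only children of the
-- root, whereas ''//pizza'' would scan the entire hierarchy
select
    p.value('name[1]', 'varchar(50)') as PizzaName,
    p.query('.') as PizzaXml          -- keep each small fragment as xml
into #pizzas
from @xml.nodes('/pizzas/pizza') pizzas(p);

-- step 2: shred the toppings out of each small fragment separately
select
    pz.PizzaName,
    t.value('.', 'varchar(50)') as Topping
from #pizzas pz
cross apply pz.PizzaXml.nodes('/pizza/toppings/topping') toppings(t);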
answered Jun 12, 2012 at 02:36 PM by Alan