Question

David 2 1 asked

How to Quickly Bulk Insert JSON Data?

Hi there, I am trying to bulk insert a large batch of JSON data, stored in a table, into a separate table where each JSON element gets its own row. I am using a cursor for this bulk insert, illustrated below, because I need to mark each individual JSON row as completed once it is done. Being a cursor, it is taking quite some time, so is there a quicker way using a set-based query? The script I am using uses the parseJSON function from Phil Factor [ https://www.red-gate.com/simple-talk/sql/t-sql-programming/consuming-json-strings-in-sql-server/ ]:

    DECLARE @id INT;
    DECLARE @cid INT;
    DECLARE @cur1 AS CURSOR;

    SET @cur1 = CURSOR FOR
        SELECT ID, CustID FROM Forms WHERE Done IS NULL;

    OPEN @cur1;
    FETCH NEXT FROM @cur1 INTO @id, @cid;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @json NVARCHAR(MAX);
        SET @json = (SELECT CAST(JSONDATA AS NVARCHAR(MAX))
                     FROM Forms WHERE ID = @id AND CustID = @cid);

        INSERT INTO FormsParseJSON
            (element_id, sequenceNo, parent_ID, Object_ID,
             NAME, StringValue, ValueType)
        SELECT element_id, sequenceNo, parent_ID, Object_ID,
               NAME, StringValue, ValueType
        FROM parseJSON(@json);

        UPDATE FormsParseJSON SET ID = @id, CustID = @cid
        WHERE ID IS NULL AND CustID IS NULL;

        UPDATE Forms SET Done = 1 WHERE ID = @id AND CustID = @cid;

        FETCH NEXT FROM @cur1 INTO @id, @cid;
    END

    CLOSE @cur1;
    DEALLOCATE @cur1;

TIA.
sql-server-2012 · functions · bulk-insert · json
4 comments


Oleg commented ·
@David 2 1 Before going any further, it looks like there is no need to separate the INSERT and UPDATE of the FormsParseJSON table. If you add **ID** and **CustID** columns to the insert list and select list, then you can remove the update statement:

    INSERT INTO FormsParseJSON
        (element_id, sequenceNo, parent_ID, Object_ID, NAME,
         StringValue, ValueType, ID, CustID)
    SELECT element_id, sequenceNo, parent_ID, Object_ID, NAME,
           StringValue, ValueType, @id, @cid
    FROM parseJSON(@json);

Also, there might be no need to update the Forms table on every iteration of the cursor, specifically because that table is used as its source. You might want to comment out the UPDATE Forms statement inside the cursor loop and issue a single update after the loop exits, i.e.:

    UPDATE Forms SET Done = 1 WHERE Done IS NULL;

Give me some time and I will try to restate your script. I need the answer/confirmation to the following questions:

- There is only one row in Forms per ID, CustID combination.
- Every such row (its JSONDATA column) produces multiple result rows.
David 2 1 commented ·
Thanks Oleg. To answer your questions: yes, there is only one row per ID and CustID combination, and yes, the parseJSON function produces many rows for each source row. I think I see what you mean now about the INSERT and UPDATE statements.
Oleg commented ·
@David 2 1 I will post the script in a minute, please let me know if it helps with performance.
David 2 1 commented ·
@Oleg thanks, no rush.

1 Answer

Oleg answered
SQL Server 2016 has JSON support via [openjson][1] and [for json][2]. For earlier versions, the function by @Phil Factor is the best option. Because you already use this function, translating your procedure into a set-based script is a relatively straightforward exercise. The DML operations involve more than one table, and therefore the script needs to be wrapped in a transaction. Here is the script:

    begin tran;

    -- insert rows into FormsParseJSON table
    insert into FormsParseJSON
    (
        element_id, sequenceNo, parent_ID, [Object_ID], [NAME],
        StringValue, ValueType, ID, CustID
    )
    select
        parsed.element_id, parsed.sequenceNo, parsed.parent_ID,
        parsed.[Object_ID], parsed.[NAME], parsed.StringValue,
        parsed.ValueType, t.ID, t.CustID
    from Forms t
        cross apply parseJSON(cast(JSONDATA as nvarchar(max))) parsed
    where t.Done is null;

    update Forms set Done = 1 where Done is null;

    commit tran;

Oleg

[1]: https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server
[2]: https://docs.microsoft.com/en-us/sql/relational-databases/json/format-query-results-as-json-with-for-json-sql-server
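For anyone on SQL Server 2016 or later (database compatibility level 130+), here is a minimal sketch of the openjson route. It is not part of the original answer: it uses openjson's default key/value/type result set, which differs from the column layout parseJSON produces, so the target table would need to match:

    -- sketch only: shred each JSONDATA value into key/value/type rows
    -- using openjson's default schema ([key], [value], [type])
    select t.ID, t.CustID,
           j.[key]   as [NAME],
           j.[value] as StringValue,
           j.[type]  as ValueType
    from Forms t
        cross apply openjson(cast(t.JSONDATA as nvarchar(max))) j
    where t.Done is null;

Note that openjson shreds only one level at a time; nested objects and arrays would need further cross apply openjson calls or an explicit with clause.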
5 comments


David 2 1 commented ·
@Oleg many thanks. The script is running on test so far; however, as it is wrapped in a transaction I cannot yet see a count of rows done to compare times (but I am certain your method will be fastest :)).
Oleg commented ·
@David 2 1 Not necessarily. Just because the cursor loop is translated into a set-based script does not, unfortunately, mean that it is faster. What I would do before running it is run just the SELECT part. How many rows are in the Forms table where Done is null? How many rows, on average, does each JSONDATA value produce? How long does the cursor loop run versus the SELECT statement with CROSS APPLY? If you have a long-running script and would like to "sneak preview" which part of it is currently executing, you can sprinkle in RAISERROR with severity 10 and NOWAIT. What it does is print the messages in "real time" even though the whole script is still blocked, so you can see them immediately in the Messages tab (or when choosing the results-to-text option). I will post a sample in the next comment.
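A sketch of the kind of sizing queries meant here (not from the thread; it assumes JSONDATA is a string or binary column, so datalength applies):

    -- how many rows are waiting, and how big is a typical document?
    select count(*) as FormsToProcess,
           avg(datalength(JSONDATA)) as AvgJsonBytes
    from Forms
    where Done is null;

    -- time the parsing work alone, without the insert
    select count(*) as ParsedRows
    from Forms t
        cross apply parseJSON(cast(t.JSONDATA as nvarchar(max))) parsed
    where t.Done is null;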
Oleg commented ·
Here is an example:

    declare @start datetime, @end datetime, @message varchar(500);

    select @start = getdate(), @message = cast(@start as varchar);

    -- this prints out immediately even though the script is still blocked
    raiserror(@message, 10, 1) with nowait;

    -- force some delay
    waitfor delay '00:00:05';

    select @end = getdate(),
           @message = 'Script took '
               + cast(datediff(second, @start, @end) as varchar) + ' seconds';
    print @message;
    go
David 2 1 commented ·
@Oleg many thanks. Unfortunately I kicked this off before I read your follow-up posts, and so far it has been executing for ~17 hours. I will update you once it completes.
Oleg commented ·
If the amount of data causes the execution times to be excessively long, then alternatives should be considered. @Phil Factor himself says in the article that **"CLR is the best choice for this sort of thing"**, and he also states that he **"wrote this code for a particular purpose that used reasonably short documents"**. If you have excessively big, complex objects and 100,000+ rows to be processed in the Forms table, then the "cross apply" part of the query kills performance. This is because the parseJSON function cannot be an inline table-valued function (iTVF) due to its complexity, and also because T-SQL is not the best choice for string manipulation (it does the work, but it takes time). This does not mean that the performance is doomed, but in order to seriously improve the execution times you should consider wheeling out a CLR solution if your environment allows it. The comments on the article by John Galt include a reference to his [solution on GitHub][1]; you might want to consider it.

[1]: https://github.com/jgcoding/J-SQL.git
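In the meantime, one way to avoid a single multi-hour transaction is to commit the work in slices, so each transaction stays short and progress survives an interruption. A sketch only, not from the thread; the batch size of 1000 is arbitrary:

    -- process Forms in batches; each batch is its own transaction
    declare @rows int = 1;

    while @rows > 0
    begin
        begin tran;

        -- grab the next slice of unprocessed rows
        select top (1000) ID, CustID,
               cast(JSONDATA as nvarchar(max)) as JSONDATA
        into #batch
        from Forms
        where Done is null;

        set @rows = @@rowcount;

        insert into FormsParseJSON
            (element_id, sequenceNo, parent_ID, [Object_ID], [NAME],
             StringValue, ValueType, ID, CustID)
        select parsed.element_id, parsed.sequenceNo, parsed.parent_ID,
               parsed.[Object_ID], parsed.[NAME], parsed.StringValue,
               parsed.ValueType, b.ID, b.CustID
        from #batch b
            cross apply parseJSON(b.JSONDATA) parsed;

        -- mark just this slice as done
        update f set Done = 1
        from Forms f
            join #batch b on b.ID = f.ID and b.CustID = f.CustID;

        drop table #batch;
        commit tran;
    end

This does not make parseJSON itself any faster, but it keeps each transaction small and lets you monitor progress between batches.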
