question

tstrole avatar image
tstrole asked

Massive task DB2 to SQL migration

Hello experts - I'm embarking on a massive task to import DB2 data into SQL Server 2012. I have most of the tables definitions created and data loaded. Next I need to create a process to refresh the data. I am trying the merge statement, but is that the best approach? I've read there are known issues with merge and I have come across two already. One table loaded some rows twice. Another one loaded all rows and then extra 'bogus' rows (doubling the record) count. In this instance, I **think** it is related to some EBCDIC character. See attached screen shot of the DB2 data. The extra rows have blank or zeros in all of the columns. If merge is not a viable option, what other avenues can I pursue (other than truncate / reload every time).? ![alt text][1] [1]: /storage/temp/3621-db2-table-data.png
mergedata-importdata-processing
db2-table-data.png (19.7 KiB)
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
I'm curious as to which known problems you have read about with the MERGE command, which could have anything to do with your specific problem. The MERGE command is different to the UPDATE command with a FROM clause, in that MERGE will error out if you attempt to UPDATE the same row twice. Your problem seem to be related with data quality - the comparisons don't match though they should logically do that, and you mention EBCDIC conversion being a clue. That should be fixed in queries Reading data from DB2 into SQL Server and can't possibly have anything to do with the MERGE command as such. Since you use the MERGE command, I assume you have identified which columns uniquely identify a row in each table and then I'd say the MERGE command is pretty good.
0 Likes 0 ·
tstrole avatar image tstrole commented ·
Further the columns in the iSeries table that show ++++++++++++++ in the columns (in an ISeries navigator window) are definitely the culprits to being able to merge the data into SQL Server. Does anyone know what those characters represent?
0 Likes 0 ·
JohnM avatar image
JohnM answered
Does the data have some type of unique row identifier? Or a date stamp? If so, I'd look at using SSIS to move this data. It's great for these types of things and if the data has something unique about each row, you can design the package to only load what's new and/or changed (assuming you have a way to identify that). Or you can have the package do a nuke & pave option as a last resort. Hope that helps!
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

tstrole avatar image tstrole commented ·
I have identified the primary key columns for each table. Merge appeared to be working great, and then I executed it for a table with 8 records in the source system, and it inserted 16. (The extra 8 records contained no data but blanks and zeros as I mentioned earlier). I'm grateful this was not a larger table. I don't have the link saved, but I've read someone suggested not using Merge with a linked server, which is what I'm doing. The 2nd instance was randomly inserted duplicate records. I will also look into SSIS, but that option is so much more time consuming, since I can quickly create the merge statements with a script I've developed. I appreciate the comments. I find some smoking gun to the two issues I've encountered, I will post an update for anyone else interested!
0 Likes 0 ·
tstrole avatar image tstrole commented ·
I have discovered this interesting fact. A query against the linked server table from SQL Server does in fact return extra rows that do not exist in the source table. This would be my root cause of the merge problem. Any thoughts on this? At least I know now it appears to be a linked server query problem. There are only 8 records in the source data, when I do a select * from the table.
0 Likes 0 ·
tstrole avatar image
tstrole answered
This attachment shows a link server query that returns double the # of records that actually exist in the source DB2 table. Curious if anyone else has ever seen this.

linked-server.png (22.6 KiB)
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Adrian_1 avatar image
Adrian_1 answered
the ++++ are often numerics that don't fit or dates in the wrong format.... I would really have a look at using SSIS, we use Lookup and Insert/Update accordingly. We have found SSIS to be so much faster than using Linked Server
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.