x

BCP - Text data and Identity columns

I have a table with 70 columns in a prod database and need to work on the contents of a Text type column. My theory is to extract the relevant columns out to a quieter instance to create a Full Text Index and then let rip with the analysis. Records with certain content will be tagged and then their records in live will be updated depending on tags allocated.

So, I tried SSIS and its slow so I am trying BCP. I created a format file for the table and edited it to the 5 columns I am interested in (inc PK column for source table and Text column) and then use BCP with queryout to extract the data (8M rows). When reviewing the data in the destination table the identity column has wrong values and dupes (if I set a PK on the table before insert the bcp in fails with PK violation error.)

my format file

5
1 SQLBIGINT 0 8 "@#\t\t#@" 1 HistoryID ""
2 SQLCHAR 0 7 "@#\t\t#@" 2 Client_Ref SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 1 "@#\t\t#@" 3 Type_Ref SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 1 "@#\t\t#@" 4 Category_Ref SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 0 "\\r\\n" 5 Notes SQL_Latin1_General_CP1_CI_AS

bcp out
bcp "SELECT [ch].[HistoryID] , [ch].[Client_Ref] , [ch].[Category_Ref] , [ch].[Type_Ref] , [ch].[Notes] FROM database.[dbo].table AS ch WITH (NOLOCK) where ch.client_ref is not null" queryout E:\CRHist.datx -U username -P pwd -c -S svr1\inst1 -t @#\t\t#@ -r \r\n -C ACP -E
bcp in
bcp doombar.dbo.Table in e:\crhist.datx -S svr2\inst2 -T -f e:\crhist_c.fmt -e e:\BCP_ERR.Log -m 900 -b 100000 -E

I have tried this with the IDENTITY on the source off but it makes no difference. The output file (.dat) has the PK values in it, in order as expected.

[Edit]
I have since been looking at the destination data. In the source a row may have a HistoryID of 1,2,3,4 etc, in the destination is is always 49,50,51,52 etc. I am wondering if the delimiter is affecting the perceived value of the first column? ascii('@')= 64 and ascii('#')=35 so not sure how 1@# #@CLIENTREF@# #@D@# in the .dat file is changing to 49,'CLIENTREF' once its imported... 2177801 becomes 13845291861487922 though so not a straight addition of 48 :(

[Edit II]
Have identified the issue (i think) - the text column has a \r\n string in it and is breaking the format of the data file. What can I use as alternatives to \r\n?

[Edit III]
OK, its working. I changed the row terminator to \\r\\r\\n and the identity values import cleanly.

more ▼

asked Dec 23, 2010 at 06:12 AM in Default

avatar image

Fatherjack ♦♦
43.8k 79 99 118

It will be the delimeter, what was the reason for such a strange one? is it at all possible to go for something a little more "normal", a double-pipe "||" for example - that may help

Dec 23, 2010 at 06:43 AM WilliamD

desperation! I tried double and even triple tab (\\t\\t\\t) as the delimiter but they crop up in the text column and then the import goes all to hell. Will try || and BRB.

Dec 23, 2010 at 06:45 AM Fatherjack ♦♦

using "||" as the delimiter results in Unexpected EOF encountered. Zero rows imported.

Dec 23, 2010 at 06:48 AM Fatherjack ♦♦

Have you tried using -C RAW or -C OEM instead of ACP for the encoding?

Dec 23, 2010 at 06:54 AM WilliamD

wheels fell off when using RAW, not tried OEM ... BRB ... No difference :(

Dec 23, 2010 at 06:56 AM Fatherjack ♦♦
show all comments (comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

I don't quite understand the problem here:

Your source table has a HistoryID (Bigint, PK) and the other columns. You spew these out into your CRHIST file. You read that into a second instance to do the data crunching, batched in 100k lots.

Are your duplicates occuring in the export to data file, or the insert into target table?

Have you tried running bcp without the batching? Set the target DB into BULK_LOGGED recovery to speed that up. 8 million rows should fly into a table, especially when the data is as small as you have there (130MB by my calculation - or have i misread the table def?).

You can always create the PK afterwards.

Alternatively, have you tried to grab the data via a linked server and SELECT INTO?

EDIT:

I think I got the bugger!

Test data so you can give it a try:

 CREATE TABLE [dbo].[TableA]
 (
 [Col1] [int] NOT NULL,
 [Col2] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
 ) ON [PRIMARY]
 GO
 
 CREATE TABLE [dbo].[TableZ]
 (
 [Col1] [int] NOT NULL,
 [Col2] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
 ) ON [PRIMARY]
 GO


 INSERT INTO Sandbox.dbo.TableA
 SELECT 1,'blah' UNION ALL
 SELECT 2,'hjdf'

 -- export, pipes as column splitter, newline for row
 xp_cmdshell'bcp "SELECT Col1,Col2 FROM Sandbox.dbo.TableA TA" queryout "C:\CRHist.datx" -T -c -S <SERVERNAME> -t"||" -r"\n"  -C OEM'

 -- create a format file according to the target table
 xp_cmdshell'bcp Sandbox.dbo.TableZ format nul -f c:\format.Fmt -n -T '

 -- use the format file with bulk insert
 INSERT SandBox.dbo.TableZ
 SELECT Col1-48,  -- Interprets the int as an ASCII code FAIL!!!
        Col2
 FROM OPENROWSET(bulk N'C:\CRHist.datx',FORMATFILE = 'c:\format.Fmt') AS a

 DROP TABLE SandBox.dbo.TableA
 DROP TABLE SandBox.dbo.TableZ
more ▼

answered Dec 23, 2010 at 06:31 AM

avatar image

WilliamD
26.2k 18 35 48

the .dat file is well formed and accurate ( at least for the 1st few hundred rows that I have inspected so I assume all of it), however the HistoryID value in the destination is altered to random values and includes duplicates. Thus stopping me adding the PK and making sensible use of the data. Somehow the data is being altered during the import. See my edit to the question.

Dec 23, 2010 at 06:37 AM Fatherjack ♦♦

Oh, .dat file is 1.2GB uncompressed. Speed of insert isnt the issue, simply data integrity currently!

Dec 23, 2010 at 06:54 AM Fatherjack ♦♦

tried the linked server, or is that not a possibility?

Dec 23, 2010 at 06:57 AM WilliamD

haven't tried linked servers - was happy with disconnected transfer of data rather than having to wait for quiet time on both source and dest. Rapidly thinking slow perf SSIS is better option. Never had a good experience with BCP :(

Dec 23, 2010 at 07:11 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x2190
x53
x8

asked: Dec 23, 2010 at 06:12 AM

Seen: 6820 times

Last Updated: Dec 23, 2010 at 08:02 AM

Copyright 2017 Redgate Software. Privacy Policy