I have a table with 70 columns in a prod database and need to work on the contents of a Text type column. My theory is to extract the relevant columns out to a quieter instance to create a Full Text Index and then let rip with the analysis. Records with certain content will be tagged and then their records in live will be updated depending on tags allocated. So, I tried SSIS and its slow so I am trying BCP. I created a format file for the table and edited it to the 5 columns I am interested in (inc PK column for source table and Text column) and then use BCP with queryout to extract the data (8M rows). When reviewing the data in the destination table the identity column has wrong values and dupes (if I set a PK on the table before insert the bcp in fails with PK violation error.) my format file > 10.0 > 5 > 1 SQLBIGINT 0 8 "@#\t\t#@" 1 HistoryID "" > 2 SQLCHAR 0 7 "@#\t\t#@" 2 Client\_Ref SQL_Latin1_General\_CP1\_CI\_AS > 3 SQLCHAR 0 1 "@#\t\t#@" 3 Type\_Ref SQL_Latin1_General_CP1_CI_AS > 4 SQLCHAR 0 1 "@#\t\t#@" 4 Category\_Ref > SQL_Latin1_General\_CP1\_CI\_AS > 5 SQLCHAR 0 0 "\\r\\n" 5 Notes SQL\_Latin1\_General\_CP1\_CI\_AS **bcp out** bcp "SELECT [ch].[HistoryID] , [ch].[Client_Ref] , [ch].[Category_Ref] , [ch].[Type_Ref] , [ch].[Notes] FROM database.[dbo].table AS ch WITH (NOLOCK) where ch.client_ref is not null" queryout E:\CRHist.datx -U username -P pwd -c -S svr1\inst1 -t @#\t\t#@ -r \r\n -C ACP -E **bcp in** bcp doombar.dbo.Table in e:\crhist.datx -S svr2\inst2 -T -f e:\crhist_c.fmt -e e:\BCP_ERR.Log -m 900 -b 100000 -E I have tried this with the IDENTITY on the source off but it makes no difference. The output file (.dat) has the PK values in it, in order as expected. [Edit] I have since been looking at the destination data. In the source a row may have a HistoryID of 1,2,3,4 etc, in the destination is is always 49,50,51,52 etc. I am wondering if the delimiter is affecting the perceived value of the first column? ascii('@')= 64 and ascii('#')=35 so not sure how `1@# #@CLIENTREF@# #@D@#` in the .dat file is changing to `49,'CLIENTREF'` once its imported... `2177801` becomes `13845291861487922` though so not a straight addition of 48 :( [Edit II] Have identified the issue (i think) - the text column has a \r\n string in it and is breaking the format of the data file. What can I use as alternatives to \r\n? [Edit III] OK, its working. I changed the row terminator to \\r\\r\\n and the identity values import cleanly.
I don't quite understand the problem here: Your source table has a HistoryID (Bigint, PK) and the other columns. You spew these out into your CRHIST file. You read that into a second instance to do the data crunching, batched in 100k lots. Are your duplicates occuring in the export to data file, or the insert into target table? Have you tried running bcp without the batching? Set the target DB into BULK_LOGGED recovery to speed that up. 8 million rows should fly into a table, especially when the data is as small as you have there (130MB by my calculation - or have i misread the table def?). You can always create the PK afterwards. Alternatively, have you tried to grab the data via a linked server and `SELECT INTO`? **EDIT:** I think I got the bugger! Test data so you can give it a try: CREATE TABLE [dbo].[TableA] ( [Col1] [int] NOT NULL, [Col2] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[TableZ] ( [Col1] [int] NOT NULL, [Col2] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ) ON [PRIMARY] GO INSERT INTO Sandbox.dbo.TableA SELECT 1,'blah' UNION ALL SELECT 2,'hjdf' -- export, pipes as column splitter, newline for row xp_cmdshell'bcp "SELECT Col1,Col2 FROM Sandbox.dbo.TableA TA" queryout "C:\CRHist.datx" -T -c -S -t"||" -r"\n" -C OEM' -- create a format file according to the target table xp_cmdshell'bcp Sandbox.dbo.TableZ format nul -f c:\format.Fmt -n -T ' -- use the format file with bulk insert INSERT SandBox.dbo.TableZ SELECT Col1-48, -- Interprets the int as an ASCII code FAIL!!! Col2 FROM OPENROWSET(bulk N'C:\CRHist.datx',FORMATFILE = 'c:\format.Fmt') AS a DROP TABLE SandBox.dbo.TableA DROP TABLE SandBox.dbo.TableZ