x

How to build an XML format file for Bulk Insert

This is a coninuation of a previous question. I have since given up on finding a way to accomplish this task in a non-XML fashion. The non-XML is so much easier to comprehend for a novice but apparently it has limitations.

Ok, so here is the landscape. I have a CSV with 29 columns. Four of them are numeric (5,2). Below are example data, table schema, and an example that I found on a microsoft site. To keep it simple, I am going to concoct an example that is not so complex so as to learn the concept. The table is named import.csv at the moment is stored in the documents folder on the local machine. At the very bottom is my first attempt at writing an XML Format File for Bulk Insert.

Sample Data

  "xyz",32,01,02,000006,12.51,512.59,1
  "Somebody,someone,nobody",32,01,02,000009,13.54,538.64,0
  "Monsters, Inc.",32,01,03,000011,14.25,601.23,1

Schema name char(50) state char(2) areatype char(2) areavalue char(2) area char(6) avgwage numeric(5,2) avgmonth numeric(5,2) suppress char(1)

I am getting the following error and can't figure out what the issue could be. All examples of this that I find online seems to have theirs set up the same way. Any insights will greatly appreciated.

           Msg 4857, Level 16, State 1, Line 1
           Line 18 in format file "C:\Users\myname\Documents\XML.xml": Attribute "precision" could not be specified for 
           this type. 


         <?xml version="1.0"?>
         <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
         <RECORD>
         <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' MAX_LENGTH="50"/>
         <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="2"/>
         <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="2"/>
         <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="2"/>
         <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="6"/>
         <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="6"/>
         <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=',' MAX_LENGTH="6"/>
         <FIELD ID="8" xsi:type="CharTerm" TERMINATOR='/r/n' MAX_LENGTH="1"/>
         </RECORD>
         <ROW>
         <COLUMN SOURCE="1" NAME="NAME" xsi:type="SQLNVARCHAR"/>
         <COLUMN SOURCE="2" NAME="STATE" xsi:type="SQLNVARCHAR"/>
         <COLUMN SOURCE="3" NAME="AREATYPE" xsi:type="SQLNVARCHAR"/>
        <COLUMN SOURCE="4" NAME="AREAVALUE" xsi:type="SQLNVARCHAR"/>
        <COLUMN SOURCE="5" NAME="AREA" xsi:type="SQLNVARCHAR"/>
        <COLUMN SOURCE="6" NAME="AVGWAGE" xsi:type="SQLDECIMAL" precision="5" scale="2" nullable 
         ="YES"/>
       <COLUMN SOURCE="7" NAME="AVGMONTH" xsi:type="SQLDECIMAL" precision ="5" scale ="2" nullable 
        ="YES"/>
       <COLUMN SOURCE="8" NAME="SUPPRESS" xsi:type="SQLNVARCHAR"/>
       </ROW>
       </BCPFORMAT>
more ▼

asked Apr 13 at 06:04 PM in Default

avatar image

nevada1978
51 1 6

There are seven columns in the format file and in the data file, but you have only six columns in the table? Moreover, the sequence of the columns in the format file mismatches the file's columns sequence. If you can specify the details about these anomalies, then we can come up with a precise solution quickly.

Apr 15 at 10:47 AM Usman Butt

To elaborate, the following format file works fine while reading the file

 <?xml version="1.0"?>
 <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
   <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
   <FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
   <FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
   <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="6" />
   <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="," />
   <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="," />
   <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="18"/>
  </RECORD>
  <ROW>
   <COLUMN SOURCE="1" NAME="state" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="2" NAME="areatype" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="3" NAME="areavalue" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="4" NAME="area" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="5" NAME="avgwage" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2"/>
   <COLUMN SOURCE="6" NAME="avgmonth" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2"/>
   <COLUMN SOURCE="7" NAME="Unknown" xsi:type="SQLNVARCHAR"/>
  </ROW>
 </BCPFORMAT>
Apr 15 at 11:48 AM Usman Butt

To read the file I used

 SELECT * FROM OPENROWSET(BULK 'E:\BulkInsertData.txt', FORMATFILE ='E:\BulkInsertData.xml')a
Apr 15 at 11:48 AM Usman Butt

@nevada1978 I thought that the information in the answer to the question about non-xml format file was sufficient to get the insert of the data into database table from CSV up an running, but I guess that somehow it was not. I am not sure what can possibly cause the script to fail. The script to insert data from CSV file does not really need any format file, it always works without it provided that the data is suitable:

 bulk insert dbo.TestTypes from 'c:\temp\TestTypes.csv'
 with ( fieldterminator = ',', firstrow = 2, rowterminator = '0x0d0a');

If the format file is really needed for whatever reason then the script can be easily restated to use it. The bottom line is that the text in the CSV file is just that: comma-delimited text, that is all. Thus, there is never a problem to insert the data into database table from CSV provided that the data is suitable (there are no invalid values, such as date in incorrect format, or alphas where numbers are expected, etc.).

Apr 16 at 12:31 PM Oleg

@Oleg, This is another case. In this case, my colleague has a file that has quotes that have to be left alone, For example, John, Jane, and somebody law firm has to be left whole and not delimited. If not for the quotes left on some fields but not all, the simple bulk insert command would be more than sufficient.

Apr 16 at 03:54 PM nevada1978
show all comments (comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

I borrowed format file from @Usman Butt comment because it is already a working sample, I just needed to add a couple of touch ups to handle the text qualifier.

If the values in the first column are qualified with double quotes then there should be 3 FIELD nodes in the RECORD part of the file. This is because the first double quote and the comma after the closing double quote should be ignored. The COLUMN nodes remain intact except that the references to the FIELD need to be modified accordingly, in this specific case to 2, 4, 5, 6, 7, 8, 9 and 10. Most importantly, it appears that the row terminator is not represented properly. It MUST be backslash r backslash n, NOT the forward slash r forward slash n. This change needs to be done. Forward slashes don't make any sense in this context so I am not sure why they are used in the sample file in question.

Here is the text of the format file, which should be named with XML extension:

<BCPFORMAT 
  xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <FIELD ID="1" xsi:type="CharFixed" LENGTH="1" />
    <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='"' MAX_LENGTH="50" />
    <FIELD ID="3" xsi:type="CharFixed" LENGTH="1" />
    <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" /> 
    <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
    <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
    <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="6" />
    <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="," />
    <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="," />
    <FIELD ID="10" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="1" />
  </RECORD>
  <ROW>
    <COLUMN SOURCE="2" NAME="name" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="4" NAME="state" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="5" NAME="areatype" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="6" NAME="areavalue" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="7" NAME="area" xsi:type="SQLNVARCHAR"/>
    <COLUMN SOURCE="8" NAME="avgwage" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2" />
    <COLUMN SOURCE="9" NAME="avgmonth" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2" />
    <COLUMN SOURCE="10" NAME="suppress" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>

Please note how the first 3 FIELD nodes which are used to describe the quoted column values. Suppose I have some temp table defined like this:

 create table #test (
     [name] varchar(50) not null,
     [state] char(2) not null,
     areatype char(2) not null,
     areavalue char(2) not null,
     area char(6) not null,
     avgwage decimal(5,2) not null,
     avgmonth decimal(5,2) not null,
     suppress char(1) not null
 );
 go

Here is the script to insert records into this table from CSV file, stored, for example in Temp folder on C drive:

 bulk insert #test from 'C:\Temp\quoted_src.csv'
 with (formatfile = 'C:\Temp\quoted.xml');

Based on the sample data in question, the select all from that temp table produces the following result:

 name                 state areatype areavalue area   avgwage avgmonth suppress
 -------------------- ----- -------- --------- ------ ------- -------- --------
 xyz                  32    01       02        000006 12.51   512.59   1
 Somebody,someone,nob 32    01       02        000009 13.54   538.64   0
 Monsters, Inc.       32    01       03        000011 14.25   601.23   1

Hope this helps.

Oleg

more ▼

answered Apr 16 at 08:16 PM

avatar image

Oleg
20.6k 3 7 29

@oleg, for sake of experimentation if the second field ,state, was to have quotes around, how would it change your XML script?

Apr 17 at 10:30 PM nevada1978

@nevada1978 If the second field is quoted then you could produce 3 similar FIELD nodes for the second column and then take field # 5, so in case of first 2 quoted columns you would use 2,5,7,8,9,10,11,12 for COLUMN nodes. If the CSV file included column headers then you would always get the error because the bulk insert will assume that the first row is not the headers but are values and therefore, will error out because the column names for columns 6 and 7 in the header row are not decimals, so the error will be triggered. This is no big deal though because the actual data rows still get inserted. This is why I prefer NON-XML format files, specifying every column as SQLCHAR and instructing the bulk insert to begin from second row (disregard the headers). In this case, if any of the columns are quoted, I would simply specify the delimiter as double quote followed by comma for the quoted column and its preceding column. If the first column is quoted then it would leave me with necessity to update the first column of the destination table (remove the first double quote), but this is not difficult to do.

Apr 18 at 12:28 PM Oleg

@Oleg,

Last one honestly. Can we perform this operation if the data is delimited by | (ctrl+ \)? Put another way, can the field terminator in the top half of the XML format file by terminator = "|". We are getting arithmetic overflow errors.

Apr 18 at 10:41 PM nevada1978

@nevada1978 Yes, of course. You probably meant Shift + backslash not ctrl + backslash to produce the pipe character (decimal code is 124):

 select 'I' PipeLiteral, char(124) PipeViaDec, char(0x7C) PipeViaHex;

The terminator attributes' values needs to be changed to "|" from ",", that is all. If you are getting arithmetic overflow errors then this is probably due to some other reason. For example, your format file specifies 5 precision 2 scale (largest number it can fit is 999.99), so the process can accommodate only small numbers which are less than 1000 (this one or greater will not fit into decimal with precision 5 and scale 2).

6 days ago Oleg
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x208
x169
x44
x6

asked: Apr 13 at 06:04 PM

Seen: 56 times

Last Updated: 6 days ago

Copyright 2018 Redgate Software. Privacy Policy