nevada1978 asked:

How to build an XML format file for Bulk Insert

This is a continuation of a previous question. I have since given up on finding a way to accomplish this task in a non-XML fashion. The non-XML format is much easier for a novice to comprehend, but apparently it has limitations.

Ok, so here is the landscape. I have a CSV with 29 columns. Four of them are numeric(5,2). Below are example data, the table schema, and an example that I found on a Microsoft site. To keep it simple, I have concocted a less complex example so as to learn the concept. The CSV file is named import.csv and is currently stored in the Documents folder on the local machine. At the very bottom is my first attempt at writing an XML format file for BULK INSERT.

Sample data:

    "xyz",32,01,02,000006,12.51,512.59,1
    "Somebody,someone,nobody",32,01,02,000009,13.54,538.64,0
    "Monsters, Inc.",32,01,03,000011,14.25,601.23,1

Schema:

    name      char(50)
    state     char(2)
    areatype  char(2)
    areavalue char(2)
    area      char(6)
    avgwage   numeric(5,2)
    avgmonth  numeric(5,2)
    suppress  char(1)

I am getting the following error and can't figure out what the issue could be. All the examples of this that I find online seem to be set up the same way. Any insights will be greatly appreciated.

    Msg 4857, Level 16, State 1, Line 1
    Line 18 in format file "C:\Users\myname\Documents\XML.xml": Attribute "precision" could not be specified for this type.
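For context on the error itself: Msg 4857 is typically raised when the PRECISION attribute is placed on a node whose type does not accept it. In a BCP XML format file, PRECISION and SCALE are only valid on ROW-side COLUMN nodes of type SQLDECIMAL or SQLNUMERIC, never on RECORD-side FIELD nodes (which only describe how to split the data file). A minimal sketch of the distinction, using the avgwage column from the schema above:

```xml
<!-- Sketch only. FIELD (RECORD side) describes the raw text field: no PRECISION allowed here. -->
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR="," />

<!-- COLUMN (ROW side) maps the field to the table column: PRECISION/SCALE go here, -->
<!-- and only on SQLDECIMAL/SQLNUMERIC types. -->
<COLUMN SOURCE="8" NAME="avgwage" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2" />
```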

Usman Butt commented:

To elaborate, the following format file works fine while reading the file.
Usman Butt commented:

There are seven columns in the format file and in the data file, but you have only six columns in the table? Moreover, the sequence of the columns in the format file does not match the file's column sequence. If you can clarify these anomalies, then we can come up with a precise solution quickly.
Usman Butt commented:

To read the file I used:

    SELECT * FROM OPENROWSET(BULK 'E:\BulkInsertData.txt', FORMATFILE = 'E:\BulkInsertData.xml') a;
Oleg commented:

@nevada1978 I thought that the information in the answer to the [question about non-xml format file][1] was sufficient to get the insert of the data into a database table from CSV up and running, but I guess that somehow it was not. I am not sure what could possibly cause the script to fail. The script to insert data from a CSV file does not really need any format file; it always works without one provided that the data is suitable:

    bulk insert dbo.TestTypes
    from 'c:\temp\TestTypes.csv'
    with
    (
        fieldterminator = ',',
        firstrow = 2,
        rowterminator = '0x0d0a'
    );

If the format file is really needed for whatever reason, then the script can easily be restated to use it. The bottom line is that the text in the CSV file is just that: comma-delimited text, that is all. Thus, there is never a problem inserting the data into a database table from CSV provided that the data is suitable (there are no invalid values, such as dates in an incorrect format, or alphas where numbers are expected, etc.).

[1]:
nevada1978 commented:

@Oleg, this is another case. Here, my colleague has a file with quoted values that have to be left alone. For example, "John, Jane, and somebody law firm" has to be kept whole and not split at the commas. If not for the quotes on some fields but not all, the simple BULK INSERT command would be more than sufficient.

1 Answer

Oleg answered:
I borrowed the format file from @Usman Butt's comment because it is already a working sample; I just needed to add a couple of touch-ups to handle the text qualifier. If the values in the first column are qualified with double quotes, then there should be 3 FIELD nodes for that column in the RECORD part of the file. This is because the leading double quote and the comma after the closing double quote should be ignored. The COLUMN nodes remain intact except that the references to the FIELDs need to be modified accordingly, in this specific case to 2, 4, 5, 6, 7, 8, 9 and 10. Most importantly, it appears that the row terminator is not represented properly. It **MUST** be ***backslash r backslash n*** (`\r\n`), **NOT** ***forward slash r forward slash n***. This change needs to be made; forward slashes don't make any sense in this context, so I am not sure why they are used in the sample file in question. Here is the text of the format file, which should be saved with an XML extension:
    <?xml version="1.0"?>
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <RECORD>
        <FIELD ID="1" xsi:type="CharFixed" LENGTH="1" />
        <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='"' MAX_LENGTH="50" />
        <FIELD ID="3" xsi:type="CharFixed" LENGTH="1" />
        <FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
        <FIELD ID="5" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
        <FIELD ID="6" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="2" />
        <FIELD ID="7" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="6" />
        <FIELD ID="8" xsi:type="CharTerm" TERMINATOR="," />
        <FIELD ID="9" xsi:type="CharTerm" TERMINATOR="," />
        <FIELD ID="10" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="1" />
      </RECORD>
      <ROW>
        <COLUMN SOURCE="2" NAME="name" xsi:type="SQLNVARCHAR" />
        <COLUMN SOURCE="4" NAME="state" xsi:type="SQLNVARCHAR" />
        <COLUMN SOURCE="5" NAME="areatype" xsi:type="SQLNVARCHAR" />
        <COLUMN SOURCE="6" NAME="areavalue" xsi:type="SQLNVARCHAR" />
        <COLUMN SOURCE="7" NAME="area" xsi:type="SQLNVARCHAR" />
        <COLUMN SOURCE="8" NAME="avgwage" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2" />
        <COLUMN SOURCE="9" NAME="avgmonth" xsi:type="SQLNUMERIC" PRECISION="5" SCALE="2" />
        <COLUMN SOURCE="10" NAME="suppress" xsi:type="SQLNVARCHAR" />
      </ROW>
    </BCPFORMAT>
Please note how the first 3 FIELD nodes are used to describe the quoted column values. Suppose I have some temp table defined like this:

    create table #test
    (
        [name] varchar(50) not null,
        [state] char(2) not null,
        areatype char(2) not null,
        areavalue char(2) not null,
        area char(6) not null,
        avgwage decimal(5,2) not null,
        avgmonth decimal(5,2) not null,
        suppress char(1) not null
    );
    go

Here is the script to insert records into this table from the CSV file, stored, for example, in the Temp folder on the C drive:

    bulk insert #test
    from 'C:\Temp\quoted_src.csv'
    with (formatfile = 'C:\Temp\quoted.xml');

Based on the sample data in the question, select all from that temp table produces the following result:

    name                 state areatype areavalue area   avgwage avgmonth suppress
    -------------------- ----- -------- --------- ------ ------- -------- --------
    xyz                  32    01       02        000006 12.51   512.59   1
    Somebody,someone,nob 32    01       02        000009 13.54   538.64   0
    Monsters, Inc.       32    01       03        000011 14.25   601.23   1

Hope this helps.

Oleg
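Before running the insert, the format file can also be sanity-checked with the OPENROWSET pattern from @Usman Butt's comment, pointed at the same paths assumed in the example above:

```sql
-- Sketch: preview what the format file parses, without touching any table.
-- The trailing alias (here "a") is required by OPENROWSET.
SELECT *
FROM OPENROWSET(BULK 'C:\Temp\quoted_src.csv', FORMATFILE = 'C:\Temp\quoted.xml') a;
```

If this SELECT returns the rows with the quotes stripped from the first column, the BULK INSERT against the temp table should behave the same way.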

nevada1978 commented:

@Oleg, for the sake of experimentation: if the second field, state, were to have quotes around it, how would that change your XML script?
Oleg commented:

@nevada1978 If the second field is quoted then you could produce 3 similar FIELD nodes for the second column and then take field # 5, so in the case of the first 2 quoted columns you would use 2, 5, 7, 8, 9, 10, 11, 12 as the FIELD references in the COLUMN nodes.

If the CSV file included column headers then you would always get an error, because bulk insert will assume that the first row holds values, not headers, and will error out because the column names for columns 6 and 7 in the header row are not decimals. This is no big deal though, because the actual data rows still get inserted.

This is why I prefer NON-XML format files: specify every column as SQLCHAR and instruct the bulk insert to begin from the second row (disregard the headers). In that case, if any of the columns are quoted, I would simply specify the delimiter as a double quote followed by a comma for the quoted column and its preceding column. If the first column is quoted then that leaves the necessity to update the first column of the destination table (remove the leading double quote), but this is not difficult to do.
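A sketch of that RECORD layout, assuming only the first two columns (name and state) are quoted; the field numbering follows the 2, 5, 7 through 12 mapping described above:

```xml
<!-- Sketch only: RECORD side for a file where columns 1 and 2 are double-quoted. -->
<RECORD>
  <FIELD ID="1"  xsi:type="CharFixed" LENGTH="1" />                     <!-- opening quote, name -->
  <FIELD ID="2"  xsi:type="CharTerm"  TERMINATOR='"' MAX_LENGTH="50" /> <!-- name value -->
  <FIELD ID="3"  xsi:type="CharFixed" LENGTH="1" />                     <!-- comma after closing quote -->
  <FIELD ID="4"  xsi:type="CharFixed" LENGTH="1" />                     <!-- opening quote, state -->
  <FIELD ID="5"  xsi:type="CharTerm"  TERMINATOR='"' MAX_LENGTH="2" />  <!-- state value -->
  <FIELD ID="6"  xsi:type="CharFixed" LENGTH="1" />                     <!-- comma after closing quote -->
  <FIELD ID="7"  xsi:type="CharTerm"  TERMINATOR="," MAX_LENGTH="2" />  <!-- areatype -->
  <!-- fields 8 through 12 continue as in the answer's format file,
       ending with TERMINATOR="\r\n" on the last field -->
</RECORD>
```

The ROW side then references SOURCE values 2, 5, 7, 8, 9, 10, 11 and 12, skipping the fields that exist only to swallow the quote and comma characters.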
nevada1978 commented:

@Oleg, last one, honestly. Can we perform this operation if the data is delimited by | (ctrl + \)? Put another way, can the field terminator in the top half of the XML format file be TERMINATOR = "|"? We are getting arithmetic overflow errors.
Oleg commented:

@nevada1978 Yes, of course. You probably meant Shift + backslash, not Ctrl + backslash, to produce the pipe character (decimal code is 124):

    select '|' PipeLiteral, char(124) PipeViaDec, char(0x7C) PipeViaHex;

The TERMINATOR attributes' values need to be changed from "," to "|", that is all. If you are getting arithmetic overflow errors then this is probably due to some other reason. For example, your format file specifies precision 5, scale 2 (the largest number it can fit is 999.99), so the process can accommodate only numbers smaller than 1000; anything equal to 1000 or greater will not fit into a decimal with precision 5 and scale 2.
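In other words, only the field terminators change for a pipe-delimited file; the quote handling and the row terminator stay exactly as in the format file above. A sketch of one such FIELD node:

```xml
<!-- Sketch: same CharTerm field as before, with the comma swapped for a pipe. -->
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="2" />
<!-- The last field keeps TERMINATOR="\r\n"; the overflow error is unrelated to the delimiter. -->
```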
