question

nevada1978 avatar image
nevada1978 asked

How to create a Format File for a Bulk Insert

I have a series of fixed width text files. A sample of the data is given below. The files are originally given names such as xyzjob123456.678x. 320100000120170203 320100000320170203 320100000520170203 The end result should look like this 32 01 000001 2017 02 03 With the schema being as such stfips char(2) Areatype char (2) Area char(6) periodyear char (4) periodtype char(2) period char (2) I am reasonably sure that I have to use a format file and so would start out as such: BULK INSERT TableA FROM 'C:\Temp\ xyzjob123456678x.txt' WITH FORMATFILE = 'C:\Temp\Format.xml' SELECT * FROM TableA Would the following format file layout be correct? Also, I know that the 6th column (1,2,1,2..) is incorrect. Does anybody know what this represents? 9.0 6 1 SQLCHAR 0 02 "" 1 stfips "" 2 SQLCHAR 0 02 "" 2 areatype "" 3 SQLCHAR 0 06 "" 1 area "" 4 SQLCHAR 0 04 "" 2 periodyear "" 5 SQLCHAR 0 02 "" 1 periodtype "" 6 SQLCHAR 0 02 "/r/n" 2 period ""
ssmsinsertbulk-insertformat
8 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@nevada1978 I submitted the answer, but it went into moderation, so it will become available once one of the moderators approves it. To sum up, your file has few errors: replace forward slash with backslash in column 6 of the last rows to spell out the new line correctly (**\\r\\n**), change the values in column 6 to match those in column 1 (1, 2, 3, 4, 5, 6) and also change the fist line to match your SQL Server version. 9.0 means SQL Server 2005, so if you have 2012 then you should have 11.0 instead of 9.0, if you have 2014 then you should have 12.0 Oh, most importantly, if you use non-XML format file, don't name it with XML extension, use FMT instead, so the file name should be format.fmt, not format.xml.
0 Likes 0 ·
@Oleg. Would the following be correct? If I needed to expand upon this to 22 fields, would the 6 that is below 11.0 be changed to 22. Second, if the field is numeric (9), how would it change the format file, if at all? 11.0 6 1 SQLCHAR 0 02 "" 1 stfips "" 2 SQLCHAR 0 02 "" 2 areatype "" 3 SQLCHAR 0 06 "" 3 area "" 4 SQLCHAR 0 04 "" 4 periodyear "" 5 SQLCHAR 0 02 "" 5 periodtype "" 6 SQLCHAR 0 02 "\\r\\n" 6 period ""
0 Likes 0 ·
@nevada1978 Because the file is fixed length, I assume that the numeric field values have the same number of digits. Is this assumption correct? If not then it would be impossible to figure out where the next column starts without a delimiter, but you mentioned that there is no delimiter, the file is fixed length. Usually, such files which include numeric fields come in with numbers left padded with zeroes to preserver the order and position of all columns. In this case, SQLCHAR is the best bet because the char value will still fit into the numeric column in SQL Server database table so long as the number can be converted. Yes, second line is the total number of columns, so it should be 22 in your case. The terminator of the last column is the carriage return and line feed, spelled out with single backslash. The reason it is double in my comment is due to the way the comments are parsed, i.e. single backslash gets "goubled up" and not shown at all while 2 of them in a row (which should in theory escape the fist and show the second) still shows both. The last line should be like this: SQLCHAR 0 02 "\r\n" 6
0 Likes 0 ·
@Oleg, Your assumptions are correct. I kept it simple for learning purposes. In the actual file, each row is 139 characters long to be divided up into 22 fields. All of the "numeric" fields are left-padded with zeros. For example, employment is numeric (9) and shows up as 000004567. Of the 22, 11 are numeric but none have decimals.
0 Likes 0 ·
@Kev Riley ♦♦ @ThomasRushton ♦♦ Could you please help me out with this answer? I posted it few hours ago, and it went to moderation. Maybe there is a way to alter my account somehow so that my answers don't end up in moderation prior to becoming available. Thank you!
0 Likes 0 ·
@nevada1978 In this case, the SQLCHAR for all fields will work just fine. This is because the engine will not have any problems to implicitly convert the left padded string to whatever numeric data type of the destination column. So, for the column in your comment, you can do this in the corresponding row of the format file: 7 SQLCHAR 0 09 "" 7 someNumericColumn "" where 7 is the example of the column number and 09 is the number of characters including leading zeroes.
0 Likes 0 ·
Hi @Oleg - answer released from moderation. I'll have a word with Dave about your account, see if we can't identify some solution to this problem...
0 Likes 0 ·
@ThomasRushton ♦♦ Thank you for looking into it. I believe that whenever I post the answer which includes any "dangerous" words, such as drop, format, etc then the answer lands in moderation for approval.
0 Likes 0 ·

1 Answer

· Write an Answer
Oleg avatar image
Oleg answered
Here is the list of problems with your format file: 1. If you are using the non-XML format file, please avoid naming the file with XML extension. The most common extension used to format files is FMT, so the file should be named something lie **format.fmt** 2. The 6th column in the format file is "Server column order". Thus, the values in your format file for this column are incorrect, they need to mimic the values in the first column (host file field order) because it looks like you would like to load the data from the file in the matching order. 3. The terminator field in the file is incorrect. It needs to specify the carriage return and line feed which should be represented as **\\r\\n** but in the file the **/r/n** is used. Please change the it to what it needs to be:
\r\n
4. The first line in the format file (reading 9.0) is very probably incorrect. The number in the first line needs to match the SQL Server version, so 9.0 is for SQL Server 2005, 10.0 - for 2008 and 2008 R2, 11.0 - for 2012, 12.0 for 2014 and 13.0 for 2016. Here is the format file which works with your sample data. In my case the SQL Server version is 2016 so I use 13.0 in the first line. Also, the syntax of the bulk insert in question is invalid, it misses parentheses around the format file specifications. Here is the script and format file which work: BULK INSERT dbo.TableA FROM 'C:\Temp\ xyzjob123456678x.txt' WITH (FORMATFILE = 'C:\Temp\Format.fmt'); Format file: 13.0 6 1 SQLCHAR 0 02 "" 1 stfips "" 2 SQLCHAR 0 02 "" 2 areatype "" 3 SQLCHAR 0 06 "" 3 area "" 4 SQLCHAR 0 04 "" 4 periodyear "" 5 SQLCHAR 0 02 "" 5 periodtype "" 6 SQLCHAR 0 02 "\r\n" 6 period "" Hope this helps. Oleg
9 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@Oleg, does one need to use a Universal Naming Convention in the first line of the bulk insert command if the sql license and server are on the same machine?
0 Likes 0 ·
@nevada1978 Not necessarily. What is important to understand is that the BULK INSERT is executed on the server, so if you have a local path (drive name, folder name) then the drive name refers to the drive name from the server perspective. If the files to import are located on the different box then yes, UNC path should be used. In this case, the network account under which the SQL Server (or proxy in case of the job) is running, needs to have enough permissions to the network share where the files are stored. Basically, if you run the script from SSMS after connecting with SSPI (your network credentials) then the script is running on the server using your credentials. If you login with SQL Server authentication then it runs as the account which is used to run the SQL Server services. If it is a SQL job then it is a proxy account or the specified account in job configuration. In any case, whatever account is used needs to have permissions to the local folder or UNC share, whatever is used in the script.
0 Likes 0 ·
@Oleg, Last one, If I was to construct a format file and one of the fields was numeric (6,2), how do I represent that? The source file in this case is a .csv
0 Likes 0 ·
@nevada1978 If the column is decimal or numeric then the name of the type in the second column of the format file is SQLDECIMAL or SQLNUMERIC respectively, The value in the 4th column is always 19 and the value in the second column is 1 for these types. Here is the example of the format file to match the table with 4 columns (int, decimal, numeric, and datetime):
14.0
4
1    SQLINT        0    4    ","      1     RecordID        ""
2    SQLDECIMAL    1    19   ","      2     DecColOne       ""
3    SQLNUMERIC    1    19   ","      3     NumColTwo       ""
4    SQLDATETIME   1    8    "\r\n"   4     DateCol         ""
In any case, it is always safer to use bcp itself to generate correct format files. For example, running this from the command prompt (after typing correct table, database and server names) will produce a valid format file (comma-delimited data): bcp YourDB.dbo.SomeTable format nul -S SrvName -T -n -t "," -r "\n" -f "c:\temp\SomeTable.fmt"
0 Likes 0 ·
I get an error when I make the columns sqldecimal about invalid column length and it needing precision. Any thoughts on why this would be?
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.