x

How to create a Format File for a Bulk Insert

I have a series of fixed width text files. A sample of the data is given below. The files are originally given names such as xyzjob123456.678x.

    320100000120170203
    320100000320170203
    320100000520170203

The end result should look like this

 32   01    000001    2017   02   03

With the schema being as such

   stfips  char(2)
   Areatype   char (2)
   Area    char(6)
   periodyear   char (4)
   periodtype   char(2)
   period  char (2)

I am reasonably sure that I have to use a format file and so would start out as such:

     BULK INSERT TableA FROM 'C:\Temp\xyzjob123456678x.txt'
     WITH FORMATFILE = 'C:\Temp\Format.xml'
     SELECT * FROM TableA

Would the following format file layout be correct? Also, I know that the 6th column (1,2,1,2..) is incorrect. Does anybody know what this represents?

    9.0
    6
   1     SQLCHAR     0     02     ""         1     stfips          ""
   2     SQLCHAR     0     02     ""         2     areatype        ""
   3     SQLCHAR     0     06     ""         1     area            ""
   4     SQLCHAR     0     04     ""         2     periodyear      ""
   5     SQLCHAR     0     02     ""         1     periodtype      ""
   6     SQLCHAR     0     02     "/r/n"         2     period          ""
more ▼

asked Apr 05 at 06:18 PM in Default

avatar image

nevada1978
51 1 6

@nevada1978 I submitted the answer, but it went into moderation, so it will become available once one of the moderators approves it. To sum up, your file has few errors: replace forward slash with backslash in column 6 of the last rows to spell out the new line correctly (\\r\\n), change the values in column 6 to match those in column 1 (1, 2, 3, 4, 5, 6) and also change the fist line to match your SQL Server version. 9.0 means SQL Server 2005, so if you have 2012 then you should have 11.0 instead of 9.0, if you have 2014 then you should have 12.0

Oh, most importantly, if you use non-XML format file, don't name it with XML extension, use FMT instead, so the file name should be format.fmt, not format.xml.

Apr 06 at 01:45 PM Oleg

@Oleg.

Would the following be correct? If I needed to expand upon this to 22 fields, would the 6 that is below 11.0 be changed to 22. Second, if the field is numeric (9), how would it change the format file, if at all?

    11.0
    6
    1     SQLCHAR     0     02     ""         1     stfips          ""
    2     SQLCHAR     0     02     ""         2     areatype        ""
    3     SQLCHAR     0     06     ""         3     area            ""
    4     SQLCHAR     0     04     ""         4     periodyear      ""
    5     SQLCHAR     0     02     ""         5     periodtype      ""
    6     SQLCHAR     0     02     "\\r\\n"   6     period          ""
Apr 06 at 04:51 PM nevada1978

@nevada1978 Because the file is fixed length, I assume that the numeric field values have the same number of digits. Is this assumption correct? If not then it would be impossible to figure out where the next column starts without a delimiter, but you mentioned that there is no delimiter, the file is fixed length. Usually, such files which include numeric fields come in with numbers left padded with zeroes to preserver the order and position of all columns. In this case, SQLCHAR is the best bet because the char value will still fit into the numeric column in SQL Server database table so long as the number can be converted.

Yes, second line is the total number of columns, so it should be 22 in your case.

The terminator of the last column is the carriage return and line feed, spelled out with single backslash. The reason it is double in my comment is due to the way the comments are parsed, i.e. single backslash gets "goubled up" and not shown at all while 2 of them in a row (which should in theory escape the fist and show the second) still shows both. The last line should be like this:

 SQLCHAR     0     02     "\r\n"   6
Apr 06 at 05:16 PM Oleg

@Oleg,

Your assumptions are correct. I kept it simple for learning purposes. In the actual file, each row is 139 characters long to be divided up into 22 fields. All of the "numeric" fields are left-padded with zeros. For example, employment is numeric (9) and shows up as 000004567. Of the 22, 11 are numeric but none have decimals.

Apr 06 at 05:24 PM nevada1978

@Kev Riley ♦♦ @ThomasRushton ♦♦ Could you please help me out with this answer? I posted it few hours ago, and it went to moderation. Maybe there is a way to alter my account somehow so that my answers don't end up in moderation prior to becoming available. Thank you!

Apr 06 at 05:32 PM Oleg
show all comments (comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

Here is the list of problems with your format file:

  1. If you are using the non-XML format file, please avoid naming the file with XML extension. The most common extension used to format files is FMT, so the file should be named something lie format.fmt

  2. The 6th column in the format file is "Server column order". Thus, the values in your format file for this column are incorrect, they need to mimic the values in the first column (host file field order) because it looks like you would like to load the data from the file in the matching order.

  3. The terminator field in the file is incorrect. It needs to specify the carriage return and line feed which should be represented as \\r\\n but in the file the /r/n is used. Please change the it to what it needs to be:

    \r\n
  4. The first line in the format file (reading 9.0) is very probably incorrect. The number in the first line needs to match the SQL Server version, so 9.0 is for SQL Server 2005, 10.0 - for 2008 and 2008 R2, 11.0 - for 2012, 12.0 for 2014 and 13.0 for 2016.

Here is the format file which works with your sample data. In my case the SQL Server version is 2016 so I use 13.0 in the first line. Also, the syntax of the bulk insert in question is invalid, it misses parentheses around the format file specifications. Here is the script and format file which work:

 BULK INSERT dbo.TableA FROM 'C:\Temp\xyzjob123456678x.txt'
 WITH (FORMATFILE = 'C:\Temp\Format.fmt');

Format file:

 13.0
 6
 1    SQLCHAR        0    02    ""            1    stfips        ""
 2    SQLCHAR        0    02    ""            2    areatype    ""
 3    SQLCHAR        0    06    ""            3     area        ""
 4    SQLCHAR        0    04    ""            4    periodyear    ""
 5    SQLCHAR        0    02    ""            5    periodtype    ""
 6    SQLCHAR        0    02    "\r\n"        6    period        ""

Hope this helps.

Oleg

more ▼

answered Apr 09 at 01:35 PM

avatar image

Oleg
20.6k 3 7 29

@Oleg,

does one need to use a Universal Naming Convention in the first line of the bulk insert command if the sql license and server are on the same machine?

Apr 09 at 05:58 PM nevada1978

@nevada1978 Not necessarily. What is important to understand is that the BULK INSERT is executed on the server, so if you have a local path (drive name, folder name) then the drive name refers to the drive name from the server perspective. If the files to import are located on the different box then yes, UNC path should be used. In this case, the network account under which the SQL Server (or proxy in case of the job) is running, needs to have enough permissions to the network share where the files are stored. Basically, if you run the script from SSMS after connecting with SSPI (your network credentials) then the script is running on the server using your credentials. If you login with SQL Server authentication then it runs as the account which is used to run the SQL Server services. If it is a SQL job then it is a proxy account or the specified account in job configuration. In any case, whatever account is used needs to have permissions to the local folder or UNC share, whatever is used in the script.

Apr 09 at 08:37 PM Oleg

@Oleg,

Last one, If I was to construct a format file and one of the fields was numeric (6,2), how do I represent that? The source file in this case is a .csv

Apr 12 at 08:09 PM nevada1978

@nevada1978 If the column is decimal or numeric then the name of the type in the second column of the format file is SQLDECIMAL or SQLNUMERIC respectively, The value in the 4th column is always 19 and the value in the second column is 1 for these types. Here is the example of the format file to match the table with 4 columns (int, decimal, numeric, and datetime):

14.0
4
1    SQLINT        0    4    ","      1     RecordID        ""
2    SQLDECIMAL    1    19   ","      2     DecColOne       ""
3    SQLNUMERIC    1    19   ","      3     NumColTwo       ""
4    SQLDATETIME   1    8    "\r\n"   4     DateCol         ""

In any case, it is always safer to use bcp itself to generate correct format files. For example, running this from the command prompt (after typing correct table, database and server names) will produce a valid format file (comma-delimited data):

 bcp YourDB.dbo.SomeTable format nul -S SrvName -T -n -t "," 
     -r "\n" -f "c:\temp\SomeTable.fmt"

Apr 13 at 01:25 PM Oleg

I get an error when I make the columns sqldecimal about invalid column length and it needing precision. Any thoughts on why this would be?

Apr 13 at 04:22 PM nevada1978
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x169
x147
x44
x6

asked: Apr 05 at 06:18 PM

Seen: 89 times

Last Updated: 5 days ago

Copyright 2018 Redgate Software. Privacy Policy