x

File parser to retrieve a txt file.

Hi Guys,

I have a file say sample.txt or a .csv file and this file contains a header, data and a footer. Based on the information in header i need to put the data into the database. In other words i need to extract the data based on the header bytes characters, remove the the footer of the file and put it in the Database.

FYI, everything will be a single string of data and the calculation of header, data and footer should be done on basis of bytes characters ... for example ...the total bytes characters for header or footer component is 700. Size of data component will be variable.

Which is the best approach for this?

Edit
Sample File (from comment below)

 BOF XXXXXXX XXX LOT SELECTIONSVS3CR DATA OF 01/17/2012 A3CR5JQ 5JQ0051201166764100120113 A1R AOMS 01R 20120113201201132012011320120119 000000000020000000000000000000010521
T3CR5JQ 5JQ0051201166764100120113 A1R N 100112 20100112 000000000020000000 000000000000008006
A3CR5JQ 5JQ0051381166764100120113 A2R AOMS 01R 20120113201201132012011320120119 000000000020000000000000000000010521
EOF XXXXXXX XXX LOT SELECTIONSSC3CR DATA OF 01/17/2012Total Detail Records: 03
more ▼

asked Jan 17 '12 at 07:56 AM in Default

AbhiD gravatar image

AbhiD
171 12 15 17

let me correct myself .. everything will be based on the number of characters .. 700 characters for each component.
Jan 17 '12 at 08:47 AM AbhiD
(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first
I'd use SSIS. You can read read the file in, put in conditional logic, pretty much everything you'd need to make this work. It should be a lot easier than trying to do it all in TSQL too.
more ▼

answered Jan 17 '12 at 11:00 AM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
90.7k 19 21 74

hey Grant, but how will i break the file to separate the header, data and footer ... and eventhough if i have them separated then how should i decode the file ... i think i should develop a file parser or so .... please advice me how shd i go abt it
Jan 18 '12 at 07:27 AM AbhiD
(comments are locked)
10|1200 characters needed characters left

I agree with @Grant Fritchey to use SSIS.

However using SSIS can be also tricky, if the same file can come with different header/footer and have different internal structure as SSIS requires that the structures of the sources and destinations are constant.

However with right conditional logic and separate data flows and/or packages for different file structure and a control mechanism to launch the right one for particular structure the SSIS way will be probably the easiest one to maintain.
more ▼

answered Jan 17 '12 at 12:56 PM

Pavel Pawlowski gravatar image

Pavel Pawlowski
22.2k 8 11 21

hey Pavel, can u suggest me something by which i can put the whole internal structure of file into SSIS and then decode my file based on that.....
Jan 18 '12 at 07:29 AM AbhiD

Based on sample.. I suggest to develop a single package per file structure and a master package which will decide what child package to execute based on the header row.

For the determination a script task can be used to open the flat file, read first row to determine file type. According the file type appropriate child package for processing can be executed.

In the child package there could be in general two possibilities.

1) Use only a flat file connection, which skipping first row and defining all the columns for particular file structure with eventual error rows redirection - the footer row should be redirected to error.

2) Use flat file connection, which will skip first header row. Then define a single column which will contain whole row. After that a script component can be used for particular custom splitting. This splitting can be handled by eg. Regular Expression or whatever method necessary to process the row correctly. When a footer row comes to the script component it will be simply ignored by the code.
Jan 18 '12 at 12:29 PM Pavel Pawlowski
thanks pavel .... i will start working on these lines ..lets c how it goes ...
Jan 19 '12 at 07:04 AM AbhiD
(comments are locked)
10|1200 characters needed characters left
can you post any part of the file? It is possible that LogParser will be able to help you with this situation but I would need to see the header and the first few lines to confirm.
more ▼

answered Jan 17 '12 at 03:17 PM

Fatherjack gravatar image

Fatherjack ♦♦
41.2k 72 77 107

hi all, this the sample file ....all the X's and rest will be constant in header n footer but every time the date will be different .... each record begins with a A or C ....

BOF      XXXXXXX XXX LOT SELECTIONSVS3CR DATA OF  01/17/2012 
A3CR5JQ   5JQ0051201166764100120113                A1R                                              AOMS       01R                                                      20120113201201132012011320120119                    000000000020000000000000000000010521                                                                                                                                                                                                                                                                                                                                                                                  
T3CR5JQ   5JQ0051201166764100120113                A1R                                              N                      100112      20100112    000000000020000000                  000000000000008006                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
A3CR5JQ   5JQ0051381166764100120113                A2R                                              AOMS       01R                                                      20120113201201132012011320120119                    000000000020000000000000000000010521                                                                                                                                                                                                                                                                                                                                                                                  
EOF      XXXXXXX XXX LOT SELECTIONSSC3CR DATA OF  01/17/2012Total Detail Records: 03 
Jan 18 '12 at 07:02 AM AbhiD
(comments are locked)
10|1200 characters needed characters left

It depends on how much processing needs to be done. For basic things, I concur with Grant that SSIS is the way to go. For more advanced pre-processing I personally think Python or Perl are more malleable, but SSIS can accomplish just about any prepocessing you might wish with the right conditional logic and separting out the flows. I personally find Python far more intuitive for that sort of advanced conditional processing.

[Expanded slightly for clarity]
more ▼

answered Jan 18 '12 at 11:39 AM

TimothyAWiseman gravatar image

TimothyAWiseman
15.5k 19 22 32

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x63
x12
x9

asked: Jan 17 '12 at 07:56 AM

Seen: 1131 times

Last Updated: Jan 18 '12 at 11:56 AM