question

AbhiD avatar image
AbhiD asked

File parser to retrieve a txt file.

Hi Guys, I have a file say sample.txt or a .csv file and this file contains a header, data and a footer. Based on the information in header i need to put the data into the database. In other words i need to extract the data based on the header bytes characters, remove the the footer of the file and put it in the Database. FYI, everything will be a single string of data and the calculation of header, data and footer should be done on basis of bytes characters ... for example ...the total bytes characters for header or footer component is 700. Size of data component will be variable. Which is the best approach for this? **Edit**
Sample File (from comment below)

BOF      XXXXXXX XXX LOT SELECTIONSVS3CR DATA OF  01/17/2012 
A3CR5JQ   5JQ0051201166764100120113                A1R                                              AOMS       01R                                                      20120113201201132012011320120119                    000000000020000000000000000000010521                                                                                                                                                                                                                                                                                                                                                                                  
T3CR5JQ   5JQ0051201166764100120113                A1R                                              N                      100112      20100112    000000000020000000                  000000000000008006                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
A3CR5JQ   5JQ0051381166764100120113                A2R                                              AOMS       01R                                                      20120113201201132012011320120119                    000000000020000000000000000000010521                                                                                                                                                                                                                                                                                                                                                                                  
EOF      XXXXXXX XXX LOT SELECTIONSSC3CR DATA OF  01/17/2012Total Detail Records: 03

import-dataflat-filerecommendation
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

AbhiD avatar image AbhiD commented ·
let me correct myself .. everything will be based on the number of characters .. 700 characters for each component.
0 Likes 0 ·
Grant Fritchey avatar image
Grant Fritchey answered
I'd use SSIS. You can read read the file in, put in conditional logic, pretty much everything you'd need to make this work. It should be a lot easier than trying to do it all in TSQL too.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

AbhiD avatar image AbhiD commented ·
hey Grant, but how will i break the file to separate the header, data and footer ... and eventhough if i have them separated then how should i decode the file ... i think i should develop a file parser or so .... please advice me how shd i go abt it
0 Likes 0 ·
Pavel Pawlowski avatar image
Pavel Pawlowski answered
I agree with @Grant Fritchey to use SSIS. However using SSIS can be also tricky, if the same file can come with different header/footer and have different internal structure as SSIS requires that the structures of the sources and destinations are constant. However with right conditional logic and separate data flows and/or packages for different file structure and a control mechanism to launch the right one for particular structure the SSIS way will be probably the easiest one to maintain.
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

AbhiD avatar image AbhiD commented ·
hey Pavel, can u suggest me something by which i can put the whole internal structure of file into SSIS and then decode my file based on that.....
0 Likes 0 ·
Pavel Pawlowski avatar image Pavel Pawlowski commented ·
Based on sample.. I suggest to develop a single package per file structure and a master package which will decide what child package to execute based on the header row. For the determination a script task can be used to open the flat file, read first row to determine file type. According the file type appropriate child package for processing can be executed. **In the child package there could be in general two possibilities.** 1) Use only a flat file connection, which skipping first row and defining all the columns for particular file structure with eventual error rows redirection - the footer row should be redirected to error. 2) Use flat file connection, which will skip first header row. Then define a single column which will contain whole row. After that a script component can be used for particular custom splitting. This splitting can be handled by eg. Regular Expression or whatever method necessary to process the row correctly. When a footer row comes to the script component it will be simply ignored by the code.
0 Likes 0 ·
AbhiD avatar image AbhiD commented ·
thanks pavel .... i will start working on these lines ..lets c how it goes ...
0 Likes 0 ·
Fatherjack avatar image
Fatherjack answered
can you post any part of the file? It is possible that LogParser will be able to help you with this situation but I would need to see the header and the first few lines to confirm.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

AbhiD avatar image AbhiD commented ·
hi all, this the sample file ....all the X's and rest will be constant in header n footer but every time the date will be different .... each record begins with a A or C .... BOF XXXXXXX XXX LOT SELECTIONSVS3CR DATA OF 01/17/2012 A3CR5JQ 5JQ0051201166764100120113 A1R AOMS 01R 20120113201201132012011320120119 000000000020000000000000000000010521 T3CR5JQ 5JQ0051201166764100120113 A1R N 100112 20100112 000000000020000000 000000000000008006 A3CR5JQ 5JQ0051381166764100120113 A2R AOMS 01R 20120113201201132012011320120119 000000000020000000000000000000010521 EOF XXXXXXX XXX LOT SELECTIONSSC3CR DATA OF 01/17/2012Total Detail Records: 03
0 Likes 0 ·
TimothyAWiseman avatar image
TimothyAWiseman answered
It depends on how much processing needs to be done. For basic things, I concur with Grant that SSIS is the way to go. For more advanced pre-processing I personally think Python or Perl are more malleable, but SSIS can accomplish just about any prepocessing you might wish with the right conditional logic and separting out the flows. I personally find Python far more intuitive for that sort of advanced conditional processing. [Expanded slightly for clarity]
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.