question

abhi2702 avatar image
abhi2702 asked

Parsing Non-Standard Text File using SSIS script component.

I have a ":" delimited text file which is like below Hospital Outpatient Profiles(HOP)(1) Patient Characteristics ---------------------------------------------------------------- Patient Gender(4) Female: 42.9% Male: 57.1% Patient Race(4) White: 85.7% Black/African American: 14.3% Asian or Native Hawaiian: 0.0% American Indian: 0.0% Other: 0.0% Payment Source ---------------------------------------------------------------- Authorization Required For Care(4) Yes: 0.0% No: 57.1% Unknown: 42.9% No Response: 0.0% I want to extract and store results in a table using Script Component task in SSIS Female: 42.9% Male: 57.1% `` White: 85.7% Black/African American: 14.3% Asian or Native Hawaiian: 0.0% American Indian: 0.0% Other: 0.0% and so on... i am using following script but not getting the exact output /* Microsoft SQL Server Integration Services Script Component * Write scripts using Microsoft Visual C# 2008. * ScriptMain is the entry point class of the script.*/ using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void Input0_ProcessInputRow(Input0Buffer Row) { string columnName; string columnValue; try { // Check for an empty row. if (Row.Column0.Trim().Length > 0) { columnName = Row.Column0.Substring(0, Row.Column0.IndexOf(":")); // Check for an empty value after the colon. if (Row.Column0.Substring(Row.Column0.IndexOf(":")).TrimEnd().Length > 1) // Extract the column value from after the colon and space. { columnValue = Row.Column0.Substring(Row.Column0.IndexOf(":") + 2); switch (columnName) { case "Female": // The FirstName value indicates a new record. this.Output0Buffer.AddRow(); this.Output0Buffer.Female = columnValue; break; case "Male": this.Output0Buffer.Male = columnValue; break; case "White": this.Output0Buffer.White = columnValue; break; } } } } catch (Exception e) { Console.WriteLine(e.ToString()); } } } I am getting this output ... Famale Male White 42.9% Male: 57.1% Patient Race(4) NULL 85.7% Black/African American: 14.3% Asian or Native Hawaiian: 0.0% Thank you all in advance!!!!
ssisc#
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

abhi2702 avatar image abhi2702 commented ·
I know something is wrong with my c# code snippet for script component (SSIS) but the question is what... why it is only working for new lines like Female and White and not for male? Any suggestion or solutions are deeply appreciated ... thank you all in advance...
0 Likes 0 ·
abhi2702 avatar image abhi2702 commented ·
how to delimit at '%' in my case?
0 Likes 0 ·
tomgough79 avatar image
tomgough79 answered
I think part of the problem you are fighting with is that you have both labels and values all dumped into one, so the values you are trying to extract are not really delimited in the normal way. Looking at the extract you have posted, if you ignore line endings, it looks like the structure is something like: I wonder, therefore, if you could use that fact to build up some sort of hash table I'm afraid I can't help you with actual coding, but my thinking is that you could split the file into fields looking like: by breaking the lines out on the %. Each value in the array is now delimited by the colon and 2 spaces, from which you could hopefully create a hash table looking like: Key Value Female 42.9 Male 57.1 etc Or you could try to persuade the software vendor to supply more useful outputs!
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Pavel Pawlowski avatar image
Pavel Pawlowski answered
I would go using a Regular expression: **`(^.+?|(?<=\%\s*?).+?)\:(?:\s*)(.+?\%) The above Regular expression will get you KeyValue pairs in numbered groups: Match Number Match Text Group 1 Group 2 0 Female: 42.9% Female 42.9% 1 Male: 57.1% Male 57.1% 2 White: 85.7% White 85.7% 3 Black/African American: 14.3% Black/African American 14.3% 4 Asian or Native Hawaiian: 0.0% Asian or Native Hawaiian 0.0% 5 American Indian: 0.0% American Indian 0.0% 6 Other: 0.0% Other 0.0% 7 Yes: 0.0% Yes 0.0% 8 No: 57.1% No 57.1% 9 Unknown: 42.9% Unknown 42.9% 10 No Response: 0.0% No Response 0.0% If a different groups are needed, you can achieve this by slightly modifying the RegEx. To use it: public static Regex regex = new Regex( "(^.+?|(?
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.