question

Jos Menhart avatar image
Jos Menhart asked

Parse custom formatted string into columns using property names from string

Hi All, To show my company's management the performance SSIS is capable of, I want to change an existing import procedure. This import procedure needs to read lines in the following format: (example:) LINEID=1|RECORDTYPE=3|PROPERTYA=1234|PROPERTYB=1234|PROPERTYC=A0441234 There are over a hundred different properties in a single line - many of which can be NULL depending on the RECORDTYPE property - all with their own names and datatypes. I set out to use a script component, but I really do not want to manually type in things like (pseudo code:) "Row.columnName = string.GetProperty(string, "LINE_ID");" a hundred times. Has anyone dealt with a similar problem before? I would like to be able to dynamically assign the values to their corresponding columns according to the property names, so that the LINEID property ends up in a column named LineId. Can I use a script component to do this, or should I learn how to write a custom SSIS component? Or is it even possible in SSIS and might I be better off when I just suck it up and start typing these 100+ get/set commands?
ssisscriptstring
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Hi Daniel and Pavel, thanks for your answers! They were both very educational, as I'm a real novice to C#. I will try both and see what will work best for the data I'm receiving. Kind Regards, Jos
0 Likes 0 ·
Daniel Ross avatar image
Daniel Ross answered
Hi Josbourne, I'm not sure if this will be dynamic enough for you, but here goes; Add a dataflow task to the control flow, and add a flat file connection manager and link it to your flat file. specify a delimiter like ~ so there is only 1 column, (you need to set the delimiter to something that will NEVER occur) so now you should have only one output column from the Flat file task, Next add a script task to the data flow, and connect from the flat file source. Tick the input Column 0 from the input colums tab, and in the Input and Outputs tab click on Add Output and then add the columns as required (This is the painful part, make sure the data type is not numeric for string columns). Once this has been done, click on script tab and click design script and use the code below; Dim strSplit As String() Dim dl As New Hashtable strSplit = Row.Column0.Split("|") For Each strName As String In strSplit MsgBox(strName.Substring(0, strName.IndexOf("=")) & ", " & strName.Substring(strName.IndexOf("=")+1)) dl.Add(strName.Substring(0, strName.IndexOf("=")), strName.Substring(strName.IndexOf("=") + 1)) Next With newOutputBuffer .AddRow() .LINEID = dl.Item("LINEID") .PROPERTYA = dl.Item("PROPERTYA") .PROPERTYC = dl.Item("PROPERTYC") .PROPERTYB = dl.Item("PROPERTYB") .RECORDTYPE = dl.Item("RECORDTYPE") 'ETC, ETC End With you can see that it splits the row into an array, and then splits the string again into the property name and value into a hashtable, then you add the values from the hashtable to the output. The below input rows produce the following output rows LINEID=1|RECORDTYPE=3|PROPERTYA=1234|PROPERTYB=1234|PROPERTYC=A0441234 LINEID=2|RECORDTYPE=2 LINEID PROPERTYC PROPERTYB PROPERTYA RECORDTYPE 2 NULL NULL NULL 2 1 A0441234 1234 1234 3
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

+1 @Daniel, you do not need to specify a delimiter which will not occur in the source file. You can simply set format of the flat file connection to **`RaggedRights`** and you have a single column flat file. :-)
2 Likes 2 ·
Pavel Pawlowski avatar image
Pavel Pawlowski answered
You can use the solution Daniel presented, but for each column, you have to modify the Output columns of the Script component and also modify the the script itself by assigning value of the column to the output. To avoid this, you can use the Script component which will produce a record for each column with it's name and value and connect it to the pivot component, which will make the pivoting. So first you crate a flat file source with one Data column. In the flat file connection manager you do not need to specify an non existing character as a column delimiter as you can simply set the format to the **`RaggedRight`** and by setting this you will receive a single column, if you do not specify additional. this connect to a script component and create 3 output columns (RowID {GUID}, ColName {string}, ColValue {string}). In the properties of Output select "None" for SynchronousID property. as we will create multiple output rows for single input row. Then edit a scrip and modify following method: (code is for SQL Server 2008) public override void Input0_ProcessInputRow(Input0Buffer Row) { if (!Row.Data_IsNull) { Regex r = new Regex(@"(?.+?)=(?.+?)(?:\||$|\n|\r)", RegexOptions.Compiled); Guid rowId = Guid.NewGuid(); foreach (Match m in r.Matches(Row.Data)) { this.Output0Buffer.AddRow(); this.Output0Buffer.RowID = rowId; this.Output0Buffer.ColName = m.Groups["ColName"].Value; this.Output0Buffer.ColValue = m.Groups["ColValue"].Value; } } } The code is in C#, but it's not a problem to rewrite it to VB if necessary. I'm using here a Regex to split the source rows. Guid rowID will help us to group the records in the PIVOT. That's all for the script. Connect the script with the PIVOT component and in the pivot add all the 3 input columns, and in the input output properties set the **`PivotUsage`** property. The RowID will have PivotUsage = 1 (it means it will be a key for grouping the records. ColName will have PivotUsage = 2 (Column names) and ColValue will have PivotUsage = 3 (values of the columns. Now add OUTPUT columns to the output. Name them as necessary and specify **`PivotKeyValue`** for easch. The PivotKey value mas correspond to the fieldName in the source flat file. Also specify the SourceColumn. It will be lineageID of the ColValue field in the source columns for all output fields except RowID output field. For this the PivotKeyValue is not necessary and SourceColumn will be lineage of the RowID source column. After this, you have a complete solution. If there will be a new column, you simply add it to the PIVOT output and specify only a PivotKeyValue for it and LineageID of the ColValue input field.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

nice, I'll bookmark this for the future
0 Likes 0 ·
Jos Menhart avatar image
Jos Menhart answered
I decided to go with a different version of Daniel's first answer as regex imposed a significant performance overhead compared to using arrays for string split operations. My final implementation looks like this: public override void Input0_ProcessInputRow(Input0Buffer Row) { /* All data is coming in as one maxed out string (4000 chars) */ string rowStr = Row.RawEDR; /*Create an array of keyvalue pairs like [keyName=value]*/ string[] arr1 = rowStr.Split('|'); /* Add a GUID for use by the pivot * component later on in the data flow*/ Guid CdrId = Guid.NewGuid(); for (int i = 0; i < arr1.Length; i++) { /* For every keyValue pair create an output row * (script component is set to asynchronous output)*/ String[] arrKeyValue = arr1[i].Split('='); this.Output0Buffer.AddRow(); this.Output0Buffer.CdrId = CdrId; this.Output0Buffer.ColName = arrKeyValue[0]; this.Output0Buffer.ASB = Row.ASB; /* * Do some business logic to have output values match * the status quo in the current process * (all blank or not available values should be set * to null to have a consistent output for 'unknown') */ if (arrKeyValue[1] != "Not Available" && arrKeyValue[1] != "") { this.Output0Buffer.ColValue = arrKeyValue[1]; } else { this.Output0Buffer.ColValue = null; } } } Thanks again for your answers!
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.