question

tredd68 avatar image
tredd68 asked

Split comma delimited string into separate columns

I have a file that was sent to me with the data separated by commas in 1 field. I want to separate the data into separate columns. Ex: field contains: Nissan Dealer, 123 Main St, Chicago, IL, 76533 I have created col1, col2, col3, col4, col5. I want Nissan Dealer to go to col1, 123 Main St to col2, Chicago to col3, IL to col4 and 76533 to col5. I have a parse string that will split but you have to select which element you want. It would be nicer if I could make it build the columns on the row as it parsed or split them instead of my having to alter the table to add all of the columns or actually build a separate table with them parsed. Thx
sql-server-2005t-sqlstring
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Tim avatar image
Tim answered
If you have BIDS installed you can easily import this file with an SSIS project. You could also use BCP to do it. With SSIS, you just create your file import, chose your destination (SQL TABLE) and possibly have to do a data conversion task. You can chose your mapping of source to destination and then schedule the SSIS package as a SQL job if this will be an ongoing task you need to perform. If you have used DTS in SQL 2000, it is much the same with SSIS. There are many options to you. SQLCMD, BCP, SSIS, DTS, Powershell, etc. If it is a one time think you could also use the import wizard with SSMS. You could also use the Import Wizard and save the import project as SSIS if you don't want the hassle of doing it from scratch with BIDS. (Business Intelligence Design Studio)
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
Check out [this question][1]. One of the answers includes the link to [Jeff Moden article][2] about splitting delimited values with T-SQL. Oleg [1]: http://ask.sqlservercentral.com/questions/30459/split-string-in-multiple-columns [2]: http://www.sqlservercentral.com/articles/T-SQL/62867/
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@tredd68 Yes, there is. Here is the [link to the thread][1] where the split functions are discussed (Jeff's DelimitedSplit8K, @Pavel Pawlowski's CLR and mine using xml). Both Pavel and Jeff posted extensive performance tests. The discussion spans over 5 pages. Please check it out. [1]: http://www.sqlservercentral.com/Forums/Topic943562-338-2.aspx
3 Likes 3 ·
TimothyAWiseman avatar image
TimothyAWiseman answered
I think Oleg probably has the reference for the best answer for you, but I always like to have options, so here is an implementation using XML. create table testTable3 (id int, col1 varchar(512)) insert into testTable3 values (1, 'Nissan Dealer, 123 Main St, Chicago, IL, 76533') insert into testTable3 values (2, 'some place, 1234 Somewhere St, Stillater, OK, 123456') ; with cte1 as ( select t2.id, t3.split, ROW_NUMBER() over (partition by t2.id order by t2.id) as num from ( select *, CAST(' '+replace(t.col1,',',' ')+' ' as XML) as xmlfilter from testTable3 t ) t2 cross apply ( select col1data.D.value('.','varchar(50)') as split from t2.xmlfilter.nodes('X') as col1data(D)) t3 ) select id, [1], [2], [3], [4], [5] from ( select id, num, split from CTE1 ) as sourcetable pivot (max(split) for num IN ([1], [2], [3], [4], [5])) as pivottable there is a discussion about techniques like this at [ http://www.sqlservercentral.com/articles/XML/66932/][1] [1]: http://www.sqlservercentral.com/articles/XML/66932/
12 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
Hey Timothy, congrats on the 10k :) Its great having your answers on this forum, thanks for spending some of your time with us.
2 Likes 2 ·
Oleg avatar image Oleg commented ·
@tredd68 You just need to ***entitize*** the xml unfriendly characters before casting them to xml. For example, take Timothy's answer and try to run is as is but change one of the values to become unfriendly (make **Nissan Dealer** something like **Nissan Dealer <<&**). This will make the query fail. Now, restate the part inside of the CTE definition to read this:
select 
    t2.id,
    t3.split,
    row_number() over (partition by t2.id order by t2.id) as num
    from 
    (
        select 
            id, cast('' + replace(
            xmlfilter, ',', '') + '' as xml) xmlfilter  
            from #testTable3 cross apply 
            (
                select '' + col1 for xml path('')
            ) t(xmlfilter)
    ) t2
    cross apply
    (
        select col1data.D.value('.','varchar(50)') as split
        from t2.xmlfilter.nodes('x') as col1data(D)
    ) t3
and it will address the issue you have. The idea is to select the col1 values for xml path forcing the engine to entitize unfriendly characters BEFORE adding X nodes part and casting it as xml :)
1 Like 1 ·
Oleg avatar image Oleg commented ·
@tredd68 The part **select '' + col1 for xml path('')** deserves a little explanation. The dummy '' is the key here. Without it, selecting data for xml path will add the to the left of the value and close this node at the end. Adding an emply string to the left forces the column to become nameless so the engine will not add the col1 node. Because the columns cannot be referenced if they are nameless, the name is given to the column AFTER the selection for xml path has already taken place. This is why the **t(xmlfilter)** is present.
1 Like 1 ·
Oleg avatar image Oleg commented ·
@TimothyAWiseman Adding the xml path (like in my comment above) could be a better option though because it will handle all xml unfriendly characters, not just the ampersand. Whatever needs to be entitized will be entitized on the way in and un-entitized on the way out (in the bottom cross apply). In fact, you can even have the column values including itself and any other nodes and still splitting correctly by adding **xml path** part. No way this can be done by attempting to replace all the possible unfriendly combinations with replace function. Congratulations on moving to the 5-digits Timothy! Sorry it is 145 points too late. I have learned a lot from your answers on this site as I am sure did many others.
1 Like 1 ·
Oleg avatar image Oleg commented ·
@tredd68 Here is Timothy's answer restated to handle unfriendly characters. Please note the value of the first record which forces the original script in the answer to break.
create table #temp (id int, col1 varchar(512));

insert into #temp
select 1, 'Nissan Dealer&>>>&&, 123 Main St, Chicago, IL, 76533'
union all
select 2, 'some place, 1234 Somewhere St, Stillater, OK, 123456';

;with cte as
(
select 
    t2.id,
    t3.split,
    row_number() over (
        partition by t2.id order by t2.id) as num
    from 
    (
        select 
            id, cast('' + replace(
            xmlfilter, ', ', '') + '' as xml) xmlfilter  
            from #temp cross apply 
            (
                select '' + col1 for xml path('')
            ) t(xmlfilter)
    ) t2
    cross apply
    (
        select item.value('.','varchar(50)') as split
        from t2.xmlfilter.nodes('x') as R(item)
    ) t3
)
    select id, [1], [2], [3], [4], [5]
        from 
        (
            select id, num, split
            from cte
        ) as src
        pivot
        (max(split) for num in ([1], [2], [3], [4], [5])) as pvt
    
drop table #temp;
1 Like 1 ·
Show more comments
Jeff Moden avatar image
Jeff Moden answered
Just thought I'd let you all know... I've solved the problem with longer strings and wider elements that Tally Table splitters used to suffer with. I'm in the process of writing an article about it for SQLServerCentral.com.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kevin Feasel avatar image Kevin Feasel commented ·
I believe this is worth a "Huzzah!"
1 Like 1 ·
bibinsami avatar image
bibinsami answered
create function dbo.GetDomain( @source varchar(1024), @delimiter varchar(10), @domain int ) returns varchar(1024) as begin declare @returnValue varchar(1024) declare @workingOn int declare @length int set @workingOn=0 while @workingOn<@domain begin set @source=substring(@source,charindex(@delimiter,@source)+1,1024) set @workingOn+=1 end set @length=charindex(@delimiter,@source) set @returnValue=substring(@source,1,case when @length=0 then 1024 else @length-1 end) return @returnValue end go select t.col1, dbo.GetDomain(t.col1,',',0) as col2, dbo.GetDomain(t.col1,',',1) as col3, dbo.GetDomain(t.col1,',',2) as col4 from #test t go
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.