question

ChintanCG avatar image
ChintanCG asked

Parsing multiple columns at once

Hi, I have a table where the data has been loaded in this format (I know it should have been parsed and normalised at source but its not possible to change now since there are lot of interdependencies) I want to write a function (or sp) which can parse this. DECLARE @T TABLE ( CONTINENTID INT, COUNTRIES VARCHAR(255), CITIES VARCHAR(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 1, -- ID - int 'INDIA,CHINA,BANGLADESH', -- COUNTRIES - varchar(255) 'MUMBAI,DELHI|BEIJING,SHANGHAI|DHAKA' -- CITIES - varchar(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 2, -- ID - int 'SPAIN,UK', -- COUNTRIES - varchar(255) 'BARCELONA|LONDON,MANCHESTER' -- CITIES - varchar(255) ) SELECT * FROM @T Expecting the output in this format CONTINENTID COUNTRIES CITIES 1 INDIA MUMBAI 1 INDIA DELHI 1 CHINA BEIJING 1 CHINA SHANGHAI 1 BANGLADESH DHAKA 2 SPAIN BARCELONA 2 UK LONDON 2 UK MANCHESTER
sqlparsingparse
1 comment
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

What version of SQL Server are you running?
0 Likes 0 ·
KenJ avatar image
KenJ answered
You're stuck with an interesting problem to solve. You not only get to split countries on the comma character, you get to split the cities column twice; once on the pipe and another time on the comma. You'll also have to track the ordinal position of each country and group of cities so you can tie the cities back to the countries when you're done (at least your sample makes it appear that the cities string is always ordered by country. there can be no gaps or mistakes or you can't tie them back together) To get you started, there are some good string splitting resources listed in this question that was previously answered on the site: [ https://ask.sqlservercentral.com/questions/122024/how-do-i-split-pipe-delimited-values-into-separate.html][1] As a quick shortcut, here is a well-regarded split routine - [ http://www.sqlservercentral.com/articles/T-SQL/62867/][2] [1]: https://ask.sqlservercentral.com/questions/122024/how-do-i-split-pipe-delimited-values-into-separate.html [2]: http://www.sqlservercentral.com/articles/T-SQL/62867/
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image
Kev Riley answered
Here's an example of how do this. I'll walk you through it. As @KenJ has said you need to split mutiple times over different data sets using different delimiters. First we need to split the countries - I'll assume you haven't got a numbers (or tally) table so will create one on the fly. If you look at the links @KenJ has provided, I'm sure they will talk about using one and you might find it very beneficial, not only for this task, to create one somewhere. DECLARE @T TABLE ( CONTINENTID INT, COUNTRIES VARCHAR(255), CITIES VARCHAR(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 1, -- ID - int 'INDIA,CHINA,BANGLADESH', -- COUNTRIES - varchar(255) 'MUMBAI,DELHI|BEIJING,SHANGHAI|DHAKA' -- CITIES - varchar(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 2, -- ID - int 'SPAIN,UK', -- COUNTRIES - varchar(255) 'BARCELONA|LONDON,MANCHESTER' -- CITIES - varchar(255) ) --assume you haven't got a numbers table, so will create one on the fly as a cte ;with cte_numbers as (select top 10000 ROW_NUMBER() over(order by sc1.object_id) N from sys.columns sc1, sys.columns sc2 ) --split the countries and number them in order as they are found select T.CONTINENTID, row_number()over(partition by T.CONTINENTID order by N) as CountryIndex, substring(','+T.COUNTRIES+',',N+1,charindex(',',','+T.COUNTRIES+',',N+1)-N-1) as Country from cte_numbers join @T T on N < len(','+T.COUNTRIES+',') and substring(','+T.COUNTRIES+',',N,1) = ',' this will give you CONTINENTID CountryIndex Country ----------- -------------------- ----------------------- 1 1 INDIA 1 2 CHINA 1 3 BANGLADESH 2 1 SPAIN 2 2 UK (5 row(s) affected) Now we can split the cities in the same way. but first we need to break up the grouping that is done by using '|' - AND I'm assuming that the order of the city groups corresponds to the order of the country list! ;with cte_numbers as (select top 10000 ROW_NUMBER() over(order by sc1.object_id) N from sys.columns sc1, sys.columns sc2 ) --split the city groups and number them in order as they are found - as this should correspond to the CountryIndex select T.CONTINENTID, row_number()over(partition by T.CITIES order by N) as AssumedCountryIndex, substring('|'+T.CITIES+'|',N+1,charindex('|','|'+T.CITIES+'|',N+1)-N-1) as CityList from cte_numbers join @T T on N < len('|'+T.CITIES+'|') and substring('|'+T.CITIES+'|',N,1) = '|' which splits the data like this CONTINENTID AssumedCountryIndex CityList ----------- -------------------- ---------------------------------- 2 1 BARCELONA 2 2 LONDON,MANCHESTER 1 1 MUMBAI,DELHI 1 2 BEIJING,SHANGHAI 1 3 DHAKA (5 row(s) affected) Obviously we need to run the split over this data set too ...... ;with cte_numbers as (select top 10000 ROW_NUMBER() over(order by sc1.object_id) N from sys.columns sc1, sys.columns sc2 ), citygroups as ( select T.CONTINENTID, row_number()over(partition by T.CITIES order by N) as AssumedCountryIndex, substring('|'+T.CITIES+'|',N+1,charindex('|','|'+T.CITIES+'|',N+1)-N-1) as CityList from cte_numbers join @T T on N < len('|'+T.CITIES+'|') and substring('|'+T.CITIES+'|',N,1) = '|' ) select CONTINENTID, AssumedCountryIndex, substring(','+citygroups.CityList+',',N+1,charindex(',',','+citygroups.CityList+',',N+1)-N-1) as City from cte_numbers join citygroups on N < len(','+citygroups.CityList+',') and substring(','+citygroups.CityList+',',N,1) = ',' to give us CONTINENTID AssumedCountryIndex City ----------- -------------------- ---------------------- 2 1 BARCELONA 2 2 LONDON 2 2 MANCHESTER 1 1 MUMBAI 1 1 DELHI 1 2 BEIJING 1 2 SHANGHAI 1 3 DHAKA (8 row(s) affected) So now we've seen how to acheive the splitting up of the data, the only thing left is to put these parts together and get the final result DECLARE @T TABLE ( CONTINENTID INT, COUNTRIES VARCHAR(255), CITIES VARCHAR(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 1, -- ID - int 'INDIA,CHINA,BANGLADESH', -- COUNTRIES - varchar(255) 'MUMBAI,DELHI|BEIJING,SHANGHAI|DHAKA' -- CITIES - varchar(255) ) INSERT INTO @T ( CONTINENTID, COUNTRIES, CITIES ) VALUES ( 2, -- ID - int 'SPAIN,UK', -- COUNTRIES - varchar(255) 'BARCELONA|LONDON,MANCHESTER' -- CITIES - varchar(255) ) --assume you haven't got a numbers table, so will create one on the fly as a cte ;with cte_numbers as ( select top 10000 ROW_NUMBER() over(order by sc1.object_id) N from sys.columns sc1, sys.columns sc2 ), Countries as ( --split the countries and number them in order as they are found select T.CONTINENTID, row_number()over(partition by T.CONTINENTID order by N) as CountryIndex, substring(','+T.COUNTRIES+',',N+1,charindex(',',','+T.COUNTRIES+',',N+1)-N-1) as Country from cte_numbers join @T T on N < len(','+T.COUNTRIES+',') and substring(','+T.COUNTRIES+',',N,1) = ',' ), citygroups as ( select T.CONTINENTID, row_number()over(partition by T.CITIES order by N) as AssumedCountryIndex, substring('|'+T.CITIES+'|',N+1,charindex('|','|'+T.CITIES+'|',N+1)-N-1) as CityList from cte_numbers join @T T on N < len('|'+T.CITIES+'|') and substring('|'+T.CITIES+'|',N,1) = '|' ), cities as ( select CONTINENTID, AssumedCountryIndex, substring(','+citygroups.CityList+',',N+1,charindex(',',','+citygroups.CityList+',',N+1)-N-1) as City from cte_numbers join citygroups on N < len(','+citygroups.CityList+',') and substring(','+citygroups.CityList+',',N,1) = ',' ) select Countries.CONTINENTID, Countries.Country, cities.City from Countries join cities on cities.CONTINENTID = Countries.CONTINENTID and cities.AssumedCountryIndex = Countries.CountryIndex order by Countries.CONTINENTID to give us CONTINENTID Country City ----------- --------------- -------------- 1 INDIA MUMBAI 1 CHINA BEIJING 1 BANGLADESH DHAKA 1 INDIA DELHI 1 CHINA SHANGHAI 2 SPAIN BARCELONA 2 UK LONDON 2 UK MANCHESTER (8 row(s) affected)
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.