I have a set of data which are directly or indirectly connected, example data set shown below in MT. id e, f, g, h, i, and j are connected through i since they are all in the subgroup 2 and/or 3; a, b, c, d and k are connected since they are in either subgroup 1 or/and 2. Because ids a to k are all directly or indirectly connected to each other, these need to be in one group A. Id i in subgroup 4 and 5, so i is connected to m and n in group 5, should be in a new group B, and so on. Recursive CTE might be a solution, but I couldn't get my head around it. I'd really appreciate if someone here can help me with this. Thank you.
(comments are locked)

I'm posting this as a separate answer as it's entirely unrelated. There was a timely publication of an article by Itzik BenGan (the man who excels at these kinds of puzzles) that presented a TSQL solution to the 'transitive closure' problem  which is essentially what this problem here is. I urge you to read the article: http://www.itprotoday.com/microsoftsqlserver/tsqlpuzzlechallengegroupingconnecteditems So now we can take Itzik's solution and apply it to your data. The first thing we need to do is generate the data in a way that fits into the solution  so we need a table of edges. This is easily constructed as
and gives us
which represents enough necessary edges that could be constructed between ids in the same subgroups, but without addition of noise that could be introduced by adding all possible edges  this is done by limiting the edges so that the first node is less than the second (Itzik talks about why that is important in the article). We then need to tweak Itzik's code as he was using integer datatypes for his ids, and you have nvarchar(10), but we can still use the same ideas. After that we have the output as
Now we need to slightly amend the output to get your desired output  so use a partitioned
Hopefully this performs well enough on your large MT table full code
(comments are locked)

This answer isn't my preferred way of doing this as I've had to resort to a loop, but it might help you, and I'll keep playing to see if there is a better option. I had to do some preprocessing on the data, so I used CTEs to do this
The first CTE
from those I can then work out which subgroups are related, so for example the pair a,k tells me that subgroup 1 is related to subgroup 2, and I get those in CTE
Then I materialize this data in an ordered fashion into
which is then simply joined back onto the original data to determine a supergroupid
The full code:
Thank you so much for the quick response. It is working for the sample data. My MT table is big, the #orderedlinks is more than 2 millions. I applied your code, it's been over 1 hour and it is still in the while begin... end.
Nov 13, 2017 at 05:08 PM
Jason111
ouch.... yeah, rowbyrow processing sucks! There may be some improvements that can be made to remove the links that don't cross subgroups, but ultimately a setbased approach would be better. I'll keep trying....
Nov 13, 2017 at 07:47 PM
Kev Riley ♦♦
(comments are locked)
