I have a set of data which are directly or indirectly connected, example data set shown below in MT. id e, f, g, h, i, and j are connected through i since they are all in the subgroup 2 and/or 3; a, b, c, d and k are connected since they are in either subgroup 1 or/and 2. Because ids a to k are all directly or indirectly connected to each other, these need to be in one group A. Id i in subgroup 4 and 5, so i is connected to m and n in group 5, should be in a new group B, and so on. Recursive CTE might be a solution, but I couldn't get my head around it. I'd really appreciate if someone here can help me with this. Thank you.
I'm posting this as a separate answer as it's entirely unrelated.
There was a timely publication of an article by Itzik Ben-Gan (the man who excels at these kinds of puzzles) that presented a T-SQL solution to the 'transitive closure' problem - which is essentially what this problem here is. I urge you to read the article: http://www.itprotoday.com/microsoft-sql-server/t-sql-puzzle-challenge-grouping-connected-items
So now we can take Itzik's solution and apply it to your data.
The first thing we need to do is generate the data in a way that fits into the solution - so we need a table of edges. This is easily constructed as
and gives us
which represents enough necessary edges that could be constructed between ids in the same subgroups, but without addition of noise that could be introduced by adding all possible edges - this is done by limiting the edges so that the first node is less than the second (Itzik talks about why that is important in the article).
We then need to tweak Itzik's code as he was using integer datatypes for his ids, and you have nvarchar(10), but we can still use the same ideas.
After that we have the output as
Now we need to slightly amend the output to get your desired output - so use a partitioned
Hopefully this performs well enough on your large MT table
This answer isn't my preferred way of doing this as I've had to resort to a loop, but it might help you, and I'll keep playing to see if there is a better option.
I had to do some pre-processing on the data, so I used CTEs to do this
The first CTE
from those I can then work out which sub-groups are related, so for example the pair a,k tells me that sub-group 1 is related to sub-group 2, and I get those in CTE
Then I materialize this data in an ordered fashion into
which is then simply joined back onto the original data to determine a supergroupid
The full code:
answered Nov 13, 2017 at 02:46 PM
Kev Riley ♦♦