Recursive hierarchy?

I have a set of data which are directly or indirectly connected, example data set shown below in MT. id e, f, g, h, i, and j are connected through i since they are all in the subgroup 2 and/or 3; a, b, c, d and k are connected since they are in either subgroup 1 or/and 2. Because ids a to k are all directly or indirectly connected to each other, these need to be in one group A. Id i in subgroup 4 and 5, so i is connected to m and n in group 5, should be in a new group B, and so on. Recursive CTE might be a solution, but I couldn't get my head around it. I'd really appreciate if someone here can help me with this. Thank you.

 Drop table MT
 create table MT ( id nvarchar(10), subgroup int)
 insert into MT Values('a', 1)
 insert into MT Values('b', 1)
 insert into MT Values('c', 1)
 insert into MT Values('d', 1)
 insert into MT Values('e', 2)
 insert into MT Values('f', 2)
 insert into MT Values('g', 2)
 insert into MT Values('h', 3)
 insert into MT Values('i', 3)
 insert into MT Values('i', 2)
 insert into MT Values('j', 3)
 insert into MT Values('k', 1)
 insert into MT Values('k', 2)
 insert into MT Values('l', 4)
 insert into MT Values('l', 5)
 insert into MT Values('m', 5)
 insert into MT Values('n', 5)

 I need to have a result as below, grouping to a top level, the rule as I have explained above.
     id    subgroup    supergroup
     a    1    Group1
     b    1    Group1
     c    1    Group1
     d    1    Group1
     e    2    Group1
     f    2    Group1
     g    2    Group1
     h    3    Group1
     i    3    Group1
     i    2    Group1
     j    3    Group1
     k    1    Group1
     k    2    Group1
     l    4    Group2
     l    5    Group2
     m    5    Group2
     n    5    Group2
more ▼

asked Nov 12 at 09:53 PM in Default

avatar image


(comments are locked)
10|1200 characters needed characters left

1 answer: sort voted first

This answer isn't my preferred way of doing this as I've had to resort to a loop, but it might help you, and I'll keep playing to see if there is a better option.

I had to do some pre-processing on the data, so I used CTEs to do this The first CTE links gets all the id-pairs that are related

  id         id
  ---------- ----------
  a          b
  a          c
  a          d
  a          k
  b          a
  b          c
  b          d

from those I can then work out which sub-groups are related, so for example the pair a,k tells me that sub-group 1 is related to sub-group 2, and I get those in CTE distinctlinks.

  subgroup    subgroup
  ----------- -----------
  1           2
  1           3
  2           1
  2           3
  3           1
  3           2
  4           5
  5           4

Then I materialize this data in an ordered fashion into #orderedlinks as I have to loop over this result set to determine the supergroups, which after running the while loop look like

  sg1         sg2         rn                   group
  ----------- ----------- -------------------- -----------
  1           2           1                    1
  1           3           2                    1
  2           1           3                    1
  2           3           4                    1
  3           1           5                    1
  3           2           6                    1
  4           5           7                    2
  5           4           8                    2

which is then simply joined back onto the original data to determine a supergroupid

  id         subgroup    SuperGroupID
  ---------- ----------- ------------
  a          1           1
  b          1           1
  c          1           1
  d          1           1
  e          2           1
  f          2           1
  g          2           1
  h          3           1
  i          3           1
  i          2           1
  j          3           1
  k          1           1
  k          2           1
  l          4           2
  l          5           2
  m          5           2
  n          5           2

The full code:

  if object_id('tempdb..#orderedlinks') is not null drop table #orderedlinks;
  with links(id1, id2) as (
  select MT1.id, a.id
  from MT MT1
  cross apply (select MT2.id, MT2.subgroup from MT MT2 
              where MT1.id <> MT2.id 
              and MT1.subgroup = MT2.subgroup
              ) a
  , distinctlinks (sg1, sg2) as (
  select distinct
  from links
  join MT MT1 on links.id1 = MT1.id
  join MT MT2 on links.id2 = MT2.id
      MT1.subgroup <> MT2.subgroup
  select sg1, sg2, row_number()over(order by sg1, sg2) as rn, 0 as [group]
  into #orderedlinks
  from distinctlinks;
  while exists (select top 1 rn from #orderedlinks where [group] = 0)
       update o
           set [group] = 
              when (select min([group]) from #orderedlinks o2 where [group] > 0 and (o.sg1 = o2.sg1 or o.sg1 = o2.sg2)) is not null
              then (select min([group]) from #orderedlinks o2 where [group] > 0 and (o.sg1 = o2.sg1 or o.sg1 = o2.sg2))
                  else (select max([group]) from #orderedlinks)+1
       from #orderedlinks o
       where rn in (select top 1 rn from #orderedlinks  where [group] = 0 order by rn);
   from MT
   cross apply (select min([group]) as SuperGroupID 
                        from #orderedlinks 
                        where MT.subgroup = #orderedlinks.sg1 or MT.subgroup = #orderedlinks.sg1) SuperGroups;
more ▼

answered Nov 13 at 02:46 PM

avatar image

Kev Riley ♦♦
66.1k 48 63 81

Thank you so much for the quick response. It is working for the sample data. My MT table is big, the #orderedlinks is more than 2 millions. I applied your code, it's been over 1 hour and it is still in the while begin... end.

Nov 13 at 05:08 PM Jason111

ouch.... yeah, row-by-row processing sucks! There may be some improvements that can be made to remove the links that don't cross sub-groups, but ultimately a set-based approach would be better. I'll keep trying....

Nov 13 at 07:47 PM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Nov 12 at 09:53 PM

Seen: 47 times

Last Updated: Nov 13 at 07:47 PM

Copyright 2017 Redgate Software. Privacy Policy