question

Fatherjack avatar image
Fatherjack asked

Data filtering puzzle

Friday challenge time: My wife asked me this question yesterday (She's a DBA too so don't worry about the content!) and I can't see a neat solution to it. You have a table- IF OBJECT_ID('testtable_x') > 0 TRUNCATE TABLE testtable_x ELSE CREATE TABLE testtable_x ( ID_1 INT, ID_2 INT ) with some data- INSERT INTO [dbo].[testtable_x] ( [ID_1], [ID_2] ) VALUES ( 1, 2 ), ( 1, 3 ), ( 1, 4 ), ( 2, 3 ), ( 3, 3 ), ( 5, 8 ), ( 5, 9 ), ( 2, 1 ), ( 4, 1 ), ( 9, 5 ) The puzzle is to return a dataset that has every row that doesn't have a 'reverse match' and one of each of the pairs of rows that is a 'reverse match'. For example there is a row with values 1,2 and also a row with 2,1. We only want to see the 1,2 OR the 2,1 but not both. Now in essence this isn't to difficult until I hear that the solution cannot use CURSORS (well that's a given anyway), temporary tables or variables. Oh and I expect CLR is out too although it wasn't specified. From the data above the expected results are: 1,2 1,3 1,4 2,3 3,3 5,8 5,9 What solutions does the internet have for me?
t-sqlfriday-challenge
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered
A quick and dirty go - but wondering if maybe this should be more generic? SELECT [x1].* FROM [dbo].[testtable_x] x1 LEFT OUTER JOIN [dbo].[testtable_x] x2 ON [x2].[ID_2] = [x1].[ID_1] AND [x2].[ID_1] = [x1].[ID_2] AND ([x2].[ID_1] != [x1].[ID_1] OR [x2].[ID_2] != [x2].[ID_2]) WHERE [x2].[ID_1] IS NULL OR [x2].[ID_1] > [x1].[ID_1]
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
@Fatherjack - yeah. The editor is actually really beginning to p*ss me off. Highlight code block, click code button, randomness. Awesome.
1 Like 1 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
ah, thanks. that's what we were looking for. I was trying to do the != with another join/table combo and gave myself a headache! you have an extra [ID_1] at the end of your answer - the beneficial output of the recently added auto paste functionality in the editor I think..
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
patience young one, you will have your chance to change things soon enough ;)
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Either that, or a chance to b*tch :)
0 Likes 0 ·
Scot Hauder avatar image
Scot Hauder answered
...if your data domain doesn't change significantly :)
    SELECT  ID_1, ID_2 FROM(
    SELECT ID_1, ID_2,r = ROW_NUMBER() OVER(PARTITION BY POWER(ID_1,2) + POWER(ID_2,2) ORDER BY (SELECT 1))
    FROM [testtable_x])d
    WHERE r = 1

4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Fatherjack avatar image Fatherjack ♦♦ commented ·
I like this a lot Scott but personally stayed away from it as I was unsure whether maths on the columns should be considered. If the numbers increase too much then arithmetic overflows are going to be encountered. I never actually worked out a solution like this so thanks for providing it. I have no idea if the ID columns are open for use like this within the realms of the problem or not, so I guess its perfectly viable. Thanks again
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Ha - awesome. Bit twiddling ftmfw.
0 Likes 0 ·
Scot Hauder avatar image Scot Hauder commented ·
yep, should be easy to break it if the column values are char or too large. If this is not just a toy problem I can look at it more seriously otherwise Matt has a good solution
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
its just a puzzle that I thought would be a good one to share. I think everyone must be at PASS though, judging by the lack of answers.
0 Likes 0 ·
Cyborg avatar image
Cyborg answered


WITH    CTE
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY ID_1, ID_2 ) RID , * ,
                        CONVERT(VARCHAR, ID_1) + '-' + CONVERT(VARCHAR, ID_2) NID ,
                        CONVERT(VARCHAR, ID_2) + '-' + CONVERT(VARCHAR, ID_1) RvID
               FROM     TestTable_X
             )
    SELECT  ID_1 , ID_2
    FROM    ( SELECT    ID_1 , ID_2 , CASE WHEN NID NOT IN ( SELECT   RvID
                                                             FROM     CTE C2
                                                             WHERE    C1.RID > C2.RID )
                             THEN 1 ELSE 0 END Flag
             FROM      CTE C1
            ) T
    WHERE   Flag = 1
    ORDER BY ID_1 , ID_2



7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Hahaha... My thought process this week is mostly jealousy of those at PASS.
2 Likes 2 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
my thought process was all cloudy with a distinct lack of rainbows and unicorns. That's why I asked the question :)
1 Like 1 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
Thanks for this one, its the closest to the script I ended up with. :)
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
With or without the syntax error? :) But, seriously, is there any benefit in approaching it like this? Because you end up with a much more complex plan... Just wondering what your thinking is...
0 Likes 0 ·
Cyborg avatar image Cyborg commented ·
Sorry matt,fixed the errors
0 Likes 0 ·
Show more comments

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.