question

Mrs_Fatherjack avatar image
Mrs_Fatherjack asked

Removing duplicates from data

I have a script that needs to join to 9 tables, they are all on outer joins as they don't all have matching data and some tables will bring back multiple rows. Ideally I would like my script to bring back only unique fields. To replicate my issue use the following: ** Edited to remove the insert statements *** What I would really like is: FieldA FieldB FieldC 1 1 1 2 2 3 3 2 1 1 2 2 3 3 3 1 1 2 3 3 3 And I'm not sure how to do this, I'm using SQL 2000 otherwise I would have used ROWNUMBER function. Many thanks ---------- Excerpt from real data added: iID FieldA FieldB FieldC 1 900075756 41 454 1 900075756 41 1683 1 900075756 42 454 1 900075756 42 1683 1 900075756 113 454 1 900075756 113 1683 1 900075756 179 454 1 900075756 179 1683 1 900075756 224 454 1 900075756 224 1683 1 900075756 276 454 1 900075756 276 1683 1 900075756 471 454 1 900075756 471 1683 1 900075756 584 454 1 900075756 584 1683 1 900075756 1174 454 1 900075756 1174 1683 The tables join on FieldA, value 900075752 (for this example), and for this example TableA is the primary table to use and left outer join to the others. TableB for this value has: 3 5 7 8 10 11 14 TableC for this value has: 57, 80, 433 481 675 742 1207 When I run the script the data comes back giving all options for the three tables.
sql-server-2000duplicates
12 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·
What's the value in FieldA for the 2nd and 3rd rows?
0 Likes 0 ·
Mrs_Fatherjack avatar image Mrs_Fatherjack commented ·
It's the same as in the first row.
0 Likes 0 ·
ThomasRushton avatar image ThomasRushton ♦♦ commented ·
So you want FieldA to autoincrement whenever fieldb = fieldc = 1? and for it to be blank otherwise? Oh, and what are the ordering / sorting criteria?
0 Likes 0 ·
Mrs_Fatherjack avatar image Mrs_Fatherjack commented ·
I don't think I explained my script very clearly sorry. Basically I have 9 tables that join on 1 field, I've used the outer join as not all the tables will come back with any data. Because some of the tables have many rows for each unique field the data coming back is enormous and repeated and I want to try to just bring back each set of data once. I've added an excerpt from the actual data
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
I think you need to illustrate with different values as I'm getting confused as to how the tables join (or don't)
0 Likes 0 ·
Show more comments
Kev Riley avatar image
Kev Riley answered
As a starter..... select TA1.FieldA ,TB2.FieldB ,TC2.FieldC from #TableA TA1 left join ( select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq from #TableB TB1 join #TableB TB2 on TB1.FieldA = TB2.FieldA and TB1.FieldB >= TB2.FieldB group by TB1.FieldA, TB1.FieldB ) TB2 on TA1.FieldA = TB2.FieldA LEFT JOIN ( select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq from #TableC TC1 join #TableC TC2 on TC1.FieldA = TC2.FieldA and TC1.FieldC >= TC2.FieldC group by TC1.FieldA, TC1.FieldC ) TC2 on TB2.FieldA = TC2.FieldA and TB2.Seq= TC2.seq This pre-2005 solution creates a sequence number (or rank) by joining the table back onto itself and counting the number of rows with a value less than or equal to the current one. ---- Edit: going code-blind now... does this give the right results drop table #tally SELECT TOP 1000 IDENTITY(INT,1,1) AS N INTO #tally FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2 drop table #tablea drop table #tableb drop table #tablec CREATE TABLE #TableA (iID INT IDENTITY, FieldA INT ) CREATE TABLE #TableB (FieldA int, FieldB int) CREATE Table #TableC (FieldA int, FieldC int) INSERT INTO #TableA ( FieldA ) SELECT 1 union SELECT 2 UNION SELECT 3 INSERT INTO #TableB (FieldA, FieldB) SELECT 1,1 UNION SELECT 1,2 UNION SELECT 1,3 UNION SELECT 2,1 UNION SELECT 2,2 UNION SELECT 2,3 UNION SELECT 3,1 UNION SELECT 3,2 UNION SELECT 3,3 INSERT INTO #TableC (FieldA, FieldC) SELECT 1,1 UNION SELECT 1,2 UNION SELECT 1,3 UNION SELECT 2,1 UNION SELECT 2,2 UNION SELECT 2,3 UNION SELECT 3,1 UNION SELECT 3,2 UNION SELECT 3,3 select TA3.FieldA, TB3.FieldB, TC3.FieldC from #tally t full join #TableA TA3 on ta3.FieldA = ta3.FieldA left join ( select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq from #TableB TB1, #TableB TB2 where TB1.FieldA = TB2.FieldA and TB1.FieldB >= TB2.FieldB group by TB1.FieldA, TB1.FieldB ) TB3 on TB3.FieldA = TA3.FieldA and TB3.Seq = N left join ( select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq from #TableC TC1 join #TableC TC2 on TC1.FieldA = TC2.FieldA and TC1.FieldC >= TC2.FieldC group by TC1.FieldA, TC1.FieldC ) TC3 on TC3.FieldA = TA3.FieldA and TC3.Seq = t.N where TB3.FieldA is not null or TC3.FieldA is not null Includes a pre-2005 tally table to give something common for all the tables to join on, even if there are differing numbers of rows.
10 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@Kev Riley It is hard to believe that just a few short years ago we all had to use the very [triangular join][1] everyone disliked but had to use because of the lack of the windowing functions in SQL Server pre-2005. Just remembering how to do it deserves more than +1 :) [1]: http://www.sqlservercentral.com/articles/T-SQL/61539/
1 Like 1 ·
Mrs_Fatherjack avatar image Mrs_Fatherjack commented ·
This has worked to a point. I've tried it on the first couple of tables but it's stopped when one of the tables has hit the total of number of records in that table for that unique record, for example tableB as 14 records and tableC as 25 records, it's brought back 14 records. There are no duplicates though.
0 Likes 0 ·
Kev Riley avatar image Kev Riley ♦♦ commented ·
How where you going to use row_number()? Could you use this SQL2000 method instead?
0 Likes 0 ·
Mrs_Fatherjack avatar image Mrs_Fatherjack commented ·
I thought it would work so I'm trying to run it on a SQL 2005 box linked to the SQL 2000 server but currently struggling with that too. I'm not actually sure this is possible or whether I just suggest the user trawls through lots of data given the time frames available.
0 Likes 0 ·
Fatherjack avatar image Fatherjack ♦♦ commented ·
@mrs_fatherjack - the limit is caused by the JOIN in the bracketed SQL, I think changing those to outer joins would mean you would get flexible row counts from tables used
0 Likes 0 ·
Show more comments
ThomasRushton avatar image
ThomasRushton answered
OK, one option (**edit** - based on original query - **edit**) would appear to be create a fourth temporary table and select from that. So change your `SELECT...` statement to read: SELECT DISTINCT TA.FieldA, TB.FieldB, TC.FieldC INTO #TableD FROM #TableA AS TA LEFT OUTER JOIN #TableB AS TB ON TA.FieldA = TB.FieldA LEFT OUTER JOIN #TableC AS TC ON TA.FieldA = TC.FieldA WHERE TB.FieldB = TC.FieldC Then use the following: SELECT CASE WHEN FieldB=1 AND FieldC=1 THEN Convert(char(5), FieldA) ELSE '' END AS FieldA, FieldB, FieldC FROM #TableD
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

mandava avatar image
mandava answered
select field1,field2,count(*) as count into #temp from tablename groupby field1,field2 truncate table tablename select field1,field2 into tablename from #temp
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

mandava avatar image
mandava answered
select field1,field2,count(*) as count into #temp from tablename groupby field1,field2 truncate table tablename select field1,field2 into tablename from #temp
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.