x

Removing duplicates from data

I have a script that needs to join to 9 tables, they are all on outer joins as they don't all have matching data and some tables will bring back multiple rows. Ideally I would like my script to bring back only unique fields. To replicate my issue use the following:

  • Edited to remove the insert statements ***

What I would really like is:

 FieldA    FieldB    FieldC
 1    1    1
     2    2
     3    3
 2    1    1
     2    2
     3    3
 3    1    1
     2    3
     3    3

And I'm not sure how to do this, I'm using SQL 2000 otherwise I would have used ROWNUMBER function.

Many thanks


Excerpt from real data added:

 iID    FieldA    FieldB    FieldC
 1    900075756    41    454
 1    900075756    41    1683
 1    900075756    42    454
 1    900075756    42    1683
 1    900075756    113    454
 1    900075756    113    1683
 1    900075756    179    454
 1    900075756    179    1683
 1    900075756    224    454
 1    900075756    224    1683
 1    900075756    276    454
 1    900075756    276    1683
 1    900075756    471    454
 1    900075756    471    1683
 1    900075756    584    454
 1    900075756    584    1683
 1    900075756    1174    454
 1    900075756    1174    1683

The tables join on FieldA, value 900075752 (for this example), and for this example TableA is the primary table to use and left outer join to the others.

TableB for this value has: 3 5 7 8 10 11 14

TableC for this value has: 57, 80, 433 481 675 742 1207

When I run the script the data comes back giving all options for the three tables.

more ▼

asked May 24, 2011 at 06:25 AM in Default

avatar image

Mrs_Fatherjack
5.1k 64 66 76

What's the value in FieldA for the 2nd and 3rd rows?

May 24, 2011 at 06:33 AM Kev Riley ♦♦

It's the same as in the first row.

May 24, 2011 at 06:35 AM Mrs_Fatherjack

So you want FieldA to autoincrement whenever fieldb = fieldc = 1? and for it to be blank otherwise?

Oh, and what are the ordering / sorting criteria?

May 24, 2011 at 06:37 AM ThomasRushton ♦♦

I don't think I explained my script very clearly sorry. Basically I have 9 tables that join on 1 field, I've used the outer join as not all the tables will come back with any data. Because some of the tables have many rows for each unique field the data coming back is enormous and repeated and I want to try to just bring back each set of data once. I've added an excerpt from the actual data

May 24, 2011 at 06:42 AM Mrs_Fatherjack

I think you need to illustrate with different values as I'm getting confused as to how the tables join (or don't)

May 24, 2011 at 06:43 AM Kev Riley ♦♦
show all comments (comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

As a starter.....

 select
     TA1.FieldA
     ,TB2.FieldB
     ,TC2.FieldC
 from
     #TableA TA1
 left join
     (
     select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq 
     from #TableB TB1 join #TableB TB2 
     on TB1.FieldA = TB2.FieldA
     and TB1.FieldB >= TB2.FieldB
     group by TB1.FieldA, TB1.FieldB
     ) TB2
     on TA1.FieldA = TB2.FieldA
 LEFT JOIN
     (    
     select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq 
     from #TableC TC1 join #TableC TC2 
     on TC1.FieldA = TC2.FieldA
     and TC1.FieldC >= TC2.FieldC
     group by TC1.FieldA, TC1.FieldC
     ) TC2
     on TB2.FieldA = TC2.FieldA and TB2.Seq= TC2.seq

This pre-2005 solution creates a sequence number (or rank) by joining the table back onto itself and counting the number of rows with a value less than or equal to the current one.


Edit: going code-blind now...

does this give the right results

 drop table #tally
 
 SELECT TOP 1000 IDENTITY(INT,1,1) AS N
 INTO #tally
 FROM Master.dbo.SysColumns sc1,
     Master.dbo.SysColumns sc2
     
 drop table #tablea
 drop table #tableb
 drop table #tablec
 
 
 
 CREATE TABLE #TableA
 (iID INT IDENTITY,
 FieldA INT )
 
 CREATE TABLE #TableB
 (FieldA int,
 FieldB int)
 
 CREATE Table #TableC
 (FieldA int, FieldC int)
 
 INSERT INTO #TableA
         ( FieldA )
 SELECT 1
 union
 SELECT 2
 UNION
 SELECT 3
 
 INSERT INTO #TableB
 (FieldA, FieldB)
 SELECT 1,1
 UNION
 SELECT 1,2
 UNION 
 SELECT 1,3
 UNION 
 SELECT 2,1
 UNION
 SELECT 2,2
 UNION
 SELECT 2,3
 UNION 
 SELECT 3,1
 UNION
 SELECT 3,2
 UNION SELECT 3,3
 
 INSERT INTO #TableC
 (FieldA, FieldC)
 SELECT 1,1
 UNION
 SELECT 1,2
 UNION 
 SELECT 1,3
 UNION 
 SELECT 2,1
 UNION
 SELECT 2,2
 UNION
 SELECT 2,3
 UNION 
 SELECT 3,1
 UNION
 SELECT 3,2
 UNION SELECT 3,3
 
 select
     TA3.FieldA,
     TB3.FieldB,
     TC3.FieldC
     
 from
     #tally t full join #TableA TA3 on ta3.FieldA = ta3.FieldA
 left join
     (
     select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq 
     from #TableB TB1, #TableB TB2 
     where TB1.FieldA = TB2.FieldA
     and TB1.FieldB >= TB2.FieldB
     group by TB1.FieldA, TB1.FieldB
     ) TB3
     on TB3.FieldA = TA3.FieldA
     and TB3.Seq = N
 
 left join
     (
     select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq 
     from #TableC TC1 join #TableC TC2 
     on TC1.FieldA = TC2.FieldA
     and TC1.FieldC >= TC2.FieldC
     group by TC1.FieldA, TC1.FieldC
     ) TC3
     on TC3.FieldA = TA3.FieldA
     and TC3.Seq = t.N
 where 
     TB3.FieldA is not null
     or
     TC3.FieldA is not null

Includes a pre-2005 tally table to give something common for all the tables to join on, even if there are differing numbers of rows.

more ▼

answered May 24, 2011 at 07:06 AM

avatar image

Kev Riley ♦♦
64.2k 48 62 81

This has worked to a point. I've tried it on the first couple of tables but it's stopped when one of the tables has hit the total of number of records in that table for that unique record, for example tableB as 14 records and tableC as 25 records, it's brought back 14 records. There are no duplicates though.

May 24, 2011 at 07:23 AM Mrs_Fatherjack

How where you going to use row_number()? Could you use this SQL2000 method instead?

May 24, 2011 at 07:26 AM Kev Riley ♦♦

I thought it would work so I'm trying to run it on a SQL 2005 box linked to the SQL 2000 server but currently struggling with that too. I'm not actually sure this is possible or whether I just suggest the user trawls through lots of data given the time frames available.

May 24, 2011 at 07:28 AM Mrs_Fatherjack

@mrs_fatherjack - the limit is caused by the JOIN in the bracketed SQL, I think changing those to outer joins would mean you would get flexible row counts from tables used

May 24, 2011 at 07:30 AM Fatherjack ♦♦

@Kev Riley It is hard to believe that just a few short years ago we all had to use the very triangular join everyone disliked but had to use because of the lack of the windowing functions in SQL Server pre-2005. Just remembering how to do it deserves more than +1 :)

May 24, 2011 at 07:33 AM Oleg
(comments are locked)
10|1200 characters needed characters left

OK, one option (edit - based on original query - edit) would appear to be create a fourth temporary table and select from that. So change your SELECT... statement to read:

 SELECT DISTINCT TA.FieldA, TB.FieldB, TC.FieldC
 INTO #TableD
 FROM #TableA AS TA
 LEFT OUTER JOIN #TableB AS TB ON TA.FieldA = TB.FieldA
 LEFT OUTER JOIN #TableC AS TC ON TA.FieldA = TC.FieldA
 WHERE TB.FieldB = TC.FieldC

Then use the following:

 SELECT CASE WHEN FieldB=1 AND FieldC=1 
        THEN Convert(char(5), FieldA) 
        ELSE '' END AS FieldA, 
     FieldB, FieldC
 FROM #TableD
more ▼

answered May 24, 2011 at 06:44 AM

avatar image

ThomasRushton ♦♦
40.3k 20 49 53

(comments are locked)
10|1200 characters needed characters left

select field1,field2,count(*) as count into #temp from tablename groupby field1,field2 truncate table tablename select field1,field2 into tablename from #temp

more ▼

answered Jun 23, 2011 at 03:03 AM

avatar image

mandava
33 1 4

(comments are locked)
10|1200 characters needed characters left

select field1,field2,count(*) as count into #temp from tablename groupby field1,field2

truncate table tablename

select field1,field2 into tablename from #temp

more ▼

answered Jun 23, 2011 at 03:04 AM

avatar image

mandava
33 1 4

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x501
x18

asked: May 24, 2011 at 06:25 AM

Seen: 1755 times

Last Updated: Jun 23, 2011 at 01:26 AM

Copyright 2016 Redgate Software. Privacy Policy