x

Removing duplicates from data

I have a script that needs to join to 9 tables, they are all on outer joins as they don't all have matching data and some tables will bring back multiple rows. Ideally I would like my script to bring back only unique fields. To replicate my issue use the following:

** Edited to remove the insert statements ***

What I would really like is:

FieldA  FieldB   FieldC
1   1 1
    2  2
    3  3
2   1 1
    2  2
    3  3
3   1 1
    2  3
    3  3

And I'm not sure how to do this, I'm using SQL 2000 otherwise I would have used ROWNUMBER function.

Many thanks

---------- Excerpt from real data added:

iID FieldA  FieldB   FieldC
1   900075756 41  454
1   900075756 41  1683
1   900075756 42  454
1   900075756 42  1683
1   900075756 113 454
1   900075756 113 1683
1   900075756 179 454
1   900075756 179 1683
1   900075756 224 454
1   900075756 224 1683
1   900075756 276 454
1   900075756 276 1683
1   900075756 471 454
1   900075756 471 1683
1   900075756 584 454
1   900075756 584 1683
1   900075756 1174    454
1   900075756 1174    1683

The tables join on FieldA, value 900075752 (for this example), and for this example TableA is the primary table to use and left outer join to the others.

TableB for this value has: 3 5 7 8 10 11 14

TableC for this value has: 57, 80, 433 481 675 742 1207

When I run the script the data comes back giving all options for the three tables.
more ▼

asked May 24 '11 at 06:25 AM in Default

Mrs_Fatherjack gravatar image

Mrs_Fatherjack
4.7k 57 62 66

What's the value in FieldA for the 2nd and 3rd rows?
May 24 '11 at 06:33 AM Kev Riley ♦♦
It's the same as in the first row.
May 24 '11 at 06:35 AM Mrs_Fatherjack

So you want FieldA to autoincrement whenever fieldb = fieldc = 1? and for it to be blank otherwise?

Oh, and what are the ordering / sorting criteria?
May 24 '11 at 06:37 AM ThomasRushton ♦
I don't think I explained my script very clearly sorry. Basically I have 9 tables that join on 1 field, I've used the outer join as not all the tables will come back with any data. Because some of the tables have many rows for each unique field the data coming back is enormous and repeated and I want to try to just bring back each set of data once. I've added an excerpt from the actual data
May 24 '11 at 06:42 AM Mrs_Fatherjack
I think you need to illustrate with different values as I'm getting confused as to how the tables join (or don't)
May 24 '11 at 06:43 AM Kev Riley ♦♦
(comments are locked)
10|1200 characters needed characters left

5 answers: sort voted first

As a starter.....

select
    TA1.FieldA
    ,TB2.FieldB
    ,TC2.FieldC
from
    #TableA TA1
left join
    (
    select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq 
    from #TableB TB1 join #TableB TB2 
    on TB1.FieldA = TB2.FieldA
    and TB1.FieldB >= TB2.FieldB
    group by TB1.FieldA, TB1.FieldB
    ) TB2
    on TA1.FieldA = TB2.FieldA
LEFT JOIN
    (  
    select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq 
    from #TableC TC1 join #TableC TC2 
    on TC1.FieldA = TC2.FieldA
    and TC1.FieldC >= TC2.FieldC
    group by TC1.FieldA, TC1.FieldC
    ) TC2
    on TB2.FieldA = TC2.FieldA and TB2.Seq= TC2.seq

This pre-2005 solution creates a sequence number (or rank) by joining the table back onto itself and counting the number of rows with a value less than or equal to the current one.

---- Edit: going code-blind now...

does this give the right results

drop table #tally

SELECT TOP 1000 IDENTITY(INT,1,1) AS N
INTO #tally
FROM Master.dbo.SysColumns sc1,
    Master.dbo.SysColumns sc2

drop table #tablea
drop table #tableb
drop table #tablec



CREATE TABLE #TableA
(iID INT IDENTITY,
FieldA INT )

CREATE TABLE #TableB
(FieldA int,
FieldB int)

CREATE Table #TableC
(FieldA int, FieldC int)

INSERT INTO #TableA
        ( FieldA )
SELECT 1
union
SELECT 2
UNION
SELECT 3

INSERT INTO #TableB
(FieldA, FieldB)
SELECT 1,1
UNION
SELECT 1,2
UNION 
SELECT 1,3
UNION 
SELECT 2,1
UNION
SELECT 2,2
UNION
SELECT 2,3
UNION 
SELECT 3,1
UNION
SELECT 3,2
UNION SELECT 3,3

INSERT INTO #TableC
(FieldA, FieldC)
SELECT 1,1
UNION
SELECT 1,2
UNION 
SELECT 1,3
UNION 
SELECT 2,1
UNION
SELECT 2,2
UNION
SELECT 2,3
UNION 
SELECT 3,1
UNION
SELECT 3,2
UNION SELECT 3,3

select
    TA3.FieldA,
    TB3.FieldB,
    TC3.FieldC

from
    #tally t full join #TableA TA3 on ta3.FieldA = ta3.FieldA
left join
    (
    select TB1.FieldA, TB1.FieldB, COUNT(*) as Seq 
    from #TableB TB1, #TableB TB2 
    where TB1.FieldA = TB2.FieldA
    and TB1.FieldB >= TB2.FieldB
    group by TB1.FieldA, TB1.FieldB
    ) TB3
    on TB3.FieldA = TA3.FieldA
    and TB3.Seq = N

left join
    (
    select TC1.FieldA, TC1.FieldC, COUNT(*) as Seq 
    from #TableC TC1 join #TableC TC2 
    on TC1.FieldA = TC2.FieldA
    and TC1.FieldC >= TC2.FieldC
    group by TC1.FieldA, TC1.FieldC
    ) TC3
    on TC3.FieldA = TA3.FieldA
    and TC3.Seq = t.N
where 
    TB3.FieldA is not null
    or
    TC3.FieldA is not null

Includes a pre-2005 tally table to give something common for all the tables to join on, even if there are differing numbers of rows.

more ▼

answered May 24 '11 at 07:06 AM

Kev Riley gravatar image

Kev Riley ♦♦
50.7k 43 49 76

This has worked to a point. I've tried it on the first couple of tables but it's stopped when one of the tables has hit the total of number of records in that table for that unique record, for example tableB as 14 records and tableC as 25 records, it's brought back 14 records. There are no duplicates though.
May 24 '11 at 07:23 AM Mrs_Fatherjack
How where you going to use row_number()? Could you use this SQL2000 method instead?
May 24 '11 at 07:26 AM Kev Riley ♦♦
I thought it would work so I'm trying to run it on a SQL 2005 box linked to the SQL 2000 server but currently struggling with that too. I'm not actually sure this is possible or whether I just suggest the user trawls through lots of data given the time frames available.
May 24 '11 at 07:28 AM Mrs_Fatherjack
@mrs_fatherjack - the limit is caused by the JOIN in the bracketed SQL, I think changing those to outer joins would mean you would get flexible row counts from tables used
May 24 '11 at 07:30 AM Fatherjack ♦♦

@Kev Riley It is hard to believe that just a few short years ago we all had to use the very [triangular join][1] everyone disliked but had to use because of the lack of the windowing functions in SQL Server pre-2005. Just remembering how to do it deserves more than +1 :)

[1]: http://www.sqlservercentral.com/articles/T-SQL/61539/
May 24 '11 at 07:33 AM Oleg
(comments are locked)
10|1200 characters needed characters left

OK, one option (**edit** - based on original query - edit) would appear to be create a fourth temporary table and select from that. So change your SELECT... statement to read:

SELECT DISTINCT TA.FieldA, TB.FieldB, TC.FieldC
INTO #TableD
FROM #TableA AS TA
LEFT OUTER JOIN #TableB AS TB ON TA.FieldA = TB.FieldA
LEFT OUTER JOIN #TableC AS TC ON TA.FieldA = TC.FieldA
WHERE TB.FieldB = TC.FieldC

Then use the following:

SELECT CASE WHEN FieldB=1 AND FieldC=1 
       THEN Convert(char(5), FieldA) 
       ELSE '' END AS FieldA, 
    FieldB, FieldC
FROM #TableD
more ▼

answered May 24 '11 at 06:44 AM

ThomasRushton gravatar image

ThomasRushton ♦
33.3k 13 20 44

(comments are locked)
10|1200 characters needed characters left

select field1,field2,count(*) as count into #temp from tablename groupby field1,field2 truncate table tablename select field1,field2 into tablename from #temp

more ▼

answered Jun 23 '11 at 03:03 AM

mandava gravatar image

mandava
33 1 2

(comments are locked)
10|1200 characters needed characters left

select field1,field2,count(*) as count into #temp from tablename groupby field1,field2

truncate table tablename

select field1,field2 into tablename from #temp
more ▼

answered Jun 23 '11 at 03:04 AM

mandava gravatar image

mandava
33 1 2

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x472
x14

asked: May 24 '11 at 06:25 AM

Seen: 1275 times

Last Updated: Jun 23 '11 at 01:26 AM