|
I have tried different techniques, but I can't figure it out. I really don't want to manually fix over 3000 duplicates. Here is the SQL code to detect them.
Does anybody have any clue? Thank you, Mike
(comments are locked)
|
|
Something like this should do the trick. I keep getting syntax error in From clause. I have the following table structure: I'm not sure if that matters or not. All our case sensitive.
Oct 11 '09 at 05:44 PM
Mike 6
Which column where you trying to use to deturmin if the rows are duplicates?
Oct 12 '09 at 04:23 AM
mrdenny
(comments are locked)
|
|
Which version of SQL Server are you using?
(comments are locked)
|
|
DELETE MyTable FROM MyTable a JOIN ( SELECT URL, ID = MIN(ID) FROM MyTable GROUP BY URL ) b ON a.URL = b.URL AND a.ID <> b.ID
(comments are locked)
|
|
Unless I have misunderstood your example, this seems pretty straightforward. If the records are exact duplicates (meaning every field matches), then you have it easy. Here are the steps:
--eliminate all the rows in the original table
--copy back the rows from the other table
--drop the other table
--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)
BE CAREFUL: I have had people tell me to get rid of duplicates when the records actually varied in some other field. If you have duplicate entries for the URL but other fields are not the same then the fix will be more involved. Verify you get the same number for each of the following queries. --count distinct values on the URL field
--count distinct values on multiple fields... notice the GROUP BY
--if you want an end-to-end example of this, here you go --create the test table
--insert some rows and some duplicates
--export the 1 row per duplicate to another table
--eliminate all the rows in the original table
--copy back the rows from the other table
--drop the other table
--OPTIONAL?!?! add a unique constaint or primary key (perhaps you have a key on ID?)
--verify the records in the table
--clean up from our test This example assumes that there's no ID value on the table which @Mike said that there was. Also running that on a table with a few hundred million records would kill the system and take the application down (or at least in a funky state) for quite some time while it runs.
Oct 12 '09 at 09:05 PM
mrdenny
Denny, I'm having trouble finding where Mike mentioned there is a unique value in the table. Perhaps he replied to you privately? In any case, your answer is the best one and what I would use if there is a unique value. I was presenting a last-ditch de-duplication process where there is no unique value. I assume the application would be acting sporadically anyway in this case. Also, I thought the table was only a few thousand rows. Would would you say if there are no unique fields: do you temporarily add an identity field and remove after using the DELETE JOIN as above. This is another.
Oct 15 '09 at 06:19 PM
Benjamin
(comments are locked)
|
|
I'm going to 2nd mrdenny's answer. It's short and elegant. I usually went the route involving table variables to store distinct data temporarily and later dumping it back to the destination..
(comments are locked)
|

