x
login about faq Site discussion (meta-askssc)

Stored Procedure - Eliminate duplicates in all columns except where Education...etc

I managed to get my huge query to return results but still get too many duplicates. How can I write a stored procedure that will eliminate all duplicate records except for duplicates in EducationLevelDescription, Emp, and LanguageCodeDescription?

SELECT DISTINCT Employee.Emp, 
    Employee.EmpName, 
    Employee.Status, 
    JobTitles.JobTitleDesc, 
    Entity.EntityName, 
    Employee.WorkLocationDesc, 
    Employee.HireDate, 
    Employee.TermDate, 
    EducationLevel.EducationLevelDescription, 
    CourseOfStudy.CourseDescription, 
    LearningInstitute.InstituteDescription, 
    RegistrationCode.RegDescription, 
    EmployeeRegistrations.RegisteredLocationCode, 
    Languages.LanguageCodeDescription, 
    Skills.SkillDescription, 
    SF254Codes.SF254Desc
FROM Employee 
INNER JOIN EmployeeEducation ON Employee.Emp = EmployeeEducation.Emp 
INNER JOIN EducationLevel ON EmployeeEducation.EducationLevelCode = EducationLevel.EducationLevelCode 
INNER JOIN CourseOfStudy ON EmployeeEducation.CourseOfStudyCode = CourseOfStudy.CourseCode 
INNER JOIN LearningInstitute ON EmployeeEducation.LearningInstCode = LearningInstitute.InstituteCode 
INNER JOIN EmployeeRegistrations ON Employee.Emp = EmployeeRegistrations.Emp 
INNER JOIN RegistrationCode ON EmployeeRegistrations.RegistrationCode = RegistrationCode.RegCode 
INNER JOIN EmployeeLanguages ON Employee.Emp = EmployeeLanguages.Emp 
INNER JOIN Languages ON EmployeeLanguages.LanguageCode = Languages.LanguageCode 
INNER JOIN EmployeeSF254 ON EmployeeLanguages.Emp = EmployeeSF254.Emp 
INNER JOIN EmployeeSkills ON EmployeeLanguages.Emp = EmployeeSkills.Emp 
INNER JOIN SF254Codes ON EmployeeSF254.SF254Code = SF254Codes.SF254Code 
INNER JOIN Skills ON EmployeeSkills.SkillCode = Skills.SkillCode 
INNER JOIN Entity ON Employee.Entity = Entity.Entity 
INNER JOIN JobTitles ON Employee.JobTitleCode = JobTitles.JobTitleCode 
WHERE  (Employee.Status = 'AFT')
more ▼

asked Feb 24 '10 at 07:02 PM in Default

technette gravatar image

technette
847 45 76 88

did you see the extra answer I put on your previous question? It illustrates the different joins you can use

Feb 25 '10 at 10:05 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

You can also join the duplicates table in and use the ROW_NUMBER function with a partition and only select the values that = 1 in the row number. It's a quick way to eliminate duplicates.

more ▼

answered Feb 25 '10 at 09:47 AM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
62.1k 12 19 66

I hadn't thought of that one, it sounds good. Do you think that is faster than doing a distinct subquery and joining in the nondistinct columns?

Feb 25 '10 at 01:56 PM TimothyAWiseman

I don't think it's always faster, but I've seen lots of instances where it is faster, yes.

Feb 25 '10 at 02:01 PM Grant Fritchey ♦♦

Thank you. I'll give joining the duplicates table a try.

Feb 26 '10 at 12:50 PM technette
(comments are locked)
10|1200 characters needed characters left

You need to determine what is causing your duplicates first. Do you have duplicate empids in the employee table, or are they coming from somewhere else?

Also, please format your code so it can be easily read. There is a code button that can be used to maintain code formatting when submitting your questions and answers.

more ▼

answered Feb 24 '10 at 07:34 PM

CirqueDeSQLeil gravatar image

CirqueDeSQLeil
3.9k 6 11 15

Sorry about the format. I'll correct that next time. The duplicates occur because for each Distinct employee Education Level, Learning Institution, Language, a new row is returned. I will allow the additional row because of educational level but want to delete the other additional rows that are returned for language..etc.

Feb 24 '10 at 08:43 PM technette
(comments are locked)
10|1200 characters needed characters left

There are a couple of ways of going about it. The one that will give you the most direct solution is to join everything except the education level, learning instution, and language in a view, CTE, or subquery with distinct, and then join in the others that don't have to be distinct in the outer query without using distinct in the outer query.

Another option is to concatenate the values for education level, learning instution, and language view so that they appear as comma separate values inside of one cell. If you want to explore that option, then Jeff Moden has a great article at: http://www.sqlservercentral.com/articles/Test+Data/61572/

more ▼

answered Feb 24 '10 at 10:14 PM

TimothyAWiseman gravatar image

TimothyAWiseman
14.3k 16 20 29

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments



Facebook logo Follow Ask SSC on Facebook
Find Ask SSC on Google+
linkedin logo Find us on LinkedIn

Topics:

x43

asked: Feb 24 '10 at 07:02 PM

Seen: 1607 times

Last Updated: Feb 24 '10 at 08:55 PM

Copyright © 2002-2012 Simple Talk Publishing. All Rights Reserved. If you have any queries, please contact the site administrators.
Ask SQL Server Central is a community service provided by Red Gate.