Stored Procedure - Eliminate duplicates in all columns except where Education...etc

I managed to get my huge query to return results but still get too many duplicates. How can I write a stored procedure that will eliminate all duplicate records except for duplicates in EducationLevelDescription, Emp, and LanguageCodeDescription?

SELECT DISTINCT Employee.Emp,  Employee.EmpName,  Employee.Status,  JobTitles.JobTitleDesc,  Entity.EntityName,  Employee.WorkLocationDesc,  Employee.HireDate,  Employee.TermDate,  EducationLevel.EducationLevelDescription,  CourseOfStudy.CourseDescription,  LearningInstitute.InstituteDescription,  RegistrationCode.RegDescription,  EmployeeRegistrations.RegisteredLocationCode,  Languages.LanguageCodeDescription,  Skills.SkillDescription,  SF254Codes.SF254Desc FROM Employee INNER JOIN EmployeeEducation ON Employee.Emp = EmployeeEducation.Emp INNER JOIN EducationLevel ON EmployeeEducation.EducationLevelCode = EducationLevel.EducationLevelCode INNER JOIN CourseOfStudy ON EmployeeEducation.CourseOfStudyCode = CourseOfStudy.CourseCode INNER JOIN LearningInstitute ON EmployeeEducation.LearningInstCode = LearningInstitute.InstituteCode INNER JOIN EmployeeRegistrations ON Employee.Emp = EmployeeRegistrations.Emp INNER JOIN RegistrationCode ON EmployeeRegistrations.RegistrationCode = RegistrationCode.RegCode INNER JOIN EmployeeLanguages ON Employee.Emp = EmployeeLanguages.Emp INNER JOIN Languages ON EmployeeLanguages.LanguageCode = Languages.LanguageCode INNER JOIN EmployeeSF254 ON EmployeeLanguages.Emp = EmployeeSF254.Emp INNER JOIN EmployeeSkills ON EmployeeLanguages.Emp = EmployeeSkills.Emp INNER JOIN SF254Codes ON EmployeeSF254.SF254Code = SF254Codes.SF254Code INNER JOIN Skills ON EmployeeSkills.SkillCode = Skills.SkillCode INNER JOIN Entity ON Employee.Entity = Entity.Entity INNER JOIN JobTitles ON Employee.JobTitleCode = JobTitles.JobTitleCode WHERE (Employee.Status = 'AFT') 
more ▼

asked Feb 24, 2010 at 07:02 PM in Default

technette gravatar image

1.2k 86 99 109

did you see the extra answer I put on your previous question? It illustrates the different joins you can use
Feb 25, 2010 at 10:05 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

You can also join the duplicates table in and use the ROW_NUMBER function with a partition and only select the values that = 1 in the row number. It's a quick way to eliminate duplicates.

more ▼

answered Feb 25, 2010 at 09:47 AM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
103k 19 21 74

I hadn't thought of that one, it sounds good. Do you think that is faster than doing a distinct subquery and joining in the nondistinct columns?
Feb 25, 2010 at 01:56 PM TimothyAWiseman
I don't think it's always faster, but I've seen lots of instances where it is faster, yes.
Feb 25, 2010 at 02:01 PM Grant Fritchey ♦♦
Thank you. I'll give joining the duplicates table a try.
Feb 26, 2010 at 12:50 PM technette
(comments are locked)
10|1200 characters needed characters left

You need to determine what is causing your duplicates first. Do you have duplicate empids in the employee table, or are they coming from somewhere else?

Also, please format your code so it can be easily read. There is a code button that can be used to maintain code formatting when submitting your questions and answers.

more ▼

answered Feb 24, 2010 at 07:34 PM

CirqueDeSQLeil gravatar image

4.8k 10 11 15

Sorry about the format. I'll correct that next time. The duplicates occur because for each Distinct employee Education Level, Learning Institution, Language, a new row is returned. I will allow the additional row because of educational level but want to delete the other additional rows that are returned for language..etc.
Feb 24, 2010 at 08:43 PM technette
(comments are locked)
10|1200 characters needed characters left

There are a couple of ways of going about it. The one that will give you the most direct solution is to join everything except the education level, learning instution, and language in a view, CTE, or subquery with distinct, and then join in the others that don't have to be distinct in the outer query without using distinct in the outer query.

Another option is to concatenate the values for education level, learning instution, and language view so that they appear as comma separate values inside of one cell. If you want to explore that option, then Jeff Moden has a great article at: http://www.sqlservercentral.com/articles/Test+Data/61572/

more ▼

answered Feb 24, 2010 at 10:14 PM

TimothyAWiseman gravatar image

15.6k 20 23 32

(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Feb 24, 2010 at 07:02 PM

Seen: 2425 times

Last Updated: Feb 24, 2010 at 08:55 PM