x

t sql loop logic

I have a client that wants me to generate employee numbers. I cannot figure out how. I have a table of firstname, lastname, and employeenumber.
The logic is this:
employeenumber = firstinitial & first 7 characters of lastname --- IF that's not taken. If it is...
employeenumber = firstinitial & first 6 characters of lastname & '1'-- IF that's not taken. If it is...
employeenumber = firstinitial & first 6 characters of lastname & '2'-- IF that's not taken. If it is, you get the idea up to 9. Then give up (though we might need to go to some higher number... might need to go to first initial & first 5 characters & then go double digits)
This is used as a logon for an older system that only accepts 8 character user names, which have to be unique... So, I would probably be dmurph9
more ▼

asked Jun 20, 2011 at 12:51 PM in Default

mmdmurphy gravatar image

mmdmurphy
11 1 1 1

Alright, someone accused me of being a student wanting help with homework. Is there some way I can prove this isn't the case???
Jun 20, 2011 at 01:11 PM mmdmurphy

With regard to the student thing..... it is not a problem if you are, help will still be offered.

The idea of the site is to help answer problems. If you supply more information (at best table creation scripts, test data and expected outputs) then people are more likely to help out.

Please don't take the assumptions personally, some people just dump their homework on here and expect the work to be done for them.

If you could supply us with the creation scripts etc. and maybe the solution you have so far (if available), we can get to work.
Jun 20, 2011 at 01:18 PM WilliamD
@mmdmurphy What do you want to do with those employees which have first initial + last name shorter than 8 characters. Do you want them to be padded to 8 characters anyway or not? For example James Doe, Jane Doe and John Doe become JDoe as per first try, so would you like James Doe to remain JDoe, Jane Doe to become JDoe1 and John Doe - Jdoe2 or you need to right-pad all names first so all employee numbers end up having exactly 8 characters regardless of how short (or long) is their last name?
Jun 20, 2011 at 01:45 PM Oleg
Sorry, I wasn't more clear, I tried to keep the question short, and made it too short. First off, nope, I don't want padding jdoe, jdoe1 and jdoe2 are perfectly valid. Secondly, (again, my fault), many of them are ALREADY assigned to someone. So, going back to john doe example, the first john doe was hired a long time ago and is already in there as jdoe. Then, we hired in jane and jerald doe. We have to leave jdoe assigned to john doe, and then assign jdoe1 and jdoe2 to jane and jerald (nope, doesn't matter which one gets which)
Jun 21, 2011 at 05:29 AM mmdmurphy
(comments are locked)
10|1200 characters needed characters left

2 answers: sort newest

Here is a version that should handle more than ten ties in terms of shortened names. This means it will automatically reduce the alphanumeric name length and increment the ID added to the end of login name for names that are the same.

This is not perfect, if you have too many ties, then the LEFT() command will crash. We're talking 8 figures here though, so I think that would be rare!

What I did was to take the initial shortnames that are valid (8 characters, as specced). Then run a ROW_NUMBER() identifying the shortnames that are the same, this is all in the CTE shortnames.

I then ran a CASE statement, finding shortnames that were long enough and had ties. These were then modified to take the necessary amount of letters in the short column and decide how many letters to throw away and replace with a digit. This automatically works regardless of how many ties you have, as displayed in my example code, with SLongname being changed up to SLongn10.

EDIT 3: Missed the case of usernames that are shorter than 8 but still tie, that is in there now.

EDIT 4: Changed the final CASE to fit the spec better - thanks @Oleg :o)

EDIT 5: Last change to fix the brokenness as found by @Kevin Feasel. This should fulfill all requirements.

DECLARE @employees AS TABLE
    (FirstName varchar(255),
     LastName varchar(255))
INSERT INTO @employees (FirstName,LastName)
SELECT 'John','Smith'        UNION ALL
SELECT 'Paul','Smith'        UNION ALL
SELECT 'John','Doe'          UNION ALL
SELECT 'Jane','Doe'          UNION ALL
SELECT 'Peter','Jones'    UNION ALL
SELECT 'Someone','Longname'  UNION ALL
SELECT 'Someone','Longname1' UNION ALL
SELECT 'Someone','Longname2' UNION ALL
SELECT 'Someone','Longname3' UNION ALL
SELECT 'Someone','Longname4' UNION ALL
SELECT 'Someone','Longname5' UNION ALL
SELECT 'Someone','Longname6' UNION ALL
SELECT 'Someone','Longname7' UNION ALL
SELECT 'Someone','Longname8' UNION ALL
SELECT 'Someone','Longname9' UNION ALL
SELECT 'Another','Longername'
;
WITH    shortnames
          AS (SELECT    LEFT(FirstName, 1) + LEFT(Lastname, 7) AS short,
                        ROW_NUMBER() OVER (PARTITION BY LEFT(FirstName, 1) + LEFT(Lastname, 7) ORDER BY LEFT(FirstName, 1) + LEFT(Lastname, 7)) AS ties,
                        *
              FROM      @employees),
newids as (
SELECT  *,
               CASE WHEN (LEN(short) > 7
                   AND MAX(ties) OVER (PARTITION BY short) > 1) THEN LEFT(short, 8 - LEN(CAST(ties AS varchar)))
             WHEN MAX(ties) OVER (PARTITION BY short) > 1 THEN LEFT(short, 8 - LEN(CAST(ties AS varchar)))
             ELSE short
        END as NewUserId
FROM    shortnames
)
SELECT  *,
        NewUserId +
       CASE WHEN ties > ROW_NUMBER() OVER (PARTITION BY NewUserId ORDER BY NewUserId) THEN cast( ties AS varchar) ELSE
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY NewUserId ORDER BY NewUserId)=1 THEN '' ELSE CAST (ROW_NUMBER() OVER (PARTITION BY NewUserId ORDER BY NewUserId) AS varchar) END END
FROM    newids
more ▼

answered Jun 20, 2011 at 01:38 PM

WilliamD gravatar image

WilliamD
25.9k 17 19 41

Good show on getting more than 10 in there, though this query doesn't look like it handles cases like my Smithers vs. Smithe. If you run your query against my data set above (replacing the @employees with @users), you'll see that Carl Smithe and Chuck Smithers both have the same user ID: CSmithe1.
Jun 20, 2011 at 02:14 PM Kevin Feasel
@WilliamD This is a very good solution, just needs a small tweak because it looks like OP wanted to have the first rule tried first and then kick in the number appending logic. In other words, Jane Doe and John Doe should appear as JDoe and JDoe1 but in your script they happen to become JDoe1 and JDoe2. Since the ultimate goal is to update the EmployeeNumber column which probably has NULL in every record so it could become not nullable and unique, it might make sense to restate your solution into 2 updates, first brutally updating whatever can be unique without adding any numbers regardless of whether the name is too short or not (just based on the N = 1) and then issuing a second update with EmployeeNumber is not null predicate using your solution as is to update the leftovers.
Jun 20, 2011 at 03:01 PM Oleg
@Oleg - I changed the final CASE statement to fulfill what you were talking about. Now I have to think of how to fix the bug the @Kevin Feasel was talking about. That is a tricky bugger. I think I may have to resolve to running more than one update...... time to put my thinking cap on!
Jun 20, 2011 at 03:12 PM WilliamD

There's another little bug in your new query. I'm not trying to be mean, honest!

If you run your query against my lengthened data set, the CSmithe group goes from CSmithe9 to CSmithe10, which is 9 characters long.
Jun 21, 2011 at 04:05 AM Kevin Feasel
It's a little bit trickier than it appears. To have to solution really bulletproof, you will end up with at least 8 consequently nested CTEs because each of the steps can produce a duplicity by reducing the number of characters if there is higher count of users. Probably the easiest way to produce a really correct results will be a stored procedure witch input table variable which can call itself recursively. Or maybe a recursive CTE.
Jun 28, 2011 at 01:11 AM Pavel Pawlowski
(comments are locked)
10|1200 characters needed characters left

Update: I have added in a way to get past the 11-match barrier. The downside is that you do have "lost" numbers, but unless the company hires hundreds of people with the same first initial and most of the last name, it should be fine... I also extended the test data to become more interesting, so now there are two sets of names which pass the eleven-match barrier: C. Smith and C. Smithe*. I still recommend doing some more testing, but I'm more confident about this than I was earlier on today when I posted the first version.

The short version of the change is that I'm now looking at the row_number for the first five characters of the last name, and in the event that the 6-character return is at least 10, return the 5-character row number. This results in some numbers being skipped (there is no CSmith10 or CSmith11 in this case, nor 17-25), but should guarantee that each username is unique. To go up to three digits, rinse and repeat with four-character codes, though that's getting to be ridiculous.

declare @users table
(
    FirstName varchar(30),
    LastName varchar(30), 
    EmployeeNumber varchar(8),
    AddedTime datetime  
);

/*We'll start with some seed data.*/
insert into @users(FirstName, LastName, EmployeeNumber, AddedTime)
values ('Winston', 'Chambers', 'WChamber', '2011-06-10');

insert into @users(FirstName, LastName, EmployeeNumber, AddedTime)
values ('Joe', 'Smith', 'JSmith', '2011-06-10');

insert into @users(FirstName, LastName, EmployeeNumber, AddedTime)
values ('Alvin', 'Washington', 'AWashing', '2011-06-10');

/*Now we get to add the new users.*/
insert into @users(FirstName, LastName, AddedTime)
values('Chuck', 'Smith', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Fred', 'Jones', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Margaret', 'Fjordenstein', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Joe', 'Smith', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Chuck', 'Smithington', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Chuck', 'Smithers', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Carl', 'Smithe', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Cate', 'Smithe', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Carlos', 'Smithes', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Chuck', 'Smitherly', CURRENT_TIMESTAMP);

insert into @users(FirstName, LastName, AddedTime)
values('Como', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Carly', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cathy', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cindy', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cailey', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cara', 'Smithe', DATEADD(dd, 1, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cody', 'Smithe', DATEADD(dd, 2, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Crazy', 'Smithers', DATEADD(dd, 2, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cool', 'Smitherly', DATEADD(dd, 2, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cynthia', 'Smithe', DATEADD(dd, 3, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Carlton', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Ceke', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cole', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Chadwick', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Colm', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Curt', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Cable', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Craw', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Clappy', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Copen', 'Smith', DATEADD(dd, 4, current_timestamp));

insert into @users(FirstName, LastName, AddedTime)
values('Calm', 'Smith', DATEADD(dd, 5, current_timestamp));

with names as
(
    select 
        FirstName,
        LastName,
        EmployeeNumber,
        AddedTime,
        SUBSTRING(FirstName, 1, 1) as FirstName1,
        SUBSTRING(LastName, 1, 7) as LastName7,
        SUBSTRING(LastName, 1, 6) as LastName6,
        SUBSTRING(LastName, 1, 5) as LastName5,
        ROW_NUMBER() over (partition by substring(firstname, 1, 1) + substring(LastName, 1, 6) order by AddedTime asc, LastName asc, FirstName asc) as RowNumber6,
        ROW_NUMBER() over (partition by substring(firstname, 1, 1) + substring(LastName, 1, 5) order by AddedTime asc, LastName asc, FirstName asc) as RowNumber5
    from 
        @users
)
select 
    *,
    case
        when RowNumber6 = 1 then FirstName1 + LastName7
        when RowNumber6 > 10 then FirstName1 + LastName5 + CAST(RowNumber5 - 1 as varchar)
        else FirstName1 + LastName6 + cast(RowNumber6 - 1 as VARCHAR) 
    end as Username
from 
    names;

This is for SQL 2005 or later, as it uses ROW_NUMBER() and a common table expression.

Some of the things I did--like LastName7 and LastName6--weren't absolutely necessary, but I already needed a common table expression to get the row number, so I figured I would create friendlier aliases.

Finally, you might need to tweak it a little bit if you don't have an added time or some other marker to show, historically, what was already done. You could probably even use an identity column and insert your seed data (the employee records you already have) in ascending employee number order. Then, instead of ordering by the added time, you would order by the identity column.
more ▼

answered Jun 20, 2011 at 01:23 PM

Kevin Feasel gravatar image

Kevin Feasel
6.1k 3 5 11

Looking at your solution, and trying to implement it, it occurs to me that I wasn't clear. I am NOT starting with 'clean' data. Many of the employeenumbers are ALREADY taken. So, it's not enough to just calculate them. You have to look thru the table for ones that are already assigned to someone.
Jun 21, 2011 at 05:22 AM mmdmurphy

This solution already handles that. You can see the three "seed data" users, which were entered before our next batches. As long as the previous logins followed the same process, if you have all of the logins, you can add them first as seed data.

If the previous logins did not follow the same process, you could seed pseudo-users to fill in the gaps. So, for example, if Chuck Jones was CJones3 and there wasn't a CJones2, CJones1, or CJones prior to that, you could pre-insert seed records to generate those three, and just have a flag noting that they aren't actual users. That's pretty ugly, but would work.

Also, it might be helpful to look into using an identity column for historical insertion if you don't have valid user creation times. Then, insert your existing records in order so that the fellow who has JSmith comes before JSmith1, who comes before JSmith2, etc.
Jun 21, 2011 at 05:32 AM Kevin Feasel

I spent a little more time on this this afternoon and realised I'm terribly over-engineering my solution. I wanted to make it dynamic, so it can deal with all sorts of possibilities.

Your solution does the job, with a single pass of the data. As you wrote, if there are 3 digit ties in a username, you just extend the CTE and case statement.

+1 to you, because that is all I can give.
Jun 27, 2011 at 03:59 PM WilliamD
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x12

asked: Jun 20, 2011 at 12:51 PM

Seen: 1152 times

Last Updated: Jun 20, 2011 at 12:51 PM