x

Find a string within a string - SQL takes too much time

I have two tables:
Table 1: CustomerEmails:

CREATE TABLE [dbo].[CustomerEmails](
    [id]            [int] IDENTITY(1,1) NOT NULL,
    [datecreated]   [datetime] NULL,
    [UID]           [nvarchar](250) NULL,
    [From]          [nvarchar](100) NULL,
    [To]            [nvarchar](100) NULL,
    [Subject]       [nvarchar](max) NULL,
    [Body]          [nvarchar](max) NULL,
    [Dated]         [datetime] NULL,
 CONSTRAINT [PK_CustomerEmails] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF,  
STATISTICS_NORECOMPUTE  = OFF, 
IGNORE_DUP_KEY = OFF,  
ALLOW_ROW_LOCKS  = ON,  
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
)

Table 2: CustomerEmailIds:

CREATE TABLE [dbo].[CustomerEmailIds](
    [Email] [varchar](200) NULL
) ON [PRIMARY]

You may be thinking that why i am not using the EmailId in table CustomerEmails instead of using the emails itself? what i mean is that i can add a column EmailId (INT) to the table CustomerEmailIds & then i can refer that column instead of CustomerEmails.From & CustomerEmails.To?? Guys the issue is the table CustomerEmails is used by some other application & that application just keeps the track of emails sent/received through OutLook. The table CustomerEmailIds have customer emails & these emails come into system from the application i am working on.

So the Table CustomerEmails have 7,00,000+ records while table CustomerEmailIds have 1,00,000+ records.

I need to find out the emails from table CustomerEmails, based on the emails in CustomerEmailIds table.

The query i am using is:

SELECT  
        e.*
FROM    CustomerEmails e 
WHERE   EXISTS 
        (
            SELECT  Email
            FROM    CustomerEmailIds c 
            WHERE   ( ISNULL(e.[From],'') + '/' + ISNULL(e.[To],'') )  
            LIKE  '%'+c.Email+'%' 
        )

Some facts:
1- I am using SQL Server 2008
2- That the CustomerEmails.**To** can contains multiple comma separated emails like: email1@yahho.com,email2@yahho.com,email3@yahho.com
3- Because of the fact2 the c.Cs3Emails+'%' OR to= c.Cs3Emails will not list the desired results that's why i am using '%'+c.Cs3Emails+'%'

Guys the above query return wrong results..... & i don't know why??

But below query works fine:

SELECT  
        e.*
FROM    CustomerEmails e 
WHERE   (ISNULL(e.[From],'') + '/' + ISNULL(e.[To],''))  
            LIKE  '%email1@gmail.com%' 
more ▼

asked Jun 23, 2011 at 12:31 AM in Default

yaqubonnet gravatar image

yaqubonnet
247 16 17 20

I'm confused : you mention CustomerEmails and CustomerEmailIds , but your script refers to Emails and Cs3EmailsForPurge
Jun 23, 2011 at 01:19 AM Kev Riley ♦♦
Sorry for that, by mistake the wrong table names were used there. Actually i am trying different things and.. that's why i put the wrong table names there.
Jun 23, 2011 at 06:49 AM yaqubonnet
(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

I have tried to mock up your scenario, but fail to get the incorrect results?

What are the 'wrong results'?

if exists(select OBJECT_ID from sys.objects where type = 'U' and name = N'emails')
    drop table Emails
create table emails (EmailFrom varchar(200), EmailTo varchar(200))

if exists(select OBJECT_ID from sys.objects where type = 'U' and name = N'cs3emailsforpurge')
    drop table cs3emailsforpurge
create table cs3emailsforpurge (cs3emails varchar(200))

insert into emails(EmailTo) 
  select 'email1@yahho.com,email1@gmail.com,email2@yahho.com,email3@yahho.com'

insert into cs3emailsforpurge select 'email1@gmail.com'


select * from emails e 
where (ISNULL(e.EmailFrom,'') + '/' + ISNULL(e.EmailTo,''))  
            LIKE  '%email1@gmail.com%'

SELECT  
        e.*
FROM    Emails e 
WHERE   EXISTS 
        (
            SELECT  Cs3Emails 
            FROM    Cs3EmailsForPurge c 
            WHERE   ( ISNULL(e.EmailFrom,'') + '/' + ISNULL(e.EmailTo,'') )  
            LIKE  '%'+c.Cs3Emails+'%' 
        )
more ▼

answered Jun 23, 2011 at 02:55 AM

Kev Riley gravatar image

Kev Riley ♦♦
52.8k 47 49 76

Thanks for your reply.

Yes you are right it works for small number of records but not in my case!!

By wrong results i mean that it returns all the records from CustomerEmails table whether or not the email exists in the table CustomerEmailIds.
Jun 23, 2011 at 04:19 AM yaqubonnet
maybe you have values in your CustomerEmailIDs table that could match any or all of the CustomerEmails rows - e.g. '@'
Jun 23, 2011 at 04:24 AM Kev Riley ♦♦
hmmm... i have checked but found no entry starting from @. all the emails are valid emails like email1@google.com
Jun 23, 2011 at 06:37 AM yaqubonnet
so all your emails matched %.%
Jun 23, 2011 at 07:04 AM Kev Riley ♦♦

Yes & i have resolved the issue by deleting all the invalid emailIDs, so now all the emailIDs are in this format EmailId@Domain.Com

Thanks Kev, you guided me towards the solution.
Jun 27, 2011 at 07:21 AM yaqubonnet
(comments are locked)
10|1200 characters needed characters left

Update - Right when I posted this, I saw your response to @Kev Riley, so I removed the first paragraph.

Fundamentally, there are several problems here. This is a bit of a rant, but an important one. I know that you said that the data is coming from a different system, but that doesn't make the following any less important.

First, you are searching within a string. This makes it very likely that your table is not in first normal form: your e-mail attribute is not atomic.

Second, you are using a non-SARGable query. When you do % + value + %, indexes will be of no help and you need to scan every record to find a value. This really is an implication of point #1, and is a very good case for why normalization makes things faster. You mentioned that your table has 7 million records. I'm guessing, based on the rest of this, that there probably aren't any indexes aside from the primary key, so every time you do a search, it needs to scan 7 million rows. And with an EXISTS clause, you could be scanning up to 7 million rows up to 1 million times, because each row in the outer part of your query (the Emails table) needs to do a scan of the exists part (Cs3Emails). Having the exists clause will kill the search once it finds the first record, so that's a little bit better than not stopping, but it's still a horrible performance sink.

We can probably help you figure out what's going on and get your system to limp along, but the setup in place now is like shooting yourself in the foot several times and complaining that your revolver's empty. Unfortunately, unless you change the way that e-mail addresses are stored there, you won't get the kind of performance you want. If you do change it so that you have a good data model (which would incorporate somehow a detail table for mail recipients), the query becomes trivial and performance will increase significantly.

Even if the other application needs to remain in place, exactly as it was before, it would be beneficial to consider its results "staging" data and break it out into a normalized table structure, treating that as the real data.
more ▼

answered Jun 23, 2011 at 04:25 AM

Kevin Feasel gravatar image

Kevin Feasel
6.1k 3 5 11

700k, not 7M but points still stand :) +1
Jun 23, 2011 at 07:24 AM Fatherjack ♦♦
True. I saw two commas and went Dr. Evil...
Jun 23, 2011 at 07:32 AM Kevin Feasel
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1834
x985
x33

asked: Jun 23, 2011 at 12:31 AM

Seen: 1394 times

Last Updated: Jun 23, 2011 at 06:48 AM