x

Problems with Substring in Subquery/View

OK, I've already advised the dev on how to work around this issue, but I am still not understanding why this is happening. We have a report that gets its data from a stored procedure. The stored procedure selects from a view. The view itself runs fine, but when I try to select from the view using a where criteria, I get "Invalid length parameter passed to the LEFT or SUBSTRING function." The problem can be recreated with a subquery:

SELECT * FROM
(SELECT COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') [UserName]
FROM Contact WHERE type = 'student') Subquery
WHERE UserName<>'Blah'

Whenever "UserName" is used in a where clause, the query fails. Figuring that it must be something to do with the source data, I tried to hone in on a record to see which data was causing the error by messing with the "contactID" in the subquery, but to no avail - the error arbitrarily disappears at a certain "contactID" that does not exist in the subquery's result set. Also, just FYI, every "email" in the subquery result set contains an "@".

So in summary, the subquery runs fine by itself. Select * from the subquery works fine. Select * from the subquery where UserName=Anything fails with "Invalid length parameter passed to the LEFT or SUBSTRING function". Is it possibly because the "where" condition "outside" the subquery gets applied before the "where" condition "inside" the subquery? Is there some record in the "contact" table that is not in the Subquery's result set that is causing this error?

EDIT OK I might have over simplified the issue in the example above. Check out this code:

DECLARE @contact TABLE (personID int, email varchar(150), type varchar(20))
DECLARE @t TABLE(personID int, type varchar(20))

INSERT INTO @contact (personID, email, type) SELECT 1, 'student1@school.com', 'student'
UNION SELECT 2, 'teacher2@school.com', 'teacher'
UNION SELECT 3, '', 'teacher'
UNION SELECT 4, null, 'teacher'

INSERT INTO @t(personID, type)
SELECT 1, 'student'
UNION SELECT 2, 'teacher'
UNION SELECT 3, 'teacher'
UNION SELECT 4, 'teacher'

--This works fine....

SELECT * FROM 
(SELECT 
COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM @contact c
INNER JOIN @t t
	ON c.personID=t.personID
WHERE c.type='Student') subQuery
WHERE subQuery.UserName<>''

--This generates an error....

SELECT * FROM
(SELECT
COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM @contact c
INNER JOIN @t t
	ON c.personID=t.personID
WHERE t.type='Student') subQuery
WHERE subQuery.UserName<>'' 
That appears to be the crux of my issue. Does anyone have an explanation?
more ▼

asked Nov 20, 2012 at 05:22 PM in Default

jpatchak gravatar image

jpatchak
290 2 3 5

(comments are locked)
10|1200 characters needed characters left

2 answers: sort voted first

The query plan actually gives you the answer, although it's a bit tough to find.

Let's use a simplification of your view: replace UserName with the coalesce function.

SELECT
 COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM @contact c
 INNER JOIN @t t ON c.personID=t.personID
WHERE 
 t.type='Student'
 and COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') <> ''

The execution plan says that first we scan @contact. Then we compute a scalar value (calculating UserName). Then we join to @t via a Hash Match. Then we get our result set.

The important bit, however, is that we're scanning @contact first. Check out the Predicate when you look at the estimated plan for the above query:

CASE WHEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) ELSE '' END<>''

Here it is when you use c.type instead of t.type:

@contact.[type] as [c].[type]='Student' AND CASE WHEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) ELSE '' END<>''

The difference is that the latter can filter where type is Student, and thus get rid of all of those bad e-mail addresses. The former, however, doesn't get to filter on type until bringing in the @t table. @t's Predicate reads:

@t.[type] as [t].[type]='Student'

Thus you get an error.

But wait, there's more! Given what I said, let's try forcing the join order:

SELECT
 COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM
 @t t
 INNER LOOP JOIN @contact c ON c.personID=t.personID
WHERE 
 t.type='Student'
 and COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') <> ''

This works, right? Well, no. It still gives you the same error for the same reason: @c's Predicate does not filter out the bad records. You might think that you're filtering all of them out with the join, but the join happens after the table scan's Predicate.

Thus, the only safe query was the one you found: where the predicate gets rid of potentially bad rows before applying the function. Interestingly enough, because I don't believe there is a required predicate order, I would say that even this might cause an error if the predicates were evaluated in the "wrong" order. That's something I'm not sure about, though.
more ▼

answered Nov 20, 2012 at 10:42 PM

Kevin Feasel gravatar image

Kevin Feasel
6.1k 3 5 11

+1. This is a really good answer.
Nov 20, 2012 at 11:01 PM Magnus Ahlkvist
(comments are locked)
10|1200 characters needed characters left

Have a look at the estimated execution plan to see if things are executed in different order in the two queries. The error message will be generated for those rows where the email-address does not contain the '@'-character. Since you only select the Student-row, and that contains an '@'-character in email you should be OK. But you're not, so I'm thinking the SUBSTRING-function might be executed before the FILTER-operation in the execution plan for the second query.

On a more general note: I'd put t.Type='Student' and/or c.Type)'Student' in the INNER JOIN clause instead of the WHERE clause. Not sure that will solve anything in this case though.
more ▼

answered Nov 20, 2012 at 09:19 PM

Magnus Ahlkvist gravatar image

Magnus Ahlkvist
16.4k 16 19 33

@Magnus - thanks for your reply. The estimated execution plans are identical. Also, the t.type/c.type = 'Student' piece generates the same error if it's a part of the INNER JOIN clause. We've worked around the issue, but I still don't understand why it's a problem.
Nov 20, 2012 at 09:26 PM jpatchak
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1834
x74
x12

asked: Nov 20, 2012 at 05:22 PM

Seen: 3329 times

Last Updated: Nov 20, 2012 at 11:01 PM