question

jpatchak avatar image
jpatchak asked

Problems with Substring in Subquery/View

OK, I've already advised the dev on how to work around this issue, but I am still not understanding why this is happening. We have a report that gets its data from a stored procedure. The stored procedure selects from a view. The view itself runs fine, but when I try to select from the view using a where criteria, I get "Invalid length parameter passed to the LEFT or SUBSTRING function." The problem can be recreated with a subquery: SELECT * FROM (SELECT COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') [UserName] FROM Contact WHERE type = 'student') Subquery WHERE UserName<>'Blah' Whenever "UserName" is used in a where clause, the query fails. Figuring that it must be something to do with the source data, I tried to hone in on a record to see which data was causing the error by messing with the "contactID" in the subquery, but to no avail - the error arbitrarily disappears at a certain "contactID" that does not exist in the subquery's result set. Also, just FYI, every "email" in the subquery result set contains an "@". So in summary, the subquery runs fine by itself. Select * from the subquery works fine. Select * from the subquery where UserName=Anything fails with "Invalid length parameter passed to the LEFT or SUBSTRING function". Is it possibly because the "where" condition "outside" the subquery gets applied before the "where" condition "inside" the subquery? Is there some record in the "contact" table that is not in the Subquery's result set that is causing this error? **EDIT** OK I might have over simplified the issue in the example above. Check out this code:
DECLARE @contact TABLE (personID int, email varchar(150), type varchar(20))
DECLARE @t TABLE(personID int, type varchar(20))

INSERT INTO @contact (personID, email, type) SELECT 1, 'student1@school.com', 'student'
UNION SELECT 2, 'teacher2@school.com', 'teacher'
UNION SELECT 3, '', 'teacher'
UNION SELECT 4, null, 'teacher'

INSERT INTO @t(personID, type)
SELECT 1, 'student'
UNION SELECT 2, 'teacher'
UNION SELECT 3, 'teacher'
UNION SELECT 4, 'teacher'

--This works fine....

SELECT * FROM 
(SELECT 
COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM @contact c
INNER JOIN @t t
	ON c.personID=t.personID
WHERE c.type='Student') subQuery
WHERE subQuery.UserName<>''

--This generates an error....

SELECT * FROM
(SELECT
COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName
FROM @contact c
INNER JOIN @t t
	ON c.personID=t.personID
WHERE t.type='Student') subQuery
WHERE subQuery.UserName<>'' 
That appears to be the crux of my issue. Does anyone have an explanation?
sql-server-2008error-messagesubstring
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kevin Feasel avatar image
Kevin Feasel answered
The query plan actually gives you the answer, although it's a bit tough to find. Let's use a simplification of your view: replace UserName with the coalesce function. SELECT COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName FROM @contact c INNER JOIN @t t ON c.personID=t.personID WHERE t.type='Student' and COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') '' The execution plan says that first we scan @contact. Then we compute a scalar value (calculating UserName). Then we join to @t via a Hash Match. Then we get our result set. The important bit, however, is that we're scanning @contact first. Check out the Predicate when you look at the estimated plan for the above query: CASE WHEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) ELSE '' END'' Here it is when you use c.type instead of t.type: @contact.[type] as [c].[type]='Student' AND CASE WHEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) IS NOT NULL THEN substring(@contact.[email] as [c].[email],(1),charindex('@',@contact.[email] as [c].[email])-(1)) ELSE '' END'' The difference is that the latter can filter where type is Student, and thus get rid of all of those bad e-mail addresses. The former, however, doesn't get to filter on type until bringing in the @t table. @t's Predicate reads: @t.[type] as [t].[type]='Student' Thus you get an error. But wait, there's more! Given what I said, let's try forcing the join order: SELECT COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') UserName FROM @t t INNER LOOP JOIN @contact c ON c.personID=t.personID WHERE t.type='Student' and COALESCE(SUBSTRING(c.email,1, CHARINDEX('@', c.email)-1), '') '' This works, right? Well, no. It still gives you the same error for the same reason: @c's Predicate does **not** filter out the bad records. You might think that you're filtering all of them out with the join, but the join happens **after** the table scan's Predicate. Thus, the only safe query was the one you found: where the predicate gets rid of potentially bad rows before applying the function. Interestingly enough, because I don't believe there is a required predicate order, I would say that even this **might** cause an error if the predicates were evaluated in the "wrong" order. That's something I'm not sure about, though.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
+1. This is a really good answer.
1 Like 1 ·
Magnus Ahlkvist avatar image
Magnus Ahlkvist answered
Have a look at the estimated execution plan to see if things are executed in different order in the two queries. The error message will be generated for those rows where the email-address does not contain the '@'-character. Since you only select the Student-row, and that contains an '@'-character in email you should be OK. But you're not, so I'm thinking the SUBSTRING-function might be executed **before** the FILTER-operation in the execution plan for the second query. On a more general note: I'd put t.Type='Student' and/or c.Type)'Student' in the INNER JOIN clause instead of the WHERE clause. Not sure that will solve anything in this case though.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

jpatchak avatar image jpatchak commented ·
@Magnus - thanks for your reply. The estimated execution plans are identical. Also, the t.type/c.type = 'Student' piece generates the same error if it's a part of the INNER JOIN clause. We've worked around the issue, but I still don't understand why it's a problem.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.