Query plan not using PK, but unclustered index ... why ?

Basic SOURCE Table defns:

table_L  (bint bigint PRIMARY KEY, lookup_val1 int, lookup_val2 int)

lkup     (id int identity(1,1) PRIMARY KEY, 
          catid int, 
          abbr varchar(20), 
          descr nvarchar(40))
UNIQUE NONCLUSTERED INDEX IX_1 on lookup(catid, abbr)
UNIQUE NONCLUSTERED INDEX IX_2 on lookup(catid, descr)

(tried it without "unique" nonclustered indices, doesn't help)

select L.bint, cast(LU1.abbr as int), cast(LU2.descr as int)
from table_L L
        inner join lkup LU1 on L.lookup_val1 = LU1.ID
        inner join lkup LU2 on L.lookup_val2 = LU2.ID

QUERY plan says that it will use indexes: IX_1 and IX_2 (Index Scan) to fullfil the query.

Since "catid" is not even mentioned in the query, why has it chosen a scan of an index rather than the direct route to the actual record? Why not a "Clustered Index SEEK" ?

Use of the non-clustered indexes causes an error (Conversion failed when converting the nvarchar value "some string" to data type int). Of course, if it had used the PK, there would be no conversion issue.

Adding the resultant query plan:

|--Hash Match(Inner Join, HASH:([LU1].[id])=([L].[lookup_val1]), RESIDUAL:([table_L].[lookup_val1] as [L].[lookup_val1]=[lkup].[id] as [LU1].[id]))  |--Compute Scalar(DEFINE:([Expr1007]=CONVERT(int,[lkup].[descr] as [LU1].[descr],0)))  | |--Index Scan(OBJECT:([lkup].[IX_2] AS [LU1]))  |--Hash Match(Inner Join, HASH:([LU2].[id])=([L].[look_val2]), RESIDUAL:([table_L].[lookup_val2] as [L].[lookup_val2]=[lkup].[id] as [LU2].[id]))  |--Compute Scalar(DEFINE:([Expr1006]=CONVERT(int,[lkup].[abbr] as [LU2].[abbr],0)))  | |--Index Scan(OBJECT:([lkup].[IX_1] AS [LU2]))  |--Clustered Index Scan(OBJECT:([table_L].[PK_table_L] AS [L])) 
more ▼

asked Oct 20, 2009 at 05:42 PM in Default

Andrew gravatar image

15 1 1 1

(comments are locked)
10|1200 characters needed characters left

4 answers: sort voted first

The Execution Plan in Microsoft SQL Server is cost based. It is cheaper for the Storage Engine to scan a table (sequential reads) than seek a table (random reads). In your case, it depends on what more columns are fetched from the query. The Storage Engine will avoid to do Key Lookups if possible.

Also, since there is no SARG-able filter clause in your query, a SEEK can never happen since you basically told the Storage Engine to fetch all records.

more ▼

answered Oct 20, 2009 at 07:42 PM

Peso gravatar image

1.6k 5 6 8

Thanks, this explains it well.
Oct 21, 2009 at 04:13 PM Andrew
(comments are locked)
10|1200 characters needed characters left
  1. Saints preserve us -- It's a MUCK (EAV) !!!! I'm going to bet that this query returns unanticipated extra rows someday because it doesn't take CATID into consideration.

  2. Because there is no WHERE condition, the entire primary table is being scanned. As volumes increase, at some point doing a hash merge of two scans will be less work than looking up values one at a time by ID.

  3. The query joins to the LKUP table twice. This means LKUP is actually treated as two separate tables. Each of the joins is handled separately, and the optimizer will seek the most efficient way to perform each of the joins.

  4. Because the indexes begin with the CATID column, which isn't part of the join, and the primary table is indexed on BINT, which also isn't part of the join, the query would have to do an entire table scan to look up even one description or abbreviation. Using a hash merge of two scans is certainly going to be more efficient.

  5. As Grant mentioned, each of the indexes probably contains fewer pages than the primary table (because each index row is narrower, by virtue of omitting one of the two largest columns in the table). So, for each join, there is an index which will take less work to scan than the primary table.

more ▼

answered Oct 21, 2009 at 12:07 AM

Bob Hovious gravatar image

Bob Hovious
1.6k 5 6 9

You've misread the code to some extent. No unanticipated rows would ever be returned - "lookup_val1 & 2" reference the PK of the "lkup" table (1-to-1 join). (BINT and CATID are not relevant for this query - i'm providing the "lkup" PK value via "lookup_val1 & 2").

I accept the table scan on "table_L" - it's what I required. What I don't like, is that the "lkup" table is preferring to scan non-clustered indexes rather than performing a clustered index SEEK.

Pt 2, 3 & 5 - I agree with.

Pt 4 - no exceptions, lookup_val1 & 2 will always have values.
Oct 21, 2009 at 04:04 PM Andrew

You're correct, I did misread the reference to the primary key of the LKUP table. My apologies. That error certainly makes number 4 off the mark, although it had nothing to do with the presence or absence of values.

However, the fact remains that SQL is being asked to perform two joins that will include every row in the primary table. According to the optimizer's calculations, based on the volumes and statistics known to it, it can scan the indexes and do hash matches faster than it can do a number of discrete SEEKS.
Oct 22, 2009 at 12:25 AM Bob Hovious

Two things to try:

  1. Take the exact same query and add a WHERE clause to it to pull only one or two rows. I suspect you will see the optimizer switch to using a SEEK, because that will be less work for a small number of rows.

  2. Although I haven't tried this in years, I believe you can use a query hint to force a SEEK for your original query. Then compare the runtimes.

Please let me know what the results are.
Oct 22, 2009 at 12:26 AM Bob Hovious
(comments are locked)
10|1200 characters needed characters left

The conversion error is not caused by the use of the index. It's your cast that's the issue there. Why are you even bothering to cast the values in the SELECT statement? If they are numeric in nature, though stored as strings (why), they'll return just fine.

Since it's doing a scan, it probably sees fewer pages in each of those indexes so it's trying to save time by scanning them rather than scanning the table/clustered index which probably has more pages in it.

more ▼

answered Oct 20, 2009 at 10:46 PM

Grant Fritchey gravatar image

Grant Fritchey ♦♦
103k 19 21 74

Pt 2 is probably correct as far as the optimizer is concerned - granted.

Pt 1 - the "lkup" table is a lookup table, so certain "catid" will contain non-numeric values, and others, numeric values. My expectations for these items are that they will be numeric - always. The conversion error occurs, because it is returning records that do not correspond to the "ON" clause ... the "cast" is occuring prior to the "filtering" of the resultant set.
Oct 21, 2009 at 03:12 PM Andrew
Since it's doing a scan on these indexes, it's very likely that you're getting a CAST before the data comes back.
Oct 21, 2009 at 08:18 PM Grant Fritchey ♦♦
(comments are locked)
10|1200 characters needed characters left

Also, regardless of index strategy, you're asking for a varchar and a nvarchar to be CASTed as int, so why are you surprised that you get a conversion error?

more ▼

answered Oct 20, 2009 at 08:01 PM

Kev Riley gravatar image

Kev Riley ♦♦
53.8k 47 49 76

I'm not ... the angst is that the PK is not used to obtain the exact record requested - which would never contain non-numerics.
Oct 21, 2009 at 03:08 PM Andrew
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

New code box

There's a new way to format code on the site - the red speech bubble logo will automatically format T-SQL for you. The original code box is still there for XML, etc. More details here.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.



asked: Oct 20, 2009 at 05:42 PM

Seen: 2815 times

Last Updated: Oct 21, 2009 at 10:09 AM