question

sujafiza avatar image
sujafiza asked

Performance improvement while querying TEXT column

Hello, I have a table 'Document' with a TEXT column, say DocumentBody. The column has huge data. And whenever I try to query DocumentBody from Document table for a given list of ID's, it is taking hell lot of time. The table has only 2000+ records. How can I make my query perform better? Please suggest. CREATE TABLE Document (ID UNIQUEIDENTIFIER, DocumentBody TEXT)
sql-server-2005performance-tuning
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

WilliamD avatar image WilliamD commented ·
Please be aware the data types TEXT and IMAGE are deprecated data types from SQL 2005 onwards. You should be using varchar(max), nvarchar(max) or varbinary(max)
8 Likes 8 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
What version of SQL Server is this on?
0 Likes 0 ·
sujafiza avatar image sujafiza commented ·
I'm using SQL server 2005.
0 Likes 0 ·
Tim avatar image
Tim answered
How are you trying to query the table? IE select * from document where id > '1' and id
6 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Tim avatar image Tim commented ·
Update your nonclustered index on ID to clustered. Did you design this table? Why would you want to use a GUID for identifier. You could easily set this to INT in you don't think you will have more than 2 billion rows of data and set it as PK, clustered, with identify seed.
2 Likes 2 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Wow! Edged me by seconds on that as near as I can tell.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
Yep. Just refreshed, 10 seconds. Well done.
0 Likes 0 ·
Tim avatar image Tim commented ·
@Grant Fritchey, well you did put a lot more text into your response. I don't think I have ever told you this but my first introduction to SQL 2008 and DMV's was last year at the PASS Summit. I attended two of your sessions about DMV's and performance tuning. After attending PASS I came back to the office and installed our first 2008 instance. Since then I have upgraded over 200 databases from SQL 2000 to SQL 2005 or SQL 2008. I can't wait to see what type of SQL goodness I will get this year. I am thinking I should do more about performance and policy based management.
0 Likes 0 ·
sujafiza avatar image sujafiza commented ·
Well, I haven't designed this table. This is an existing table from the older version(which was in sql 2000). ID is given from a 3rd party s/w.
0 Likes 0 ·
Show more comments
Grant Fritchey avatar image
Grant Fritchey answered
What do the queries look like? Do you have indexes on the table? Have you checked the execution plan? Are you querying the TEXT data or simply retrieving it? I mean, do you have searches against the TEXT as part of the WHERE clause or not? In order to tell you how to make the query run better, all this information would be needed. To get started, I'd check the execution plan, first thing, to see if you're getting a seek or a scan. After that, it's probably disk I/O, so I'd look at some of the standard DMOs to see if you've got waits on the disk.
7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
With only a nonclustered index on the ID column, you're getting, at least, a RID lookup operation, which will cause 3:1 reads, right out of the gate. Also, using the IN clause works like an OR and this can lead to scans. You might be better off doing a UNION ALL with four queries, one against each ID value.
3 Likes 3 ·
sujafiza avatar image sujafiza commented ·
select ID,DocumentBody from Document where ID IN ('808CC82A-406D-43F3-BBD4-0072A772DF7F', '6FCB6E9F-B07A-4904-97B9-019D021FC5C3', '19208C2B-0A27-4EE2-A63D-0A58732B1A7E'). There is a nonclustered, unique, primary key located on (ID).
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
One of the problems are your nonclustered index, see my answer. Do you have a clustered index at all? Or is it a HEAP?
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
@Grant "I'd check the execution plan, first thing" Do you have that programmed into a macro somewhere :)
0 Likes 0 ·
sujafiza avatar image sujafiza commented ·
There is no clustered index on the table.
0 Likes 0 ·
Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
@Blackhawk What, you don't?
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
@Grant - not yet... but then I haven't finished your book either. Maybe you should include one in the Apress d/load!
0 Likes 0 ·
Håkan Winther avatar image
Håkan Winther answered
You do not mention if there is any index on the table or not, but i suggest you should use a clustered index on ID (even if it is a uniqueidentifier, but try to use another datatype for the table because it is not an optimal column from a performance perspective). If you use a non clustered index on ID, SQL will perform an index seek + key lookup, instead of a clustered index seek. If you are using SQL server 2005 or later, I recommend you to convert the TEXT datatype into a VARCHAR(MAX) because TEXT is deprecated and will be removed in a near future (5year?). Another suggestion is to investigate if FILESTREAM can be something consider for the documents instead. I haven't tried in a production system myself, but it may be worth a try. It has never been a good idea to store large documents inside the SQL database.
7 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image Grant Fritchey ♦♦ commented ·
I agree. VARCHAR(MAX) is the right way to go. Also, I agree, a cluster on the GUID is not the end of the world.
1 Like 1 ·
Oleg avatar image Oleg commented ·
@Blackhawk-17 The pain caused by the clustered guid can be alleviated somewhat by not allowing front-ends to feed the DB with the guid values when inserting new records which causes page splits. I am working with databases I inherited which use guids extensively, and clustered guid with newsequentialid() default seems to be the best option so far (we go into 100s of million records per table partition at the most). If there is a default (newsequentialid()) on clustered guid column then all new inserts go to the tail as expected (until next reboot). This makes the life a bit easier though the pain caused by needless increases in all non-clustered indexes' leaf pages' sizes is still there.
1 Like 1 ·
sujafiza avatar image sujafiza commented ·
Thanks a lot for all your suggestions, I'll try a clustered Index on ID and try to change the datatype of TEXT column to VARCHAR(MAX)
0 Likes 0 ·
Blackhawk-17 avatar image Blackhawk-17 commented ·
SQL Server is designed to take advantage of clustered indexes. With so few records a GUID is fine... if you start adding tons 'o records that may become a different story.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@sujafiza In my answer I pointed out the small change to the front-end code this step might require.
0 Likes 0 ·
Show more comments
Blackhawk-17 avatar image
Blackhawk-17 answered
select ID,DocumentBody from Document where ID IN ('808CC82A-406D-43F3-BBD4-0072A772DF7F', '6FCB6E9F-B07A-4904-97B9-019D021FC5C3', '19208C2B-0A27-4EE2-A63D-0A58732B1A7E') > There is a nonclustered, unique, > primary key located on (ID). You seem to define a reasonable data retrieval setup. You may want to ensure your statistics are up to date (I get the feeling it may play a part in this) and that your disk subsystem is capable of streaming the data through. Want kind of performance do you see - relatively speaking - if you just do a SELECT TOP 3...? Perhaps in SSMS you could run the query with Client Stats on and post those as well.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered
Text must be replaced with varchar(max), there is no doubt about it. If the DocumentBody values are big in size then any select query will take time in SSMS simply because it takes long time to retrieve and print huge number of characters in the grid. Big values and default table options also have another not immediately obvious side effect: your records are currently stored like this: every record fills the page (8192 bytes, 8060 of which are available to us, lesser mortals) until it becomes (almost) full, it also stores a 16-byte pointer to the rest of the text data which is then stored in LOB pages. This means that every page has only ONE record. This, combined with the fact that your table is a heap (which leads to higher logical reads count) makes any select even slower than it deserves to be. In addition to other valuable suggestions to either add a narrow key or at least cluster the table by existing ID (this could be bad, but never as bad as keeping the heap), please consider the following: alter table dbo.Document alter column DocumentBody varchar(max); -- null or not null per your specs go exec sp_tableoption 'dbo.Document', 'large value types out of row', 1; go The latter will change the way your data is stored so only the pointer to the LOB data is stored on the in-row-data page, the whole value of the DocumentBody goes to LOB pages. This will make your records much more narrow, which will allow the engine to store a bunch of them on the single page rhather than just one, as it does now. This will tremendously reduce the logical reads count One implication of the changing the datatype from evil TEXT to proper varchar(max) is that you will have to modify your front end code used to insert the data into this table. If your currently use SqlDbType.Text for your respective parameter, you will have to change it to SqlDbType.VarChar, setting its length to -1 (to allow the framework to correctly translate it to varchar(max) when the parameter is passed to the stored proc / parameterized SQL used to insert new record. Oleg
3 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Blackhawk-17 avatar image Blackhawk-17 commented ·
+1 Where's the "we're not worthy" emoticon? Good solid answer and explanation as to why, how and next steps.
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@Blackhawk-17 I was actually thinking of adding a comment because other answers solve the problem quite nicely, but figured that the answer will give me the space I crave so I can make the short story long as I usually do :)
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
+1 Excellent explanation and excellent suggestion for the sp_tableoption
0 Likes 0 ·
Matt Whitfield avatar image
Matt Whitfield answered
The first thing that comes to mind is: * Is the query still slow when you are just selecting ID instead of ID, DocumentBody? If it speeds up considerably, then you are dealing with overhead in extracting large object values from the out-of-row store. The second thing that comes to mind is to change the IN () to a JOIN. How would you do that? Well, given your example: select ID,DocumentBody from Document where ID IN ('808CC82A-406D-43F3-BBD4-0072A772DF7F', '6FCB6E9F-B07A-4904-97B9-019D021FC5C3', '19208C2B-0A27-4EE2-A63D-0A58732B1A7E') You could change that to DECLARE @seekValues TABLE (seekValue uniqueidentifier) INSERT INTO @seekValues VALUES ('808CC82A-406D-43F3-BBD4-0072A772DF7F') INSERT INTO @seekValues VALUES ('6FCB6E9F-B07A-4904-97B9-019D021FC5C3') INSERT INTO @seekValues VALUES ('19208C2B-0A27-4EE2-A63D-0A58732B1A7E') SELECT ID, DocumentBody FROM Document INNER JOIN @seekValues sv on Document.ID = sv.seekValue
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
+1 for the suggestion to use a join instead of the IN clause.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.