Performance improvement while querying TEXT column
Hello, I have a table 'Document' with a TEXT column, say DocumentBody. The column holds very large values, and whenever I query DocumentBody from the Document table for a given list of IDs, it takes a very long time, even though the table has only 2000+ records. How can I make my query perform better? Please suggest.

CREATE TABLE Document (ID UNIQUEIDENTIFIER, DocumentBody TEXT)
What do the queries look like? Do you have indexes on the table? Have you checked the execution plan? Are you querying the TEXT data or simply retrieving it? I mean, do you have searches against the TEXT as part of the WHERE clause or not? All of this information is needed before anyone can tell you how to make the query run better. To get started, I'd check the execution plan first to see whether you're getting a seek or a scan. After that, it's probably disk I/O, so I'd look at some of the standard DMOs to see if you've got waits on the disk.
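As a minimal sketch of that last check: sys.dm_os_wait_stats and its columns are real, but treating PAGEIOLATCH waits as the symptom to look for here is my assumption, which you'd have to verify against your own workload:

-- top waits since the last restart; PAGEIOLATCH_* near the top
-- would point at the disk subsystem as the bottleneck
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;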
You do not mention whether there is any index on the table, but I suggest you use a clustered index on ID (even though a uniqueidentifier is not an optimal clustering column from a performance perspective; try to use another datatype for the table if you can). If you only have a nonclustered index on ID, SQL will perform an index seek plus a key lookup instead of a clustered index seek. If you are using SQL Server 2005 or later, I recommend you convert the TEXT datatype to VARCHAR(MAX), because TEXT is deprecated and will be removed in a future version (five years?). Another suggestion is to investigate whether FILESTREAM could be something to consider for the documents instead. I haven't tried it in a production system myself, but it may be worth a try. It has never been a good idea to store large documents inside the SQL database.
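A minimal sketch of the clustered index suggestion, assuming the ID values are unique and the table has no primary key yet (the constraint name PK_Document is just an example):

-- cluster the table on ID; turns seek + key lookup into a single clustered index seek
ALTER TABLE dbo.Document
ADD CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (ID);

Keep in mind that if the IDs are generated with NEWID(), the random values will fragment this index over time, which is part of why a narrower, sequential key is preferable.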
select ID, DocumentBody
from Document
where ID IN ('808CC82A-406D-43F3-BBD4-0072A772DF7F',
             '6FCB6E9F-B07A-4904-97B9-019D021FC5C3',
             '19208C2B-0A27-4EE2-A63D-0A58732B1A7E')

> There is a nonclustered, unique primary key located on (ID).

You seem to have a reasonable data retrieval setup. You may want to ensure your statistics are up to date (I get the feeling they may play a part in this) and that your disk subsystem is capable of streaming the data through. What kind of performance do you see, relatively speaking, if you just do a SELECT TOP 3...? Perhaps in SSMS you could run the query with Client Stats on and post those as well.
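A minimal sketch of the statistics refresh, assuming the table is in the dbo schema; FULLSCAN is cheap at this row count:

-- refresh all statistics on the table
UPDATE STATISTICS dbo.Document WITH FULLSCAN;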
Text must be replaced with varchar(max), there is no doubt about it.

If the DocumentBody values are big, then any select will take time in SSMS simply because it takes a long time to retrieve and print a huge number of characters in the grid.

Big values and default table options also have another, not immediately obvious, side effect. Your records are currently stored like this: every record fills the page (8192 bytes, 8060 of which are available to us, lesser mortals) until it becomes (almost) full, and also stores a 16-byte pointer to the rest of the text data, which is then stored in LOB pages. This means that every page has only ONE record. This, combined with the fact that your table is a heap (which leads to a higher logical reads count), makes any select even slower than it deserves to be.

In addition to the other valuable suggestions to either add a narrow key or at least cluster the table by the existing ID (this could be bad, but never as bad as keeping the heap), please consider the following:

alter table dbo.Document alter column DocumentBody varchar(max); -- null or not null per your specs
go
exec sp_tableoption 'dbo.Document', 'large value types out of row', 1;
go

The latter changes the way your data is stored so that only the pointer to the LOB data is kept on the in-row-data page; the whole value of DocumentBody goes to LOB pages. This makes your records much narrower, which allows the engine to store a bunch of them on a single page rather than just one, as it does now. This will tremendously reduce the logical reads count.

One implication of changing the datatype from evil TEXT to proper varchar(max) is that you will have to modify the front-end code used to insert data into this table. If you currently use SqlDbType.Text for the respective parameter, you will have to change it to SqlDbType.VarChar, setting its length to -1 (so that the framework correctly translates it to varchar(max) when the parameter is passed to the stored proc / parameterized SQL used to insert a new record).

Oleg
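To check whether the storage change had the intended effect, here is a hedged sketch using sys.dm_db_index_physical_stats; the DMF and its columns are real, but reading IN_ROW_DATA versus LOB_DATA page counts as before/after evidence is my assumption. Note that 'large value types out of row' only affects values inserted or updated after the option is set, so existing rows need to be touched first:

-- page counts per allocation unit; after the sp_tableoption change
-- (and an update of the existing rows) the IN_ROW_DATA unit should shrink
SELECT index_id, alloc_unit_type_desc, page_count, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Document'), NULL, NULL, 'DETAILED');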
The first thing that comes to mind: is the query still slow when you select just ID instead of ID, DocumentBody? If it speeds up considerably, then you are dealing with the overhead of extracting large object values from the out-of-row store.

The second thing that comes to mind is to change the IN () to a JOIN. How would you do that? Well, given your example:

select ID, DocumentBody
from Document
where ID IN ('808CC82A-406D-43F3-BBD4-0072A772DF7F',
             '6FCB6E9F-B07A-4904-97B9-019D021FC5C3',
             '19208C2B-0A27-4EE2-A63D-0A58732B1A7E')

You could change that to:

DECLARE @seekValues TABLE (seekValue uniqueidentifier)
INSERT INTO @seekValues VALUES ('808CC82A-406D-43F3-BBD4-0072A772DF7F')
INSERT INTO @seekValues VALUES ('6FCB6E9F-B07A-4904-97B9-019D021FC5C3')
INSERT INTO @seekValues VALUES ('19208C2B-0A27-4EE2-A63D-0A58732B1A7E')

SELECT ID, DocumentBody
FROM Document
INNER JOIN @seekValues sv ON Document.ID = sv.seekValue