question

FaultyTrucker avatar image
FaultyTrucker asked

Sql Server Recursive Stored Procedure

Want to create a stored procedure with CTE and want to know what the performance hit will be:
DECLARE @Directory Table( DirId INT, ParentId INT ) DECLARE @Files Table( FileId INT, DirId INT ) INSERT INTO @Directory SELECT 1, NULL INSERT INTO @Directory SELECT 2, 1 INSERT INTO @Directory SELECT 3, 1 INSERT INTO @Directory SELECT 4, 2 INSERT INTO @Files SELECT 1, 1 INSERT INTO @Files SELECT 2, 1 INSERT INTO @Files SELECT 3, 2 INSERT INTO @Files SELECT 4, 2 INSERT INTO @Files SELECT 5, 3 INSERT INTO @Files SELECT 6, 3 INSERT INTO @Files SELECT 7, 4 INSERT INTO @Files SELECT 8, 4 ;WITH Directories AS ( SELECT DirId, ParentID FROM @Directory WHERE DirId = 2 UNION ALL SELECT d.DirId, d.ParentID FROM @Directory d INNER JOIN Directories p ON d.ParentId = p.DirId ) SELECT * FROM Directories d INNER JOIN @Files f ON d.DirId = f.DirId
sql-server-2008sqlcterecursion
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

FaultyTrucker avatar image FaultyTrucker commented ·
How do you add the grey background?
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
You mark your code and click the button with the numbers 1 & 0 (preformatted text)
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
Is this somekind of homework maybe?
0 Likes 0 ·
WilliamD avatar image WilliamD commented ·
This recursion only goes up one level from the directory that you choose, is that wanted? Try changing DirId in the root level of the recursion from 2 to 4 and see that you won't go up to the top level of the directory list. This type of recursion will not be terribly slow. If you look at the execution plan you will see that there are two table scans for @Directory, two work tables for the recursion and a table scan of @Files. If you index correctly you should see no great problems.
0 Likes 0 ·
Håkan Winther avatar image
Håkan Winther answered
To see the performance hit you need to test your query against another solution. Create two procedures, one with this code and one with another solution and take a look at different execution plans and execution statistics. Have you figured out the other solution yet? Keep in mind that SQL server doesn't use statistics for table variables and the engine assumes only one record in table variables. That can cause inefficient execution plans if you have many records in the table variable.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Grant Fritchey avatar image
Grant Fritchey answered
The only way to know what the performance will be is to measure the performance. One way to do this, assuming we're working against a dev or test environment, don't do this in production, is to create a query window, put the two queries you're interested in, in the same window. On the standard tool bar find the little icon that shows "Client Statistics" in the hint, or click on the Query menu and select "Include Client Statistics" or hit shift-alt-s. This will turn on client statistics. Highlight the first query, run it, and you'll get some performance metrics. Highlight the second query, run that, you'll get the second queries performance metrics, side-by-side with the original, showing which values have gone up & down. For completeness, to take into account any caching that's occurred, do it one more time for each query. Now you'll have four sets of metrics that you can compare against each other. Is this the best way to do this? Not necesarily, but it is one way. There other ways with different strengths & weaknesses, set up time, etc.
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Phil Factor avatar image
Phil Factor answered
The biggest problem with measuring performance of any SQL is that performance is meaningless outside the context of the data. With a SQL Query, if you are doing it right, you're actually just asking SQL Server to do something but you're not telling it HOW to do it, and the query optimizer is likely to choose different strategies according to the amount and distribution of the data. This means that a solution that works well with one set of data can be complete pants with a larger, or differently-distributed, set. When you're judging performance, to be completely confident you really need to do it on a range of trial data of various sizes and distributions. Very few people actually do that, but it is surprising how badly some SQL 'scales' to large sets of data.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
+1 You are completely right, I think we all have seen a lot of terrible example where no adequate performance tests have been performed. "Know your data" is an important lesson to learn. You have to know the size, growth, distribution and even the workload. Some queries can perform well when nothing else is executing, but under full workload it takes for ever. BUT the lesson from this question is that you can't know the performance hit without testing the performance and compare to something else. If you don't have anything to compare to, you may be forced by the business demands to use some terrible code.
0 Likes 0 ·
Scot Hauder avatar image
Scot Hauder answered
CREATE TABLE Directory ( DirPath hierarchyid PRIMARY KEY, DirId int, ParentId int ) CREATE TABLE Files ( FileId int, DirId int ) -- just insert dummy values for hierarchyid, they are updated later INSERT Directory VALUES('/1/', 1, NULL), ('/2/', 2, 1), ('/3/', 3, 1), ('/4/', 4, 2) INSERT Files VALUES(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4), (8, 4) -- update table with hierarchy ;WITH hierarchy AS ( SELECT '/' + CAST(DirId AS varchar(MAX)) [Path], DirId FROM Directory d WHERE d.ParentId IS NULL UNION ALL SELECT CAST(hierarchy.[path] + '/' + CAST(d.DirId AS varchar(MAX)) AS varchar(MAX)), d.DirId FROM Directory d JOIN hierarchy ON (d.parentID = hierarchy.DirId) ) UPDATE d SET [DirPath] = h.[Path] + '/' FROM hierarchy h JOIN Directory d ON (h.DIRID = d.DIRID) -- once the hierarchy is up to date you can run queries against it, -- this should beat most recursive queries DECLARE @root hierarchyid; SELECT @root = [DirPath] FROM Directory WHERE DirId = 2 -- set your top level directory SELECT d.DirId, f.FileId FROM Directory d JOIN Files f ON (f.DirId = d.DirId) WHERE [DirPath].IsDescendantOf(@root) = 1 ORDER BY d.DirId, f.FileId DROP TABLE Directory, Files
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.