question

Håkan Winther avatar image
Håkan Winther asked

IO bottleneck (as ususal)

Our client outsourced their SQL server environment to a large multinational IT consultant company. (I don't want to mention the name). They are using a Hitachi Universal Storage Platform. When our client moved, the performance was reduced to a third of original performance. After some investigations, I can see that there is a IO bottleneck. When I run the code below I get 140125054 io_stall (ms) on four files in tempdb. (the avg_stall is 151ms/file) The outsourcing partner is stubborn and claims that RAID 5 will be good enough for tempDB, and I claims the opposite (and recommendation from Hitatchi and Microsoft claims the same as me). How do i convince the outsourcing partner that they are soo wrong. (The server was restarted in less than a week)

The database is 1Tb and is used in a telecom billing application.

The median for a write operation is 40ms in the SAN.

SELECT db_name(database_id) AS Database_Name
, file_id
,CAST(num_of_reads AS BIGINT) AS [total_num_reads]
,CAST(num_of_writes AS BIGINT) AS [total_num_writes]
,CAST(io_stall_read_ms AS BIGINT)
/ CAST(CASE WHEN num_of_reads=0
THEN 1
ELSE num_of_reads
END AS BIGINT) AS [avg_read_stall]
,CAST(io_stall_write_ms AS BIGINT)
/ CAST(CASE WHEN num_of_writes=0
THEN 1
ELSE num_of_writes
END AS BIGINT) AS [avg_write_stall]
,io_stall_write_ms
,io_stall_read_ms
,CAST(num_of_bytes_read AS BIGINT)/1024/1024 AS [total_bytes_read_mb]
,CAST(num_of_bytes_written AS BIGINT)/1024/1024 AS [total_bytes_written_mb]
,CAST(num_of_bytes_read AS BIGINT)
/ CAST(CASE WHEN num_of_reads=0
THEN 1
ELSE num_of_reads
END AS BIGINT)
/ 1024 AS [avg_read_size_Kbytes]
,CAST(num_of_bytes_written AS BIGINT)
/ CAST(CASE WHEN num_of_writes=0
THEN 1
ELSE num_of_writes
END AS BIGINT)
/ 1024 AS [avg_write_size_Kbytes]
FROM sys.dm_io_virtual_file_stats(NULL,NULL)
sql-server-2005performancebest-practicedisk-ioraid
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image
Håkan Winther answered

First of all, I want to thank you for your answers.

Next I found out that the client didn't gave me all the facts about this situation. The main reason why Logica recommended RAID 5 is the fact that the client didn't want to pay 4 times as much for using RAID 1+0 and they claimed that RAID 5 is as good as RAID 1+0 when it comes to read, but not writes. That is true, but the client didn't realize that they have a write intence database. (strange? The system have been in production since 2005 and they don't know that?)

What I had to do was give them figures on how much write operations they have, but I do not think that they are interested in paying 5 EURO per GB each month for RAID 1+0! You heard right 5 EURO! They have to pay for the whole disk volume, not only the used disk space, and that results in 5 EURO per GB. When the database is 1TB that will cost you 5000 Euro each month + 5000 EURO for the Datawarehouse.

I guess you think now it is time to archive some data, and you are right, thats why I am here.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image
Matt Whitfield answered

I think the onus is probably on your client to do the persuading. I would explain the issue to them, give them the data with which they can prove the case in an easily manageable form.

Then they can choose to broach it with the outsourcing 'partner' as they choose to. Usually not paying the bill helps...

4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Håkan Winther avatar image Håkan Winther commented ·
Actaully, I talk to the outourcing partner through my client, and I have to convince them to trust me more than the large ugly company. It is proven that the new environment is slower, but not why.
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
If you can get some similar stats off the old equipment then that would certainly help... Not sure if that's possible?
0 Likes 0 ·
Håkan Winther avatar image Håkan Winther commented ·
No, the old environment is still up, but has not been used for a while and have no stats in the dmv:s.
0 Likes 0 ·
Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
Any possibility of capturing a workload from the new production system and running it against the old to get a comparison?
0 Likes 0 ·
Fatherjack avatar image
Fatherjack answered

Ask the outsourcing host to provide their IO figures and how they reach them. If they cant then they need to back down. As Matt says though, its your client that needs to take the problem on. If you can prove its the IO and the hardware changed as you say then its a pretty secure position for your client to argue from. Holding back the $$$ is certainly an attention getter!

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Blackhawk-17 avatar image
Blackhawk-17 answered

You would probably also want to use the Avg. Disk sec/Transfer performance object to back up your figures.

From the PerfMon explanation:

Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.

You will also need the Avg. Disk Bytes/Transfer object to correlate the information.

Avg. Disk Bytes/Transfer is the average number of bytes transferred to or from the disk during write or read operations.

From here you should be able to illustrate the difference between the two RAID implementations. You may need to break it out further by using the individual read or write objects to get a little more granular.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.