For the last couple of weeks, what appears to be a ghost reindex job has been trying to kick off on this server, even though no reindex job currently exists on it. We usually know when it kicks off because we get a failure alert by email:
DESCRIPTION:[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure COMMENT: The Optimization Job has failed, Review the Job History and logs to determine the error and specific remediation JOB RUN: (None)
When I look at the SQL Server log, I see neither a job attempting to run nor a manually issued DBCC DBREINDEX. But I do see the following:
07/01/2010 10:23:33,spid2s,Unknown,SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:SQLLogsCommerce_log.ldf] in database Commerce. The OS file handle is 0x00000938. The offset of the latest long I/O is: 0x000000004e4800
07/01/2010 10:23:33,spid2s,Unknown,SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [E:SQLLogsSmartPay_log.ldf] in database SmartPay. The OS file handle is 0x00000914. The offset of the latest long I/O is: 0x000000079c4c00
I looked at the Application Event Log for the same time frame and I see the following:
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 0; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] OnlineThread: QP is not online.,Failover,1073760843,,HLSQLSRV02N1
07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] printODBCError: sqlstate = 08S01; native error = 2746; message = [Microsoft][SQL Native Client]Communication link failure,Failover,1073760843,,HLSQLSRV02N1
I am not sure what this means, but it looks as though the server wants to fail over (although it never actually does). I do not know whether this is a result of the ghost process or its cause. I speculate that a process is stuck in cache somewhere and that failing the cluster over might release it.
If anybody has experienced this or can add anything to my thought process, I would appreciate it.
asked
Jul 01 '10 at 09:10 AM
in Default
Dave Myers
How much of the environment is virtualised? What's the storage - local disk, NAS, SAN? Is it a regular occurrence? What else is happening when this happens?
No virtual servers. Storage is on a SAN. This started happening about 2 weeks ago, and each occurrence lasts about 5-10 seconds.
I've seen problems where backup systems cause I/O freezes for a few seconds - and the longer these freezes get, the more, erm, interesting the error messages become.
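One rough way to test the I/O-freeze theory is to line up the timestamps from the two logs and see whether the long-I/O warnings come before the cluster resource errors. Below is a minimal Python sketch, assuming the log lines are exported as comma-separated text in the form quoted in the question. The function names (`timestamp`, `long_io_events`, `gap_seconds`) are made up for illustration, and the sample lines are shortened copies of the ones posted above.

```python
import re
from datetime import datetime

# Shortened copies of lines quoted in the question (file paths kept as posted).
SQL_LOG = [
    "07/01/2010 10:23:33,spid2s,Unknown,SQL Server has encountered 1 occurrence(s) "
    "of I/O requests taking longer than 15 seconds to complete on file "
    "[E:SQLLogsCommerce_log.ldf] in database Commerce.",
    "07/01/2010 10:23:33,spid2s,Unknown,SQL Server has encountered 1 occurrence(s) "
    "of I/O requests taking longer than 15 seconds to complete on file "
    "[E:SQLLogsSmartPay_log.ldf] in database SmartPay.",
]
EVENT_LOG = [
    "07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] CheckQueryProcessorAlive: "
    "sqlexecdirect failed,Failover,1073760843,,HLSQLSRV02N1",
    "07/01/2010 10:24:13,MSSQLSERVER,Error,[sqsrvres] OnlineThread: QP is not "
    "online.,Failover,1073760843,,HLSQLSRV02N1",
]

# Assumption: every line starts with an MM/DD/YYYY HH:MM:SS stamp and a comma.
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}),")
LONG_IO = re.compile(r"I/O requests taking longer than (\d+) seconds .* in database (\w+)")


def timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = STAMP.match(line)
    return datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S") if m else None


def long_io_events(lines):
    """Collect (time, database, threshold_seconds) for each long-I/O warning."""
    events = []
    for line in lines:
        m = LONG_IO.search(line)
        if m:
            events.append((timestamp(line), m.group(2), int(m.group(1))))
    return events


def gap_seconds(io_events, cluster_lines):
    """Seconds from the first long-I/O warning to the first [sqsrvres] error."""
    first_io = min(t for t, _, _ in io_events)
    first_check = min(timestamp(l) for l in cluster_lines if "[sqsrvres]" in l)
    return (first_check - first_io).total_seconds()


if __name__ == "__main__":
    events = long_io_events(SQL_LOG)
    for when, database, threshold in events:
        print(f"{when}  {database}: I/O slower than {threshold}s")
    print("gap to first cluster check failure:", gap_seconds(events, EVENT_LOG), "s")
```

If the warnings consistently land shortly before the `[sqsrvres]` errors, that would fit a storage stall causing the cluster's health check to time out, though it would not prove it on its own.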
I must be missing something - but how do you know this has anything to do with DBCC DBREINDEX at all? And what server type / service pack are you on?
Matt, we speculate that it is that process based on the alert message received (see the first indented block). This particular server is running Windows Server 2003 SP2 and SQL Server Standard Edition SP2 in a clustered environment.