WS2K8 R2 Cluster does not detect Generic Service failure


We have a service set up as a Generic Service cluster resource named qtrans-bpplog. The resource is set to be restarted automatically in case of failure.

What's happening is that when the service fails or crashes, the cluster is unaware that the service is down and doesn't restart it. If I go to the services.msc applet, I can see that the service is not running, and the service process is gone from Task Manager. However, Cluster Administrator still shows the service as online. To restart it, I have to bring the resource offline and then online again. Can anyone help?

Here is an excerpt of the cluster log from one of the times I brought the resource online and it crashed right away without the cluster seeing it. Note that there is a failed resource in the group, but there are no dependencies between that resource and qtrans-bpplog.

00000d14.00001ea8::2015/06/24-15:26:23.248 info  [nm] received request client address ncsmcdwtst02.

00000d14.00002134::2015/06/24-15:31:23.131 info  [nm] received request client address ncsmcdwtst02.

---- Bringing qtrans-bpplog offline; it is not running, but the cluster thinks it's online because it didn't detect the previous failure
00000d14.00002134::2015/06/24-15:31:34.706 info  [rcm] rcm::rcmapi::offlineresource: (qtrans-bpplog)
00000d14.00002134::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) online-->offlinecallissued.
00000d14.00002134::2015/06/24-15:31:34.862 info  [rcm] rcm::rcmgroup::updatestateifchanged: (ncsmcdwtst-b, failed --> pending)
00000d14.00002010::2015/06/24-15:31:34.862 info  [rcm] handlemonitorreply: offlineresource 'qtrans-bpplog', gen(2) result 997.
00000d14.00002010::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) offlinecallissued-->offlinepending.
00000f20.000021a0::2015/06/24-15:31:34.862 info  [res] generic service <qtrans-bpplog>: service died or not active more; status = 1062.
---- The cluster realized the service was down only when the resource was brought offline

00000f20.000021a0::2015/06/24-15:31:34.862 info  [res] generic service <qtrans-bpplog>: service offline.
00000f20.000021a0::2015/06/24-15:31:34.862 info  [rhs] resource qtrans-bpplog has come offline. rhs report resource status rcm.
00000d14.00002010::2015/06/24-15:31:34.862 info  [rcm] handlemonitorreply: offlineresource 'qtrans-bpplog', gen(2) result 0.
00000d14.00002010::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) offlinepending-->offlinesavingcheckpoints.
00000d14.000008ac::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) offlinesavingcheckpoints-->offline.
00000d14.000008ac::2015/06/24-15:31:34.862 info  [rcm] rcm::rcmgroup::updatestateifchanged: (ncsmcdwtst-b, pending --> failed)

---- Bringing qtrans-bpplog online...
00000d14.00002134::2015/06/24-15:31:38.139 info  [rcm] rcm::rcmapi::onlineresource: (qtrans-bpplog)
00000d14.00002134::2015/06/24-15:31:38.201 info  [rcm] transitiontostate(qtrans-bpplog) offline-->onlinecallissued.
00000d14.00002134::2015/06/24-15:31:38.201 info  [rcm] rcm::rcmgroup::updatestateifchanged: (ncsmcdwtst-b, failed --> pending)
00000d14.00001e80::2015/06/24-15:31:38.217 info  [rcm] handlemonitorreply: onlineresource 'qtrans-bpplog', gen(2) result 997.
00000d14.00001e80::2015/06/24-15:31:38.217 info  [rcm] transitiontostate(qtrans-bpplog) onlinecallissued-->onlinepending.
00000f20.00002334::2015/06/24-15:31:39.745 info  [res] generic service <qtrans-bpplog>: service running.
00000f20.00002334::2015/06/24-15:31:39.745 info  [rhs] resource qtrans-bpplog has come online. rhs report status change rcm
00000d14.00001e80::2015/06/24-15:31:39.745 info  [rcm] handlemonitorreply: onlineresource 'qtrans-bpplog', gen(2) result 0.
00000d14.00001e80::2015/06/24-15:31:39.745 info  [rcm] transitiontostate(qtrans-bpplog) onlinepending-->online.
00000d14.00001e80::2015/06/24-15:31:39.745 info  [rcm] rcm::rcmgroup::updatestateifchanged: (ncsmcdwtst-b, pending --> failed)
---- qtrans-bpplog crashed at 15:31:48; the cluster doesn't see the failure

00000d14.00002520::2015/06/24-15:34:14.047 info  [nm] received request client address ncsmcdwtst02.

Your log excerpt is not enough to identify the issue.
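
When collecting a longer portion of the log, it can help to pull out only the state transitions for the affected resource, so that the last state the cluster recorded can be compared with the time the service actually died. The following is a minimal sketch, not something from the original thread. It assumes the log lines keep the format shown in the excerpt above; the function name `resource_transitions` and the regular expression are hypothetical helpers written for illustration.

```python
import re

# Assumed line format, taken from the excerpt above:
# 00000d14.00002134::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) online-->offlinecallissued.
TRANSITION = re.compile(
    r"::(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+\w+\s+\[rcm\]\s+"
    r"transitiontostate\((?P<res>[^)]+)\)\s+(?P<old>\w+)-->(?P<new>\w+)",
    re.IGNORECASE,  # tolerate differences in letter case between logs
)


def resource_transitions(lines, resource):
    """Return (timestamp, old_state, new_state) tuples for one resource."""
    found = []
    for line in lines:
        match = TRANSITION.search(line)
        if match and match.group("res").lower() == resource.lower():
            found.append((match.group("ts"), match.group("old"), match.group("new")))
    return found


# Three lines copied from the excerpt above; the middle one is not a transition.
SAMPLE = [
    "00000d14.00002134::2015/06/24-15:31:34.862 info  [rcm] transitiontostate(qtrans-bpplog) online-->offlinecallissued.",
    "00000f20.000021a0::2015/06/24-15:31:34.862 info  [res] generic service <qtrans-bpplog>: service died or not active more; status = 1062.",
    "00000d14.00001e80::2015/06/24-15:31:39.745 info  [rcm] transitiontostate(qtrans-bpplog) onlinepending-->online.",
]

if __name__ == "__main__":
    for ts, old, new in resource_transitions(SAMPLE, "qtrans-bpplog"):
        print(ts, old, "->", new)
```

If the last transition listed for the resource is a move to online and nothing follows it after the time of the crash, that matches the symptom described in the question: the cluster never recorded the failure.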

