Hyper-V 2012 R2 2-Node Cluster Completely Fails When One Node Is Shut Down


We have a 2-node 2012 R2 cluster running on a Dell VRTX chassis with M620 blades. Cluster storage is 10 TB of shared chassis storage plus a 21 TB iSCSI Synology NAS. Cluster, live migration, management, and iSCSI traffic are on separate subnets. We have 30 VMs on the cluster. The quorum disk and the chassis storage CSVs are owned by node #1; the iSCSI CSV is owned by node #2.
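For reference, the ownership described above can be confirmed from PowerShell on either node. A minimal sketch (the FailoverClusters module is installed with the Failover Clustering feature):

    Import-Module FailoverClusters

    # Which node currently owns each Cluster Shared Volume
    Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

    # The quorum configuration and the witness disk resource
    Get-ClusterQuorum | Format-List Cluster, QuorumResource, QuorumType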

The cluster is functional and live migration works fine as long as both nodes are running.

Here's the problem we have discovered: we need to shut down node #1 for Dell-recommended troubleshooting (one port on a dual-port Intel 10 Gig PCI card is receiving but not sending packets, but that's another story).

  • When I tried to drain the roles from node #1, I got the error "Move of cluster role 'Cluster Group' for drain could not be completed. The operation failed with error code 0x138d."
  • I attempted to move the CSV disks from node #1 to node #2, and that fails with the error "Clustered storage is not connected to the node." This seems like a clue to the problem, but I'm not sure why I'm getting the error.
  • So I go ahead and manually live migrate the roles to node #2 without a problem (the PowerShell equivalents of these steps are sketched after this list).
  • I shut down node #1.
  • As node #1 shuts down, the quorum disk and 2 other CSV disks (which happen to be owned by node #1) go offline. This shouldn't happen!
  • Since the cluster can't talk to the quorum disk, the whole cluster goes down and, since 2 out of 3 CSVs are not available to node #2, many of the VMs go down.
  • When node #1 comes back up, I'm able to reconnect to the cluster, and the quorum disk comes online, but the CSV disks are still offline.
  • In Failover Cluster Manager, I have to "Resume" node #1 with the "Fail Roles Back" option (even though it had no roles).
  • Then I am able to bring the CSV disks online and the cluster is back to "normal."
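For reference, the same drain / move / live-migrate steps can also be attempted from PowerShell, which sometimes gives more detail than Failover Cluster Manager. This is only a rough sketch; the node, disk, and VM names are placeholders for whatever the cluster actually uses:

    Import-Module FailoverClusters

    # Pause node #1 and drain its roles (the step that failed with 0x138d in the GUI)
    Suspend-ClusterNode -Name "Node1" -Drain

    # Try to move a CSV currently owned by node #1 over to node #2 before the shutdown
    Move-ClusterSharedVolume -Name "Cluster Disk 2" -Node "Node2"

    # Live migrate one of the clustered VM roles to node #2
    Move-ClusterVirtualMachineRole -Name "SomeVM" -Node "Node2" -MigrationType Live

    # Move the core cluster group (witness disk, cluster name and IP) to node #2
    Move-ClusterGroup -Name "Cluster Group" -Node "Node2"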

So it seems node #2 has problems talking to the quorum disk and 2 out of 3 CSV disks when node #1 is missing. Definitely not redundant!

When we built the cluster a year and a half ago, it validated and was working flawlessly. The problems seemed to begin after a long-lasting blue screen issue on node #1 that was traced to a bad fan on one of the 10 Gig NICs. I suspect a networking issue, but when I run cluster validation, the only issue that pops up is a connection issue with our iSCSI drive (because we have a bad port on one of our NICs, which we are working on with Dell now). The iSCSI CSV is owned by node #2 and doesn't go offline when node #1 is rebooted.
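In case it's useful, validation can also be re-run from PowerShell and limited to the storage and network tests so the report focuses on the suspect areas. A sketch, assuming the default 2012 R2 test category names:

    # Re-run only the storage and network validation tests against both nodes
    Test-Cluster -Node "Node1","Node2" -Include "Storage","Network"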

Can anyone offer some insight?

Thanks!


George Moore

The network test in cluster validation is not much of a test; it is a simple ping between the nodes. As has been mentioned, there is something going on with storage, and you are in redirected mode on the CSVs. Run the PowerShell command Get-ClusterSharedVolumeState on one of the nodes and it will tell you each CSV drive's connectivity from each node.
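A minimal example of running it (from an elevated PowerShell prompt on either node):

    Import-Module FailoverClusters

    # One row per CSV per node, showing how each node is reaching each volume
    Get-ClusterSharedVolumeState |
        Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason -AutoSize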

There are two things you need to look at. The first is StateInfo. If it says Direct, that machine has direct access and is good. However, if it says FileSystemRedirected or BlockRedirected on a node, that node has no direct connectivity to the volume. For the reason, look at the BlockRedirectedIOReason parameter. If it says NoDiskConnectivity, the node is not seeing the disks at all.
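Building on that, a quick filter pulls out only the problem rows, i.e. any node that is not accessing a CSV directly:

    # List only CSV/node combinations that are redirected, and show why
    Get-ClusterSharedVolumeState |
        Where-Object { $_.StateInfo -ne 'Direct' } |
        Select-Object Name, Node, StateInfo, BlockRedirectedIOReason

A BlockRedirectedIOReason of NoDiskConnectivity on node #2 for the chassis-storage CSVs would line up with the behavior described above.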

You can also go to the System event log and see if you are having errors relating to storage or iSCSI. From what we are seeing here, you may need to contact your storage vendor for assistance.
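For the event log check, something like the following pulls recent storage- and iSCSI-related errors and warnings from the System log. The provider names here (iScsiPrt for the Microsoft iSCSI initiator, disk, and the failover clustering provider) are just typical ones to start with, not a definitive list for this system:

    # Recent errors/warnings from common storage-related providers in the System log
    Get-WinEvent -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'iScsiPrt', 'disk', 'Microsoft-Windows-FailoverClustering'
        Level        = 2, 3                      # 2 = Error, 3 = Warning
        StartTime    = (Get-Date).AddDays(-7)
    } -ErrorAction SilentlyContinue |
        Format-Table TimeCreated, ProviderName, Id, Message -AutoSize -Wrap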


Thanks, John Marlin, Microsoft Server Beta Team





