Hyper-V 2012 R2 2-Node Cluster Completely Fails When One Node Shut Down
We have a 2-node 2012 R2 cluster running on Dell VRTX chassis M620 blades. Cluster storage is 10 TB of shared chassis storage plus a 21 TB iSCSI Synology NAS. Cluster, live migration, management, and iSCSI traffic are on separate subnets. We have 30 VMs on the cluster. The quorum disk and the chassis-storage CSVs are owned by node #1; the iSCSI CSV is owned by node #2.
The cluster is functional and live migration works fine as long as both nodes are running.
Here's the problem we have discovered when we needed to shut down node #1 for Dell-recommended troubleshooting (one port on a dual-port Intel 10 Gig PCI card is receiving but not sending packets, but that's another story):
- When I tried to drain the roles from node #1, I got the error "The move of cluster role 'Cluster Group' for drain could not be completed. The operation failed with error code 0x138d."
- I attempted to move the CSV disks from node #1 to node #2, and that fails with the error "Clustered storage is not connected to the node." This seems like a clue to the problem, but I'm not sure why I'm getting the error.
- So I went ahead and manually live migrated the roles to node #2 without a problem.
- I shut down node #1.
- As node #1 shuts down, the quorum disk and the 2 other CSV disks (which happen to be owned by node #1) go offline. This shouldn't happen!
- Since the cluster can't talk to the quorum disk, the whole cluster goes down and, since 2 out of 3 CSVs are not available to node #2, many of the VMs go down.
- When node #1 comes back up, I'm able to reconnect to the cluster, and the quorum disk comes online, but the CSV disks are still offline.
- In Failover Cluster Manager, I have to "Resume" node #1 with the "Fail Roles" option (even though it had no roles).
- Then I am able to bring the CSV disks online and the cluster is "normal" again.
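For reference, the drain and manual-move steps above map roughly onto cmdlets in the FailoverClusters PowerShell module. This is only a sketch; "Node1", "Node2", and "VM01" are placeholder names for your actual node and role names:

```powershell
# Drain roles from node #1 (equivalent to "Pause > Drain Roles" in the GUI)
Suspend-ClusterNode -Name "Node1" -Drain

# Move CSV ownership explicitly; this is the step that failed here with
# "Clustered storage is not connected to the node"
Get-ClusterSharedVolume | Where-Object OwnerNode -eq "Node1" |
    Move-ClusterSharedVolume -Node "Node2"

# Manually live migrate a VM role instead, as described above
Move-ClusterVirtualMachineRole -Name "VM01" -Node "Node2" -MigrationType Live
```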
So it seems node #2 is having problems talking to the quorum disk and to 2 out of 3 CSV disks when node #1 is missing. Definitely not redundant!
When we built the cluster a year and a half ago, it validated and was working flawlessly. The problems seemed to begin after a long-lasting blue screen issue on node #1 that was eventually traced to a bad fan on one of the 10 Gig NICs. I suspect a networking issue, but when I run cluster validation, the only issue that pops up is a connection issue with our iSCSI drive (because we have a bad port on one of our NICs, which we are working with Dell on now). The iSCSI CSV is owned by node #2 and doesn't go offline when node #1 is rebooted.
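If it helps others reproduce this, validation can be re-run from PowerShell and limited to the suspect areas. A sketch, assuming the standard Test-Cluster category names and placeholder node names:

```powershell
# Re-run cluster validation for just the Network and Storage categories;
# the cmdlet returns the path to the generated .mht report.
# Note: in-use clustered storage is generally skipped by the storage tests,
# so plan a maintenance window if you need full storage validation.
Test-Cluster -Node "Node1","Node2" -Include "Network","Storage"
```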
Can anyone offer insight?
Thanks!
George Moore
The cluster validation network test is not the best test; it is a simple ping between the nodes. As has been mentioned, there is something going on with storage, and you are in redirected mode on the CSVs. Run the PowerShell command Get-ClusterSharedVolumeState on one of the nodes and it will tell you, for each CSV drive, its connectivity from each node.
There are 2 things you need to look at. First is StateInfo. If it says Direct, the machine has direct access to the disk and is good. However, if it says FileSystemRedirected or BlockRedirected on a node, that node has no direct connectivity to it. For the reason, look at the BlockRedirectedIOReason parameter. If it says NoDiskConnectivity, the node is not seeing the disks at all.
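A quick way to view this for all CSVs and nodes at once (property names as returned by Get-ClusterSharedVolumeState on 2012 R2):

```powershell
# Show per-node CSV connectivity: StateInfo of Direct is healthy;
# FileSystemRedirected or BlockRedirected means I/O is being redirected,
# and the *IOReason columns explain why (e.g. NoDiskConnectivity)
Get-ClusterSharedVolumeState |
    Select-Object Name, Node, StateInfo,
                  FileSystemRedirectedIOReason, BlockRedirectedIOReason |
    Format-Table -AutoSize
```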
You can also go to the System event log and see if you are having errors relating to storage or iSCSI. From what I am seeing here, you may need to contact your storage vendor for assistance.
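One way to pull recent storage- and iSCSI-related errors from the System log. The provider names below are common ones (Microsoft iSCSI initiator, disk driver, clustering) but are an assumption; adjust them for your HBA and initiator:

```powershell
# Recent Error/Warning events from storage-related sources in the System log
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    Level        = 2, 3                                   # 2 = Error, 3 = Warning
    ProviderName = 'iScsiPrt', 'disk', 'FailoverClustering'
    StartTime    = (Get-Date).AddDays(-1)                 # last 24 hours
} | Format-Table TimeCreated, ProviderName, Id, Message -AutoSize
```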
Thanks,
John Marlin, Microsoft Server Beta Team
Windows Server > High Availability (Clustering)