DFS Replication sort of stopped, no errors, please help


Hi all, I've been getting a little frustrated trying to fix this and am hoping you can please help.

I have 7 servers in 2 namespaces and replica groups (4 servers each, with 1 server overlapping):

- replica group1 = [server1a, server1b, server1c, server12]

- replica group2 = [server2a, server2b, server2c, server12]

(Each replica group on server12 is on a separate volume.)

This past weekend server12 had its NIC flake out, and I had to uninstall and reinstall the drivers.

Ever since, I have found that server1a and server2a (the masters in a full mesh) are not sending or receiving replication information to or from the other 3 servers. The other 3 servers are replicating fine with one another (including server12). server12 is the common thread between the 2 replica groups, but I cannot figure out why the primaries on both groups have stopped.

My staging space is 32 GB, on a separate drive with 600 GB free.

I have:

- rebooted the servers (not at the same time)

- ran dfsrdiag pollad

- ran diagnostic reports - no errors

- checked Event Viewer (System, Application, and DFS Replication logs) - no errors

- ran dfsrcheck.exe - no errors

- cleaned out the conflict and deleted folder and restarted

- bumped the conflict and deleted folder quota from 660 MB to 1024 MB and unchecked the option to save deleted files there

- right-clicked the replica and selected Replicate Now on each

- ran backlog reports (server1a had a lot)

- set debug logging to 4, though I think that's more for MS because of all the numbers; no noticeable errors
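Forcing the Active Directory poll can be repeated against every member in one pass. This is a minimal sketch, assuming the seven server names from this post and that `/member:` accepts the plain server name (check `dfsrdiag pollad /?`, since some setups may want the `DOMAIN\server` form):

```bat
@echo off
rem Sketch only: ask each DFS Replication member to poll Active Directory for configuration changes.
rem The server list is the seven servers from this post (an assumption); replace with your own members.
for %%s in (server1a server1b server1c server12 server2a server2b server2c) do (
    echo Polling AD on %%s
    dfsrdiag pollad /member:%%s
)
```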

As far as I can find on the net, I've done everything besides the brute-force method of deleting the server from the replica group and re-adding it, which I didn't want to do.

Any help is appreciated.

thanks


Life moves pretty fast. If you don't stop and look around once in a while, you could miss it.

Well, after 24 hours it's still syncing, with 3,000,000 files to go. I'm getting through about 50,000 files an hour. I have no idea how so many files got queued up; the only thing I can think of is that when the backup ran, it set the archive bit on the files and therefore the files needed to be resynced. I'll ask MS, I guess.
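For a rough sense of how long the remaining queue will take, assuming the observed rate of 50,000 files per hour holds steady:

```latex
\[
t \approx \frac{3\,000\,000\ \text{files}}{50\,000\ \text{files/hour}} = 60\ \text{hours} \approx 2.5\ \text{days}
\]
```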

Anyway, to monitor and log this further I wrote a little batch file, in case anyone's interested. I'm going to set it to run daily. You'll need blat; you can search Yahoo for it, it's not hard to find.

It emails a report similar to this, with 1 line for each call to print_head:

***************** share1 *******************
fri 01/04/2013  9:37:02.36 : share1 : server1a member <server1b> backlog file count: 3045548

echo ***************** share1 ******************* >> c:\scripts\dfsreports\dfsbacklog.log

call :print_head server1a server1b share1

rem etc. etc. etc.: I did 1 call line for each DFS server I have

rem ***************** email file *******************

blat c:\scripts\dfsreports\dfsbacklog.log -to <myemail> -server <my relay> -f <myemail> -subject "dfs replication health"

goto :eof

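:print_head
rem arg 1 = sending member, arg 2 = receiving member, arg 3 = replication group name (also used as the replicated folder name)
dfsrdiag backlog /receivingmember:%2 /sendingmember:%1 /rgname:%3 /rfname:%3 > c:\scripts\dfsreports\dfsbacklogtmp.log
setlocal enabledelayedexpansion
set /a counter=0

rem eol set to a linefeed and empty delims so each line is read whole; only the first line (the backlog count) is logged
for /f ^"usebackq^ eol^=^

^ delims^=^" %%a in (c:\scripts\dfsreports\dfsbacklogtmp.log) do (
        if "!counter!"=="0" echo %date% %time% : %3 : %1 %%a >> c:\scripts\dfsreports\dfsbacklog.log
        set /a counter+=1
)
rem delete the temp file before returning to the caller
erase c:\scripts\dfsreports\dfsbacklogtmp.log
endlocal
goto :eof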
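To run it daily as planned, one option is a scheduled task. This is a sketch, assuming the script above is saved as c:\scripts\dfsreports\dfsbacklog.bat (a hypothetical file name) and that 07:00 suits you. By default the task runs with the permissions of the current user, which presumably needs to be an account that can run dfsrdiag backlog against the members; add /ru and /rp if it should run when nobody is logged on.

```bat
rem Sketch only: the task name, script path and start time below are hypothetical; adjust to suit.
schtasks /create /tn "DFS backlog report" /tr "c:\scripts\dfsreports\dfsbacklog.bat" /sc daily /st 07:00
```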




