Clustering Fail-over Issue

Dear Domino Administrators,

Could you please help verify from the following Notes log whether fail-over worked right after the service became unavailable on server01 (sbjsdp01)?

server01 was restarted around 18:20 and was completely restored at 19:27. The log shows fail-over failing at 18:30 (directing the open to server01; shouldn’t it say server02?) and only working around 19:17 (please see the bottom of the log).

===================================================

sbjsdp02.wk.dcx.com’s log:

06/05/2008 18:30:45 Failing over from sbjsdp01.wk.dcx.com/Coop/Prod/DCX!!sbjsdp02\mikruege.nsf for replica id C1256BC2:00656F54, directing open to sbjsdp01.wk.dcx.com/Coop/Prod/DCX

06/05/2008 19:12:24 Failing over from sbjsdp01.wk.dcx.com/Coop/Prod/DCX!!mail\uauspru.nsf, directing open to sbjsdp02.wk.dcx.com/Coop/Prod/DCX

06/05/2008 19:12:24 Failing over from sbjsdp01.wk.dcx.com!!mail\laryan.nsf, directing open to sbjsdp02.wk.dcx.com/Coop/Prod/DCX

06/05/2008 19:12:25 Failing over from sbjsdp01.wk.dcx.com/Coop/Prod/DCX!!mail\xikong.nsf, directing open to sbjsdp02.wk.dcx.com/Coop/Prod/DCX

06/05/2008 19:12:25 Failing over from sbjsdp01.wk.dcx.com!!mail\leishen.nsf, directing open to sbjsdp02.wk.dcx.com/Coop/Prod/DCX

===================================================

Your inputs will be greatly appreciated!

Thank you & best regards

Russel

Subject: Clustering Fail-over Issue

First, check ClDBDIR.nsf and make sure mikruege.nsf is present on the second Domino server. It should also state that clustering is in service.

So from the above text, am I right in guessing that no fail-overs worked at all?

Didn’t the last log entries work? They direct the open to the logging server itself (server02). Did those databases not open?
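To check the quoted entries mechanically, something like the rough Python sketch below might help. It pulls the source and target servers out of each "Failing over" line and flags entries where the open is directed back to the server being failed over from, as in the 18:30 entry. The regular expression is based only on the five lines quoted in this thread, so treat the pattern and the names `parse_failover` and `host_of` as assumptions, not an official log format.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not a documented Domino log grammar). The replica id part is optional.
FAILOVER_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"Failing over from (?P<source>[^!]+)!!(?P<database>.+?)"
    r"(?: for replica id (?P<replica_id>[0-9A-Fa-f]{8}:[0-9A-Fa-f]{8}))?"
    r", directing open to (?P<target>\S+)$"
)


def host_of(server_name):
    """Return the part before the first '/', lowercased, for comparison."""
    return server_name.split("/")[0].strip().lower()


def parse_failover(line):
    """Parse one fail-over log line; return a dict, or None if no match."""
    match = FAILOVER_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # True when the open is directed back to the server being failed over from.
    record["same_server"] = host_of(record["source"]) == host_of(record["target"])
    return record


if __name__ == "__main__":
    sample = (
        r"06/05/2008 18:30:45 Failing over from "
        r"sbjsdp01.wk.dcx.com/Coop/Prod/DCX!!sbjsdp02\mikruege.nsf "
        r"for replica id C1256BC2:00656F54, directing open to "
        r"sbjsdp01.wk.dcx.com/Coop/Prod/DCX"
    )
    print(parse_failover(sample))
```

Any entry with `same_server` set to true is one where the fail-over did not actually move the open to a different cluster member.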

Do you control load balancing in the cluster via Notes.ini settings, i.e. max_concurrent_users / Max_Users? That would point to why the server would attempt to push the open back to the downed server. What I mean is this: normally the cluster controls the number of users on each server for load balancing. A server that has reached its maximum number of users will not accept the database open from the user and passes it on to another server in the cluster. If that server (or those servers) cannot accept it, because they are down or at their own maximum, the initial server accepts the open after all.
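To make that hand-off logic concrete, here is a toy Python model of the behaviour as I described it. It is only an illustration of the reasoning in this post, not Domino's actual algorithm; the `Member` class, its `max_users` field and `choose_server` are made-up names for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """Hypothetical cluster member; fields are illustrative only."""
    name: str
    up: bool
    users: int
    max_users: int

    def can_accept(self):
        # A member takes a new open only if it is running and below its limit.
        return self.up and self.users < self.max_users


def choose_server(initial: Member, others: List[Member]) -> str:
    """Return the name of the server that ends up handling the open."""
    if initial.can_accept():
        return initial.name
    # Initial server is at its limit: try to pass the open to a cluster mate.
    for member in others:
        if member.can_accept():
            return member.name
    # Nobody else can take it (down or at max): the initial server accepts.
    return initial.name


if __name__ == "__main__":
    server01 = Member("server01", up=False, users=0, max_users=100)
    server02 = Member("server02", up=True, users=100, max_users=100)
    print(choose_server(server02, [server01]))
```

In this model, with server01 down and server02 at its limit, the open still ends up on server02, which is the "second open" case described above.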

If this is not the case, run a cluster analysis and see what that brings up.

Post any odd findings here.