NRouter / Connection problems between/to servers

OS = NT

Server A = 6.5 (upgraded to 6.5.3 to see if it fixed anything)

Server B = 6.5.1

Server SMTP = 6.5.3 (outside the firewall)

We have entered the twilight zone… I would love some suggestions on what to look at.

We began having replication issues with remote sites: very slow, sporadic replication. It took a while to find the server; then replication would start, freeze for 30 minutes or so, burst through some more data, and hang again. It does not hang on the same database each time, and the data in the databases doesn’t seem to be the issue. We could start a new replication immediately after a successful one and it would hang again; other times it would run normally. There is no consistent pattern. Replication within the office is not a problem.

Then we started having trouble delivering mail to one of our remote servers (Server B). We could receive mail from it but could not deliver mail to it.

Then we could no longer open databases on the remote server (Server B). Notes finds the server and connects to it, but the database never opens. We can ping the servers all day long. Likewise, users on Server B could not open Server A, but could open databases on Server SMTP.

At one point, Server A went down with a fault in the Router task (nRouter). The offending thread appears to have been working with mail.box on a different server, Server SMTP, which is outside the firewall. (We upgraded to 6.5.3 after that.)

############################################################

FATAL THREAD 7/25 [ nRouter:0a50: 2180]

FP=0x0a62dd54, PC=0x60001263, SP=0x0a62dd4c, stksize=8

EAX=0x00000884, EBX=0x2b428944, ECX=0x00000000, EDX=0x00000000

ESI=0x000000e0, EDI=0x00000a50, CS=0x0000001b, SS=0x00000023

DS=0x00000023, ES=0x00000023, FS=0x00000038, GS=0x00000000 Flags=0x00010246

Exception code: c0000005 (ACCESS_VIOLATION)

############################################################

** VThread [ nRouter:0a50: 7]

.Mapped To: PThread [ nRouter:0a50: 2180]

… SOBJ: addr=0x0a736244, h=0xf0104029 t=c30a (BLK_LOOKUP_THREAD)

… SOBJ: addr=0x0a7b0904, h=0xf0104028 t=ca35 (BLK_TRACECONNECTION)

… SOBJ: addr=0x010ccfc0, h=0xf0104026 t=c130 (BLK_TLA)

… SOBJ: addr=0x07f21e6c, h=0xf0104027 t=c820 (BLK_CLIENT_OPENSESSION_TIME)

… Database: “Server SMTP”!!mail.box

… DBH: 701, By: “Server SMTP”

… Database: D:\Lotus\Domino\Data\mail.box

… DBH: 348, By: “Server A”

We see no unusual traffic on the network, and bandwidth appears fine. No changes were made to Notes.ini before the ‘collapse’, and we’ve also checked the NICs. We’ve rebooted routers, put a VPN accelerator on the router, rebooted servers, and deleted the mail.box databases; the only thing that has consistently worked is taking the server down, powering off the machine, and restarting it.

Any ideas would be greatly appreciated.

Subject: NRouter / Connection problems between/to servers

Hi Cheryl

I suggest you study the connections to these remote servers. What type of WAN/LAN do you have? Have you tested the links with a packet sniffer? Notes itself is rarely intermittent, but network connections certainly can be, especially since Notes works fine inside your office.

Also check the ports on your firewalls. Port 1352 (Notes RPC) must be open in both directions at each location for Notes to work, and port 25 (SMTP) must be open for SMTP mail delivery; port 110 (POP3) only matters if clients retrieve mail over POP3.

Also run a mail trace to see how far mail gets when it is sent to Server B. You can also check the Notes log (Log.nsf) for replication errors and for SMTP connection errors during mail delivery. If you find any errors, please post them for troubleshooting.
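A quick way to check the firewall side of this is a TCP connect test run from each site toward the other. The sketch below is in Python (the thread itself has no code, so the language is our choice); it only checks whether a TCP handshake to ports 1352 and 25 completes, and the host name you pass on the command line is up to you. Keep in mind that a successful connect proves only that small handshake packets get through, not that full-size data packets do.

```python
import socket
import sys

# Ports named in the reply above: 1352 (Notes RPC) and 25 (SMTP).
NOTES_PORTS = {1352: "Notes RPC", 25: "SMTP"}


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    This only proves the three-way handshake succeeded; it says nothing
    about whether larger data packets can cross the path afterwards.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python portcheck.py <server-host-name>")
    else:
        host = sys.argv[1]
        for port, name in NOTES_PORTS.items():
            state = "open" if tcp_port_open(host, port) else "closed/filtered"
            print(f"{host}:{port} ({name}): {state}")
```

Run it from a machine at each location against the server at the other end; "closed/filtered" on 1352 in either direction points at the firewall rather than at Domino.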

Also create an SMTP debug output file by adding these lines to your Notes.ini:

SMTPdebugio=3

SMTPdebug=1

debug_outfile=C:\out.txt

I also suggest you open a ticket with IBM/Lotus Customer Support and provide the info and errors you found.

Their paid support is really good.

Good luck

Subject: RE: NRouter / Connection problems between/to servers

Thanks for your input. It turned out to be an MTU/MSS setting on a router somewhere in our VPN.

(P.S. The answer didn’t come from IBM’s paid support; the best they could suggest was defragmenting our hard drive and doing database maintenance.)
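The arithmetic behind an MTU/MSS problem on a VPN can be sketched as follows. This is a minimal illustration, assuming IPv4 and TCP headers without options (20 bytes each) and a hypothetical 60 bytes of tunnel overhead; real IPsec overhead depends on the cipher, mode, and any NAT traversal, and the thread does not say which values were actually changed. A host on a 1500-byte Ethernet link advertises an MSS of 1460, so a full segment plus tunnel overhead no longer fits in the 1500-byte link, while lowering the MSS to account for the tunnel makes it fit again.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_in_tunnel(payload: int, link_mtu: int, tunnel_overhead: int) -> bool:
    """True if a full TCP segment still fits on the link after encapsulation."""
    inner_packet = payload + IPV4_HEADER + TCP_HEADER
    return inner_packet + tunnel_overhead <= link_mtu


# Hypothetical figures: 1500-byte Ethernet MTU, ~60 bytes of VPN overhead.
LINK_MTU = 1500
VPN_OVERHEAD = 60  # assumption; measure or look up your tunnel's real overhead

default_mss = mss_for_mtu(LINK_MTU)                 # what hosts advertise by default
clamped_mss = mss_for_mtu(LINK_MTU - VPN_OVERHEAD)  # MSS that leaves room for the tunnel

print(default_mss, fits_in_tunnel(default_mss, LINK_MTU, VPN_OVERHEAD))
print(clamped_mss, fits_in_tunnel(clamped_mss, LINK_MTU, VPN_OVERHEAD))
```

On Cisco IOS routers, the usual knob for this is the interface command "ip tcp adjust-mss", which rewrites the MSS option in TCP SYN packets passing through the router; the right value depends on the tunnel's actual overhead.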

Subject: RE: NRouter / Connection problems between/to servers

We’ve had a sniffer on the network for the past couple of days, and several patterns have finally shown up. To reiterate: we have no trouble ‘connecting’ to the servers; there’s just no data flowing. It only happens to clients and servers on the VPN (at least 3 different ISPs). The sniffer shows no data coming from the server and several VPN retransmissions, although Cisco doesn’t seem to think that’s a problem. There are no error messages in the logs or on the clients. The connection eventually times out with “Network operation did not complete in a reasonable amount of time” or, in the server-to-server case, never disconnects (leaving hundreds of connections to mail.box).
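A connection that opens but then carries no data, with retransmissions only across the VPN, is a common symptom of a path-MTU black hole: the small handshake packets get through, but full-size data packets are dropped without any error coming back. One way to measure the usable path MTU is to send pings with the Don’t Fragment bit set and binary-search the largest payload that still gets a reply. The Python sketch below shows only the search. The probe function is a hypothetical parameter you would supply, for example by wrapping the Windows command ping -f -l <size> <host> (-f sets Don’t Fragment, -l sets the payload size) and checking the output for a reply. It assumes IPv4 with a 20-byte IP header and an 8-byte ICMP header.

```python
from typing import Callable

# IPv4 header (20 bytes, no options) + ICMP echo header (8 bytes).
IP_ICMP_HEADERS = 28


def largest_unfragmented_payload(probe: Callable[[int], bool],
                                 low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest ICMP payload size for which probe() succeeds.

    probe(size) is a hypothetical callable that sends one ping with the
    Don't Fragment bit set and returns True if a reply came back.
    Assumes probe(low) succeeds and that success is monotonic: if a size
    gets through, every smaller size does too.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid       # mid fits; the answer is mid or larger
        else:
            high = mid - 1  # mid was dropped; the answer is smaller
    return low


if __name__ == "__main__":
    # Simulated path for illustration only: pretend the VPN silently drops
    # any packet larger than 1400 bytes on the wire.
    simulated_path_mtu = 1400

    def fake_probe(size: int) -> bool:
        return size + IP_ICMP_HEADERS <= simulated_path_mtu

    payload = largest_unfragmented_payload(fake_probe)
    print("largest payload:", payload, "-> path MTU:", payload + IP_ICMP_HEADERS)
```

The default upper bound of 1472 is the largest payload that fits a standard 1500-byte Ethernet MTU; if the search lands well below it across the VPN but not inside the office, the MTU/MSS settings on the tunnel are the place to look.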

Ports are OK. Trace mail never leaves the originating mailbox, although a port trace finds the server just fine.

We’ve upgraded to Domino 6.5.3 FP1, and we’ve shut down the server and run ncompact and updall against all 156 GB. Next step: defragment the hard disk and clear the cache on all the remote routers involved.

We’ve got a call into IBM, but I thought I’d try the forum as well.

Thanks, Cheryl