Major bug: reproducible failure to send to some domains

This has been reported for some time, on several versions of Domino, and it still occurs on 8.5.

The problem in a nutshell is that the router task intermittently tries to send email to the domain name itself, rather than to the host named in the MX record. A fundamental issue for a router task, IMHO.

I have some info that should lead to a resolution, and it is not rocket science.

Whenever we get this error, the Router task reports that it cannot send email to a host whose name is the bare DOMAIN, e.g. ACME.COM instead of MAIL.ACME.COM.

These domains all have one thing in common: there is a DNS entry present which resolves the name of the domain to a different server from the mail server.

Consider this example:

www.acme.com resolves to 111.111.111.111

acme.com resolves to 111.111.111.111

mail.acme.com (the MX record for acme.com) resolves to 222.222.222.222

(This is done so that people using browsers do not need to put in the “www”)

The Domino Router task sometimes seems to use the address of the domain name itself instead of the host given by the MX record. It does this randomly, and will not re-query DNS until the router is restarted.
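For comparison, here is a minimal sketch of how a sending server is expected to pick its target under the standard SMTP rules (RFC 5321): use the MX hosts in preference order, and fall back to the domain's own address only when no MX record exists at all. The function name and the (preference, hostname) input format are made up for illustration; this is not Domino's code.

```python
from typing import List, Tuple


def pick_smtp_targets(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Return the hosts to try, in order, for mail addressed to `domain`.

    `mx_records` is a hypothetical list of (preference, hostname) pairs,
    i.e. the answer to a DNS MX query for `domain`.
    """
    if mx_records:
        # Lower preference values are tried first.
        return [host for _, host in sorted(mx_records)]
    # Only when there is no MX record is the bare domain used as the target.
    return [domain]


# With an MX record present, acme.com itself should never be contacted.
print(pick_smtp_targets("acme.com", [(10, "mail.acme.com")]))
# Without any MX record, the bare domain is the only candidate.
print(pick_smtp_targets("acme.com", []))
```

The behaviour described in this post looks like the second branch being taken even though an MX record exists.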

Here are examples of log entries (old, but the symptoms have not changed):

01/07/2008 12:00:12 AM Router: No messages transferred to EXPERIENT-INC.COM (host EXPERIENT-INC.COM) via SMTP: The server is not responding

…restart the router…

01/07/2008 12:00:40 AM Router: [00000007] Transferring mail to domain EXPERIENT-INC.COM (host mail.global.sprint.COM [216.32.181.22]) via SMTP

Note how the host in parentheses is completely different: these are different machines. It is hardly surprising that the mail banks up when the router is not trying the correct host.
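If you want to scan your own log for this symptom, something like the sketch below may help. It assumes your Router lines have the same shape as the two entries above; the pattern and function name are my own, not anything Domino provides.

```python
import re
from typing import Optional

# Matches "... to [domain ]<DOMAIN> (host <HOST> ..." as in the two entries above.
LOG_PATTERN = re.compile(r"to (?:domain )?(\S+) \(host ([^\s)]+)")

FAILED = ("01/07/2008 12:00:12 AM Router: No messages transferred to "
          "EXPERIENT-INC.COM (host EXPERIENT-INC.COM) via SMTP: "
          "The server is not responding")
WORKING = ("01/07/2008 12:00:40 AM Router: [00000007] Transferring mail to "
           "domain EXPERIENT-INC.COM (host mail.global.sprint.COM "
           "[216.32.181.22]) via SMTP")


def host_is_bare_domain(log_line: str) -> Optional[bool]:
    """Return True if the router targeted the domain name itself.

    Returns None when the line does not look like a Router transfer entry.
    """
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    domain, host = match.groups()
    # DNS names are case-insensitive, so compare in lower case.
    return domain.lower() == host.lower()


for line in (FAILED, WORKING):
    print(host_is_bare_domain(line))
```

A True result does not prove the bug by itself, since a domain with no MX record is legitimately delivered to its own address; it only flags entries worth checking with nslookup.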

If you want more details, please email me at jcantor@netforce.com.au, as I am very keen to get this fixed.

Subject: Major bug: router cache not refreshed as expected

Hi, everything is explained in IBM technote 1102680.

I think that the router’s cache is not refreshed as the technote describes, and that a refresh can only be forced by restarting the router, which clears the cache completely.

Unfortunately, I tried to apply the explanations in technote 1102680 to Domino on both Linux and Windows platforms without success. The technote states:

The “reenum=1” process is performed under the following conditions:

  1. When Router is restarted.

  2. When Router detects a change in the Configuration document.

  3. By default, every 60 minutes after the last reenumeration; or per the interval set in the “Dynamic cost reset interval” field (in the Configuration document’s Router/SMTP, Advanced, and Controls tab, and Advanced Transfer Controls section).
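Read literally, the three conditions amount to a simple check like the one below. This is only a model of the technote's wording with names I made up, not how Domino is actually implemented.

```python
def should_reenumerate(router_restarted: bool,
                       config_changed: bool,
                       minutes_since_last: float,
                       reset_interval_minutes: float = 60) -> bool:
    """Model of the three documented triggers for the "reenum=1" process.

    `reset_interval_minutes` stands in for the "Dynamic cost reset interval"
    field; 60 is the default given in the technote.
    """
    return (router_restarted
            or config_changed
            or minutes_since_last >= reset_interval_minutes)


# Two hours after the last reenumeration, with no restart and no config change,
# condition 3 alone should already trigger a refresh.
print(should_reenumerate(False, False, minutes_since_last=120))
```

If the router behaved this way, waiting longer than the interval should be enough to get a fresh lookup.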

Anyway, even after changing these values, nothing changed and I am still facing the same problem:

-route * → nothing happens.

-nslookup → returns correct values

-messages → pending with “server not responding” reason

-waiting hours, longer than the configured intervals → nothing happens

-RESTART the ROUTER → MESSAGES IMMEDIATELY ROUTED

Is this a bug, and if not, what can be done to solve this issue?

Subject: see my post in nd8 forum for a “solution”

http://www-10.lotus.com/ldd/nd8forum.nsf/0/ea379eb72292e1d9852575a500017c82?OpenDocument

Subject: Hotfix Available

Call IBM Support. There is a hotfix available for this DNS issue; we installed it on 5/16 and the issue seems to be resolved.

Subject: Secondary DNS Fallback

I reported this problem to IBM over a year ago when we were still running 8.0.2. In the end they gave me some debug commands to run the next time it happened so I could capture additional output. The problem is that after enabling the settings I had to restart the router which would fix the problem. It only occurred every few months so I just accepted it and restarted the router when it happened.

When we upgraded to 8.5.1 (skipping 8.5.0) in October I was hoping the problem got fixed. I have still seen it though. The difference now is that it will resolve itself over time. Someone will report an outbound e-mail being returned a couple times. I’ll send a test message to the same domain without problems. They will try their message again and it works. I think maybe the router cache is clearing automatically now.

This morning the issue returned but was not fixed by restarting the router or Domino. The problem turned out to be our primary DNS server was down. However the secondary was still up and running. This wasn’t obvious at first because nothing else in our network was having issues. They had all fallen back on the secondary DNS server (as they should) and were working fine. Domino, however, was failing on the first server and giving up. As soon as I restarted the primary DNS server Domino started delivering outbound e-mails again.

Based on another “solution” posted in this thread it sounds like Domino doesn’t fall back to the secondary DNS server as it should. Is this the fault of Domino or Windows?
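To be clear about what I mean by falling back, here is a rough sketch of the behaviour I would expect: try each configured DNS server in order and only give up when all of them fail. The `query` callable, the server addresses, and `fake_query` are placeholders for illustration; I have no idea how Domino or Windows actually implement this.

```python
from typing import Callable, List, Optional, Sequence


def resolve_with_fallback(name: str,
                          servers: Sequence[str],
                          query: Callable[[str, str], List[str]]) -> List[str]:
    """Ask each DNS server in turn and return the first successful answer."""
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return query(server, name)
        except OSError as exc:
            # Timeouts and connection errors: move on to the next server.
            last_error = exc
    raise RuntimeError(f"all DNS servers failed for {name}") from last_error


def fake_query(server: str, name: str) -> List[str]:
    """Stand-in for a real DNS query: the primary (10.0.0.1) is down."""
    if server == "10.0.0.1":
        raise TimeoutError("primary DNS server not responding")
    return ["222.222.222.222"]


# The primary fails, so the answer comes from the secondary.
print(resolve_with_fallback("mail.acme.com", ["10.0.0.1", "10.0.0.2"], fake_query))
```

What I saw instead looked like the loop stopping after the first server failed.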

Subject: try restarting your DNS server?

I think I’ve seen this problem as well but I always thought it was my DNS server…

Subject: I’m restarting my DNS. Will report on whether this changes things