Cluster and replication conflicts

We put two servers in a cluster using a dedicated LAN port. Server1 receives mail from outside; Server2, at least for now, just replicates from Server1. Users access only Server1.

Both servers have Symantec SMSDom installed.

From the start of the cluster we had a large number of documents being sent back to Server1 from Server2.

Proportionally, we also had a number of replication conflicts.

Say we had 20 messages being sent back to Server1 every minute, and about 1 or 2 conflicts per minute.

Suspecting it was due to SMSDOM, I stopped the task on Server2; the number of messages sent back dropped to about 4-5 per minute, and I have not seen any conflicts yet.

I’m just wondering if anyone has any idea how to deal with this problem; I don’t think I can leave the antivirus down on Server2, especially in case Server1 goes down…

Any suggestions?

Subject: cluster and replication conflicts

Just to let others know how I got out of this mess…

I set up a two server test site, where I installed Domino and SMSDOM from scratch following all instructions from Symantec (i.e., I decided to replicate the settings, log and definitions DBs; SMSDOM runs on both primary and secondary server; and the workstation antivirus does not scan Notes DBs and the temporary SMSDOM directory). Then I created the cluster following the instructions appearing in the Administrator manual.

I then tried starting replicas from the primary, then from the secondary server, and all sorts of message traffic (internet / internal, from agents, manual, etc.), only to find that this installation didn’t have any replication or conflict problems.

At this point I decided to go back to production. I could not completely redo the primary production server, so I wiped the secondary cluster server completely and reinstalled Domino. I uninstalled SMSDOM on the primary server and reinstalled it on both the primary and the secondary server, then recreated the cluster.

Everything is now working fine.

Believe it or not, all this took about 6 hours (but during the weekend…)

Subject: cluster and replication conflicts

To restate the problem:

Every time SMSDOM runs a scan, it seems to modify every document that contains an attachment, in every Notes database on the server. In a clustered environment, this problem can create a “replication storm” because one server updates the document, replicates it to other servers in the cluster, then those servers modify the document, and the process continues. On my servers, it created thousands of replication conflicts.
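To make the storm easier to picture, here is a toy model in Python. It is only a sketch under loud assumptions: it is not how Domino or SMSDOM actually work internally, `Doc`, `scheduled_scan`, `replicate` and `simulate` are made-up names, and the rule "both replicas changed since the last sync means a conflict" is a simplification of real conflict detection.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    seq: int = 1         # bumped on every save (loosely like a sequence number)
    synced_seq: int = 1  # value of seq at the last replication


def scheduled_scan(doc):
    """Toy scanner: writing a tag field counts as a document update."""
    doc.seq += 1


def replicate(a, b):
    """Classify one replication pass and bring both replicas back in sync."""
    a_changed = a.seq > a.synced_seq
    b_changed = b.seq > b.synced_seq
    if a_changed and b_changed:
        outcome = "conflict"   # both sides edited the same document
    elif a_changed or b_changed:
        outcome = "update"     # one side pushes its change to the other
    else:
        outcome = "noop"
    newest = max(a.seq, b.seq)
    a.seq = b.seq = a.synced_seq = b.synced_seq = newest
    return outcome


def simulate(docs, scan_on_a, scan_on_b):
    """Scan `docs` documents on the chosen servers, then replicate each once."""
    counts = {"conflict": 0, "update": 0, "noop": 0}
    for _ in range(docs):
        a, b = Doc(), Doc()
        if scan_on_a:
            scheduled_scan(a)
        if scan_on_b:
            scheduled_scan(b)
        counts[replicate(a, b)] += 1
    return counts
```

In this model, `simulate(1000, True, True)` reports every document as a conflict, while scanning on one side only turns them all into plain updates. Real numbers depend on timing between scans and replication, which the sketch ignores.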

Cause of the problem:

  1. SMSDOM is not “cluster aware”

  2. A “cool” new setting called “Secure Scanning Optimization.” This must have been designed by some clown at Symantec who doesn’t understand how Domino works. What does it do? It adds a field (X-SSOTag) to every document that it scans. Why? This field tells SMSDOM that the document has already been scanned. So if a document is sent to 100 users, it only gets scanned once instead of 100 times.

Why is this flawed?

  1. Duh! Modifying every document (regardless of the reason) breaks replication!

  2. It makes sense to scan an email only once, instead of once for every recipient. But it doesn’t make sense to use an email-tracking mechanism for existing documents. I’m curious what the Symantec clowns think they gain by such methods. If I forward an existing document, it gets put into a new message without the internal tracking-field, thereby making that field useless.

How can you avoid this problem?

Change your SMSDOM settings to disable “Secure Scanning Optimization.”

How can Symantec fix this?

Change the process so that X-SSOTag is only used for documents that come into mail.box and for real-time scans; DO NOT use that process for scheduled scans!

Subject: cluster and replication conflicts

How many cluster replicators do you have?

JYR

Subject: RE: cluster and replication conflicts

I did a bit of testing. It’s 23:20 here, so message traffic is low now; I restarted ntask on Server2 and sent an e-mail from my hotmail account. Here’s what I get:

—on server1—

23/10/2007 23.19.07 Router: Message 00751B11 delivered to Cesare Bacchini/AR_ENT/IT

23/10/2007 23.19.12 SMTP Server: ardns1-i.ar-ent.net (223.100.50.19) disconnected. 1 message[s] received

23/10/2007 23.19.14 Pushing mail\cbacchin.nsf to CLUSTER arn2/AR_ENT/IT mail\CBacchin.nsf

23/10/2007 23.19.14 Replicator added 1 document(s) to CLUSTER arn2/AR_ENT/IT mail\CBacchin.nsf from mail\cbacchin.nsf

—on server2, immediately after—

23/10/2007 23.19.17 Pushing mail\CBacchin.nsf to CLUSTER arn1/AR_ENT/IT mail\cbacchin.nsf

23/10/2007 23.19.17 Replicator updated 1 document(s) in CLUSTER arn1/AR_ENT/IT mail\cbacchin.nsf from mail\CBacchin.nsf

If ntask is down on Server2, I get only the events shown for Server1…
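If anyone wants to measure this over a whole log instead of one message, something like the following Python could count the documents the cluster replicator added or updated, per target server. It is a rough sketch: the regular expression is built only from the two "Replicator added/updated" line shapes shown above, other log formats are not covered, and `count_cluster_docs` and `SAMPLE` are names I made up.

```python
import re
from collections import Counter

# Matches e.g. "Replicator updated 1 document(s) in CLUSTER arn1/AR_ENT/IT ..."
LINE = re.compile(
    r"Replicator (added|updated) (\d+) document\(s\) "
    r"(?:to|in) CLUSTER (\S+) \S+ from \S+"
)


def count_cluster_docs(log_text):
    """Sum replicated documents per (target server, action)."""
    totals = Counter()
    for line in log_text.splitlines():
        m = LINE.search(line)
        if m:
            action, count, target = m.group(1), int(m.group(2)), m.group(3)
            totals[(target, action)] += count
    return totals


SAMPLE = r"""
23/10/2007 23.19.14 Pushing mail\cbacchin.nsf to CLUSTER arn2/AR_ENT/IT mail\CBacchin.nsf
23/10/2007 23.19.14 Replicator added 1 document(s) to CLUSTER arn2/AR_ENT/IT mail\CBacchin.nsf from mail\cbacchin.nsf
23/10/2007 23.19.17 Pushing mail\CBacchin.nsf to CLUSTER arn1/AR_ENT/IT mail\cbacchin.nsf
23/10/2007 23.19.17 Replicator updated 1 document(s) in CLUSTER arn1/AR_ENT/IT mail\cbacchin.nsf from mail\CBacchin.nsf
"""

for (target, action), count in sorted(count_cluster_docs(SAMPLE).items()):
    print(target, action, count)
```

A high "updated" count going back to the server that originally received the mail is the pattern described in this thread.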

Subject: RE: cluster and replication conflicts

Hi

You can run this command on your clustered servers:

show stat replica.cluster*

It will give you something similar to this:

Replica.Cluster.Docs.Added = 5022

Replica.Cluster.Docs.Deleted = 62226

Replica.Cluster.Docs.Updated = 162746

Replica.Cluster.Failed = 4149

Replica.Cluster.Files.Local = 1057

Replica.Cluster.Files.Remote = 2066

Replica.Cluster.Retry.Skipped = 592538 <-- check this

Replica.Cluster.Retry.Waiting = 0

Replica.Cluster.SecondsOnQueue = 0 <-- check this

Replica.Cluster.SecondsOnQueue.Avg = 18 <-- check this

Replica.Cluster.SecondsOnQueue.Max = 1894 <-- check this

Replica.Cluster.Servers = 2

Replica.Cluster.SessionBytes.In = 378840196

Replica.Cluster.SessionBytes.Out = 1747925166

Replica.Cluster.Successful = 433515

Replica.Cluster.WorkQueueDepth = 0 <-- check this

Replica.Cluster.WorkQueueDepth.Avg = 7 <-- check this

Replica.Cluster.WorkQueueDepth.Max = 363 <-- check this

If you run good servers with a dedicated network card, you should run at least 4 instances. One of my clients was running 6 instances of the cluster replicator without any problems.
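If you save the console output to a file, a few lines of Python can pull out the counters worth watching. This is just a sketch: it assumes the "name = value" layout shown above, `parse_stats`, `watched` and `SAMPLE` are made-up names, and which counters count as "worth watching" is simply the set flagged in this post.

```python
import re

# "Replica.Cluster.Some.Name = 123", ignoring anything after the number
STAT = re.compile(r"^\s*(Replica\.Cluster\.[\w.]+)\s*=\s*(\d+)")

# Counter families flagged with "check this" in the post above
WATCHED = ("Retry.Skipped", "SecondsOnQueue", "WorkQueueDepth")


def parse_stats(text):
    """Turn 'show stat replica.cluster*' output into a {name: int} dict."""
    stats = {}
    for line in text.splitlines():
        m = STAT.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def watched(stats):
    """Keep only the counters that belong to a flagged family."""
    prefix = "Replica.Cluster."
    return {
        name: value
        for name, value in stats.items()
        if name[len(prefix):].startswith(WATCHED)
    }


SAMPLE = """
Replica.Cluster.Retry.Skipped = 592538 <-- check this
Replica.Cluster.Retry.Waiting = 0
Replica.Cluster.SecondsOnQueue.Max = 1894 <-- check this
Replica.Cluster.Servers = 2
Replica.Cluster.WorkQueueDepth.Max = 363 <-- check this
"""

for name, value in watched(parse_stats(SAMPLE)).items():
    print(name, value)
```

Comparing these values before and after changing the number of cluster replicator instances should show whether the queue is actually draining faster.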

Subject: RE: cluster and replication conflicts

Thank you very much. I had to shut down the cluster, but I’m going to set up a test environment in a few days, so I’ll do that and post back the results.

But there’s something I can’t understand: is it right that if both servers run an antivirus, both have to modify the incoming documents in order to scan them?

In other words: how should an antivirus be set up in a cluster environment?

Subject: RE: cluster and replication conflicts

I think you should run it on one side only.

One of my clients was scanning every mail from the SMTP server, to the hub, to the mail server and AGAIN on the client side.

That was too much; with this configuration there was also “sometimes” a problem (NSD) on the client side.

The scans were removed from the client side.

Every

Subject: RE: cluster and replication conflicts

I see… I thought that too, but in case of failover, how would you start the antivirus? With a Notes agent running without interruption? With an agent running every 5 minutes?

It does not sound very “protected”…

Subject: RE: cluster and replication conflicts

One per server… here’s the show task output from each server…


Server1 tasks


Database Server Perform console commands

Database Server Cluster Manager is idle

Database Server Cluster Administrator is idle

Database Server Listen for connect requests on TCPIP

Database Server Load Monitor is idle

Database Server Database Directory Manager Cache Refresher is idle

Database Server Organization Name Cache Refresher is idle

Database Server Idle task

Database Server Log Purge Task is idle

Database Server Idle task

Database Server Perform Database Cache maintenance

Database Server Idle task

12 more database server idle tasks

Database Server Platform Stats is idle

Database Server Shutdown Monitor

Database Server Process Monitor

Database Server Listen for connect requests on CLUSTER

Agent Manager Executive ‘3’: Idle

Admin Process Idle

Cluster Replicator Idle

Cluster Directory Idle

SMSDOM WriteScanner Idle x 30

SMSDOM Mail Scanner Idle x 30

SMSDOM Process Running. Use “tell sav help” for more info.

LDAP Server Listen for connect requests on TCP Port:389

LDAP Server Utility task

SMTP Server Listen for connect requests on TCP Port:25

SMTP Server Utility task

POP3 Server Listen for connect requests on TCP Port:110

POP3 Server Utility task

MT Collector Idle (next collection in 11 secs, interval is 900 secs)

Agent Manager Executive ‘1’: Idle

Agent Manager Executive ‘2’: Idle

Process Monitor Idle

SMTP Server Control task

SMSDOM PAS Process Running

HTTP Server Listen for connect requests on TCP Port:80, 443

POP3 Server Control task

Router Idle

LDAP Server Control task

Directory Indexer Idle

Indexer Idle

Rooms and Resources Idle

Calendar Connector Idle

Schedule Manager Idle

Agent Manager Idle

Admin Process Idle

Replicator Idle

Event Monitor Idle


Server2 tasks


Database Server Perform console commands

Database Server Cluster Manager is idle

Database Server Cluster Administrator is idle

Database Server Listen for connect requests on CLUSTER

Database Server Listen for connect requests on TCPIP

Database Server Load Monitor is idle

Database Server Database Directory Manager Cache Refresher is idle

Database Server Organization Name Cache Refresher is idle

Database Server Idle task

Database Server Log Purge Task is idle

Database Server Idle task

Database Server Perform Database Cache maintenance

Database Server Idle task x 12

Database Server Platform Stats is idle

Database Server Shutdown Monitor

Database Server Process Monitor

Cluster Replicator Idle

Admin Process Idle

Agent Manager Executive ‘2’: Idle

Cluster Directory Idle

POP3 Server Listen for connect requests on TCP Port:110

POP3 Server Utility task

SMTP Server Listen for connect requests on TCP Port:25

SMTP Server Utility task

HTTP Server Listen for connect requests on TCP Port:80

Agent Manager Executive ‘1’: Idle

MT Collector Idle (next collection in 5 secs, interval is 900 secs)

Process Monitor Idle

SMTP Server Control task

Agent Manager Idle

Schedule Manager Idle

Calendar Connector Idle

Directory Indexer Idle

Admin Process Idle

Indexer Idle

POP3 Server Control task

Router Idle

Replicator Idle

Event Monitor Idle
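For anyone who wants to answer the "how many cluster replicators" question from a saved show task listing, here is a small Python sketch. It assumes one task per line, as in the listings above; `count_tasks` and `SERVER2_SAMPLE` are made-up names, and the sample is a shortened excerpt of the Server2 listing.

```python
def count_tasks(show_task_output, task_name="Cluster Replicator"):
    """Count lines of a show task listing that start with task_name."""
    count = 0
    for line in show_task_output.splitlines():
        if line.strip().startswith(task_name):
            count += 1
    return count


SERVER2_SAMPLE = """\
Database Server Cluster Manager is idle
Cluster Replicator Idle
Cluster Directory Idle
Replicator Idle
Event Monitor Idle
"""

print(count_tasks(SERVER2_SAMPLE))                # cluster replicator instances
print(count_tasks(SERVER2_SAMPLE, "Replicator"))  # standard replicator instances
```

Matching on the start of the line keeps the standard "Replicator" task separate from "Cluster Replicator".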