Conundrum - to replicate or recreate?

I’ve inherited a suite of 20 databases which provide weekly financials. The application is replicated to a cluster server. The databases are informational only; users do not modify them.

A weekly scheduled agent deletes all existing documents. Then a scheduled compaction runs. After that, another scheduled agent imports new financials and parses them into the databases.

The databases are small, but there can be 275,000 docs added per database. Deletion stubs are currently purged every 15 days, so there can be up to 750,000 stubs in a database. Replication can be problematic and time-consuming.

Currently, I disable replication while the scheduled agents run, then enable it again.

I’m debating using an agent to delete and recreate the cluster replicas, instead of replicating them. Would that be less efficient or more efficient than using the replicator to update the data?

TIA,

Kristin

Subject: RE: Conundrum - to replicate or recreate?

I’m not sure exactly what you’re considering, but if you mean to delete all replicas of the databases and create brand new ones that therefore contain no deletion stubs, I wouldn’t do that. The databases will also have new replica IDs and new database IDs (a hidden value), so the clients of people who have the database in their bookmarks would be expecting a different replica ID at the old path. This is likely to cause problems, especially if they create a local replica.

I would suggest rather than deleting and recreating all the documents, you only delete documents corresponding to rows that are actually deleted, and only update or create documents for rows that have been actually updated.

Performance basics for developers (whitepaper) briefly discusses this exact case. The sandbox download referenced in the article contains an agent showing how to replicate data with a relational database.
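The incremental approach suggested above can be sketched outside of Notes as a simple comparison between what is already stored and what the weekly import contains. This is only an illustration, not the agent from the sandbox download: the function names, the idea of keeping a digest of each row on its document, and the use of a unique row key are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    create: List[str]  # keys present only in the new import
    update: List[str]  # keys present in both, but with changed values
    delete: List[str]  # keys present only in the existing database


def row_digest(row: Dict[str, str]) -> str:
    """Return a stable fingerprint of a row's field values (hypothetical helper)."""
    canonical = "\x1f".join(f"{name}={row[name]}" for name in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(existing: Dict[str, str],
              incoming: Dict[str, Dict[str, str]]) -> SyncPlan:
    """Compare stored digests (key -> digest) with incoming rows (key -> fields).

    Only documents whose row was added, changed, or removed appear in the plan;
    unchanged documents are left alone, so they produce no deletion stubs and
    nothing for the replicator to move.
    """
    create = sorted(k for k in incoming if k not in existing)
    delete = sorted(k for k in existing if k not in incoming)
    update = sorted(
        k for k in incoming
        if k in existing and existing[k] != row_digest(incoming[k])
    )
    return SyncPlan(create, update, delete)
```

If most rows are unchanged from week to week, the plan stays small, which is the point of the suggestion: fewer deletions, fewer stubs, and far less for replication to carry. If nearly every row changes every week, the saving is mostly in stubs and in keeping existing documents (and links to them) alive.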

Subject: RE: Conundrum - to replicate or recreate?

There is a ServerA and a ServerB. ServerA is the production server that runs all of the scripts to manage the content. ServerB is clustered with ServerA and resides on the other side of the firewall. The data in the application files needs to be accessible on ServerB and is read-only.

I am considering deleting the databases from ServerB just before the production agents run on ServerA, and then using xcopy to put a new copy of the replicas onto ServerB from ServerA after the agents finish running. This would not generate a new replica ID, but it might cause other problems I am not aware of.

The benefit I am hoping to get from doing this is that the xcopy will take about 10 minutes, whereas replication takes approximately 6 hours.

I’m mostly concerned about what using file copy will do to the view indexes.

Subject: RE: Conundrum - to replicate or recreate?

Using file operations to delete an NSF from a server is a bit fraught, because the file might be locked if the server is using it, in which case you won’t be able to delete it. I don’t think the view indexes would be a problem. These are just stored in the NSF, and if the file is not in use, there shouldn’t be any information in memory about the old database that would be rendered invalid by your replacing it.
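To illustrate the locking concern, here is a minimal sketch of a file swap that gives up cleanly if the target cannot be replaced. It is not a Domino-specific procedure: the function name and the ".staged" suffix are made up for the example, and it assumes the operating system refuses to replace a file the server still holds open (typical on Windows, not on most Unix systems).

```python
import os
import shutil
from pathlib import Path


def replace_database_file(source: Path, target: Path) -> bool:
    """Copy source next to target, then swap it into place.

    Returns True if the swap succeeded and False if the target could not be
    replaced (for example because another process has it locked). The name
    and the '.staged' suffix are hypothetical choices for this sketch.
    """
    staged = target.with_name(target.name + ".staged")
    # Do the slow copy under a temporary name so the target is never half-written.
    shutil.copy2(source, staged)
    try:
        # os.replace overwrites the target in a single rename operation.
        os.replace(staged, target)
    except OSError:
        # Target is locked or otherwise not replaceable: clean up and report.
        staged.unlink()
        return False
    return True
```

A scheduled job could call this once per database and log or retry any file for which it returns False, rather than leaving a missing or partial NSF on the server.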

It’s generally a bad idea to create new replicas via file copy, because the replicas then have the same database ID, which raises the possibility of duplicate UNIDs between documents created on different servers. However, since there’s only one server creating documents, it shouldn’t be a problem in your case.

There will be a problem with replication if there are any other servers or local replicas that replicate with the copy you delete and replace. Because the new copy will not remember replicating with any of those other servers, it will want to do a full replication. Of course, since you’ve deleted and re-created every document, it effectively needs to do that anyway, but that illustrates the whole problem. By doing the file copy, you will have fixed your hours-long replication, but you won’t have done anything for everyone else’s hours-long replication.

In addition, by deleting and recreating every document, you invalidate all links people may create to the documents (doclinks and URL links). So if someone sends a co-worker an email saying, “look at this!” with a doclink, the recipient clicks the link the next morning and is told “document has been deleted.” The information they were sent a link to does exist in the database, but because it’s in a new document, they have no way to find it using the link.