We have a database that contains more than 200,000 documents. Is there an easy way to find duplicate documents based on a particular field? If an agent is used, it times out after processing some of the documents. If a view is used, it will take more time to refresh. Both approaches cause performance issues.
Is there any other way to approach this?
Subject: Easy way to find duplicate documents
Hi
- Create a view with the key field in the first column, sorted ascending.
- Write an agent that processes each document in that view, one by one, and picks out the duplicates.
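The sorted-view approach works because duplicates end up next to each other, so a single pass only has to compare each document with its neighbours. A minimal sketch of that logic, assuming documents are modelled as Python dicts with a hypothetical `id` and key field (this is not LotusScript and does not use the Notes API):

```python
from itertools import groupby
from operator import itemgetter


def find_duplicates_sorted(docs, key_field):
    """Return {key: [ids]} for every key value shared by more than one doc.

    Sorting by key_field stands in for the view's sorted first column;
    groupby then walks the sorted sequence once, grouping adjacent
    documents that share the same key.
    """
    ordered = sorted(docs, key=itemgetter(key_field))
    dups = {}
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        ids = [doc["id"] for doc in group]
        if len(ids) > 1:  # more than one document with this key
            dups[key] = ids
    return dups
```

For example, with documents `{"id": 1, "custno": "A"}`, `{"id": 2, "custno": "B"}` and `{"id": 3, "custno": "A"}` (`custno` being a made-up field name), the function reports key `"A"` as shared by ids 1 and 3.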
I'm not aware of any easier or better-performing method than this.
HTH
regards
ramesh
Subject: RE: Easy way to find duplicate documents
I’m not sure what you mean by “If view is used, it will take more time to refresh.” If this field is the key identifier of the document, surely you already have a view that lists the documents in that order.
Also, to save time, you might run this agent only on new and modified documents. Instead of scanning the whole view, use it to look up the documents whose key matches that of each new or modified document. This will take longer the first time it’s run, but even if it times out, the UnprocessedDocuments collection should be able to remember which documents it has already looked at, so the agent can pick up where it left off next time.
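The incremental idea can be sketched as: index every document by key once, then check only a batch of new or modified documents against that index and carry the rest over to the next run. This is a hedged Python model of the logic, not Notes code; in a real agent the lookup would be something like `NotesView.GetAllDocumentsByKey` and the carry-over would come from `NotesDatabase.UnprocessedDocuments` (with documents marked via `NotesSession.UpdateProcessedDoc`). The dict-based documents, the `id`/key fields and `max_docs` (standing in for the agent's time limit) are assumptions for illustration.

```python
from collections import defaultdict


def build_key_index(docs, key_field):
    """Map each key value to the ids of all docs with that key (like a sorted view)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[key_field]].append(doc["id"])
    return index


def check_new_docs(index, unprocessed, key_field, max_docs):
    """Check up to max_docs new/modified docs against the key index.

    Returns (duplicates, remaining): duplicates maps a doc id to the ids of
    the other docs sharing its key; remaining holds the docs not reached in
    this run (hypothetical stand-in for an agent time-out), to be checked
    on the next run.
    """
    batch, remaining = unprocessed[:max_docs], unprocessed[max_docs:]
    duplicates = {}
    for doc in batch:
        # Look up only this document's key; exclude the document itself.
        matches = [i for i in index.get(doc[key_field], []) if i != doc["id"]]
        if matches:
            duplicates[doc["id"]] = matches
    return duplicates, remaining
```

Each run does one lookup per new or modified document instead of a scan of all 200,000, which is where the time saving comes from.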
If you’re deleting the duplicates, make sure you set the NotesView.AutoUpdate property to False so the view index isn't updated after each deletion. This might require some clever coding to remember which keys you have already looked up and how many matches there were originally.
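With the view no longer updating itself, lookups return a stale snapshot, so the code has to track which keys it has already handled. A minimal Python sketch of that bookkeeping, assuming a key-to-ids index taken before any deletions and a hypothetical "keep the lowest id" policy (the real choice of which copy to keep is up to you):

```python
def plan_deletions(index, docs_to_check, key_field):
    """Decide which duplicate ids to delete, keeping one document per key.

    index is a snapshot (key -> list of ids) taken before any deletions,
    like a view with AutoUpdate set to False. Each key is looked up at most
    once; seen records the original match count for every handled key.
    """
    seen = {}
    to_delete = []
    for doc in docs_to_check:
        key = doc[key_field]
        if key in seen:
            continue  # already handled; the stale snapshot would repeat it
        matches = index.get(key, [])
        seen[key] = len(matches)  # how many matches there were originally
        if len(matches) > 1:
            # Hypothetical policy: keep the lowest id, delete the rest.
            to_delete.extend(sorted(matches)[1:])
    return to_delete, seen
```

Collecting the ids first and deleting afterwards keeps the decision independent of the order in which the deletions happen.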