Question on Document Collection

Hi All –

I have a very large database that has almost 500,000 documents and is over 8 GB in size. Documents are continually being sent into it. We have several agents that process these documents in different ways depending on the type of document. These agents run on a schedule throughout the day.

I want to make sure that I have the best method for handling these documents. We have usually used the standard walk-the-view method, but we also have a few agents that use the NotesViewEntryCollection class. We did have a problem with one of these agents when it had over 30,000 documents to process, but that agent is very complex. I’m not sure if the walk-the-view method would be faster. For each document, we are reading and editing fields.

I have checked the Performance Considerations redbook (http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg245602.pdf) and could not find any specific reference saying which way is faster. I did see a whitepaper by Team Studio suggesting that the walk-the-view method is probably the faster one for processing (“If you wish to get a handle to a document, grab some data or do some processing to the document and then move on to the next document in a view. Then, it is probably most efficient to use db.GetView and iterate through all the documents in the view.” - http://www.teamstudio.com/OptimizingLotusScriptWhitePaper.pdf). Has anyone seen any benefit of using one class over the other? What about the NotesViewNavigator class?

Below is a sample script for NotesViewEntryCollection:

Dim session As New NotesSession
Dim db As NotesDatabase
Dim view As NotesView
Dim vc As NotesViewEntryCollection
Dim e As NotesViewEntry
Dim ne As NotesViewEntry
Dim doc As NotesDocument

Set db = session.CurrentDatabase
Set view = db.GetView("(Process)")
Set vc = view.AllEntries
Set e = vc.GetFirstEntry()
Do Until e Is Nothing
	' Get the next entry before working on the current one
	Set ne = vc.GetNextEntry(e)
	Set doc = e.Document
	Set e = ne
Loop

Below is a sample NotesView script:

Dim session As New NotesSession
Dim db As NotesDatabase
Dim view As NotesView
Dim doc As NotesDocument

Set db = session.CurrentDatabase
Set view = db.GetView("(Process)")
view.AutoUpdate = False
Set doc = view.GetFirstDocument
While Not doc Is Nothing
	' <processing>
	Set doc = view.GetNextDocument(doc)
Wend

Donna

Subject: RE: Question on Document Collection

If you’re going to access the document’s items in any case, it’s probably faster to just use GetNextDocument rather than mess around with an extra object.

That’s just my opinion, though. You could do a test and find out.
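
A rough sketch of such a test, for what it's worth. It assumes the "(Process)" view from the question and reads the Form item from each document so that every pass actually opens the document. The variable names are made up for illustration. It times three ways of walking the same view: NotesView, NotesViewEntryCollection, and NotesViewNavigator.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim view As NotesView
	Dim vc As NotesViewEntryCollection
	Dim nav As NotesViewNavigator
	Dim e As NotesViewEntry
	Dim doc As NotesDocument
	Dim formValue As Variant
	Dim startTime As Single
	Dim docCount As Long

	Set db = session.CurrentDatabase
	Set view = db.GetView("(Process)")
	view.AutoUpdate = False

	' Pass 1: walk the view with GetFirstDocument / GetNextDocument
	startTime = Timer
	docCount = 0
	Set doc = view.GetFirstDocument
	While Not doc Is Nothing
		formValue = doc.GetItemValue("Form")
		docCount = docCount + 1
		Set doc = view.GetNextDocument(doc)
	Wend
	Print "NotesView: " & docCount & " docs in " & (Timer - startTime) & " seconds"

	' Pass 2: NotesViewEntryCollection
	startTime = Timer
	docCount = 0
	Set vc = view.AllEntries
	Set e = vc.GetFirstEntry()
	Do Until e Is Nothing
		Set doc = e.Document
		formValue = doc.GetItemValue("Form")
		docCount = docCount + 1
		Set e = vc.GetNextEntry(e)
	Loop
	Print "NotesViewEntryCollection: " & docCount & " docs in " & (Timer - startTime) & " seconds"

	' Pass 3: NotesViewNavigator, document entries only
	startTime = Timer
	docCount = 0
	Set nav = view.CreateViewNav()
	Set e = nav.GetFirstDocument()
	Do Until e Is Nothing
		Set doc = e.Document
		formValue = doc.GetItemValue("Form")
		docCount = docCount + 1
		Set e = nav.GetNextDocument(e)
	Loop
	Print "NotesViewNavigator: " & docCount & " docs in " & (Timer - startTime) & " seconds"
End Sub
```

Timer counts seconds since midnight, so a run that crosses midnight will give a nonsense figure. Later passes may also benefit from caching done by earlier ones, so it is probably worth running the test several times and swapping the order before trusting the numbers. In a scheduled agent the Print output goes to the server log.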

Subject: RE: Question on Document Collection

How about using NotesViewEntryCollection for around 140,000 documents (in one case) and around 10,000 documents (in another case) in one of the views in that database, compared to processing documents one by one with GetFirstDocument and GetNextDocument(doc) with view.AutoUpdate = False?

Kindly give your opinion on these two cases. NotesViewEntryCollection is faster than NotesDocumentCollection, as everyone knows, but I want to check with you all which method is faster in those two cases with the view property AutoUpdate = False.

Thank you

Subject: Question on Document Collection

Donna,

I don't know the specifics of your DB design, and I have never dealt with such huge DBs, but generally speaking, too many views can slow performance, and views with @Now definitely slow things down as the view never stops refreshing. This may not be the case for you.

We make great use of NotesDocumentCollection and the NotesDatabase.Search method to enable us to locate any document(s) we want without requiring any special (i.e., hidden for design purposes) views. See below...

Dim s As NotesSession, db As NotesDatabase, dc As NotesDocumentCollection, doc As NotesDocument
Dim alternateDate As New NotesDateTime("Today"), lastBusinessDate As String
Dim query As String
Dim n As Long

Call alternateDate.AdjustDay(-1) ' Yesterday
lastBusinessDate = alternateDate.DateOnly ' Strip out 12:00 AM time component.

' Status and ProcessDate are field names, "Form" field you get by default.
' Date comparisons require square brackets in @formula language.
query = {Form = "Products" & Status = "Approved" & ProcessDate = [} & lastBusinessDate & {]}

Set s = New NotesSession
Set db = s.CurrentDatabase
Set dc = db.Search(query, Nothing, 0)
If dc.Count = 0 Then End ' If no docs were found, stop processing.

For n = 1 To dc.Count ' Cycle through all docs.
	Set doc = dc.GetNthDocument(n)
	' ...etc.
Next n

Ken
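
One caution on a loop like this: the Domino Designer help discourages using GetNthDocument to walk a whole collection for performance reasons, and points to GetFirstDocument and GetNextDocument as the preferred loop. A minimal sketch of the same search with that loop, reusing the example form and field names from above (the date test is left out to keep it short):

```lotusscript
Sub Initialize
	Dim s As New NotesSession
	Dim db As NotesDatabase
	Dim dc As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim query As String

	Set db = s.CurrentDatabase

	' Example form and field names only; substitute your own.
	query = {Form = "Products" & Status = "Approved"}
	Set dc = db.Search(query, Nothing, 0)

	' GetFirstDocument returns Nothing for an empty collection,
	' so no separate count check is needed before the loop.
	Set doc = dc.GetFirstDocument
	While Not doc Is Nothing
		' ...process doc here...
		Set doc = dc.GetNextDocument(doc)
	Wend
End Sub
```

With collections in the tens of thousands of documents, as in the original question, the difference between the two loop styles is likely to be noticeable.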

Subject: Use notesView.autoUpdate

One thing that can greatly improve performance when iterating over a view using notesView.getFirstDocument() and notesView.getNextDocument() methods is to set notesView.autoUpdate=False prior to iterating.