File extraction need a solution urgently

I have an agent that goes through a view and extracts each attachment from each document (99.9% of the time it's one attachment per document) to a file directory. The issue I have is that the database has close to three-quarters of a million documents, and the attachments are all TIF files ranging in size from 15KB to 30KB (sometimes a lot more, but always in the KB range). With that many documents, the agent sometimes takes 24 to 48 hours to run.

This is excruciating. Is there an easier (FASTER) way of doing this? What about using DECS?

The other thing is that these extractions happen frequently. If some of the TIF files already exist in the target directory, could LotusScript check for that and extract only the attachments that do not have a matching file name in the target directory?

Thanks in advance for the prompt response(s). Any help will be greatly appreciated.

Best regards,

Dan

Subject: File extraction need a solution urgently

As long as people cannot delete files from the directory, could you not just save the files for new documents and for documents whose attachments have changed?

You could set a flag on new documents, say NewDoc = “True”, and then use @AttachmentNames on existing documents that have been edited to see whether a file has been added to or removed from the document, and then set another flag, say ModifiedDoc = “True”. There is nothing to stop someone from deleting an attachment and then adding an updated version with the same name, so you may wish to set this flag whenever a document is edited and saved.

You could then set the agent to only run on documents with these flags and then save or delete files as necessary.
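A minimal sketch of such an agent, assuming the NewDoc and ModifiedDoc items described above and a hypothetical target folder C:\Extract (in practice you would read it from the Extraction Settings profile). It selects only the flagged documents with NotesDatabase.Search, extracts their attachments, and then clears both flags with NotesDocumentCollection.StampAll. Deleting files for attachments that were removed from a document is not handled here.

```lotusscript
' (Options)
Option Declare
%Include "lsconst.lss"   ' defines EMBED_ATTACHMENT

Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim collection As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim obj As NotesEmbeddedObject
	Dim names As Variant
	Dim folder As String
	Dim k As Integer

	folder = "C:\Extract"   ' HYPOTHETICAL path; read it from the profile document in practice
	Set db = session.CurrentDatabase

	' Only flagged documents; Nothing = no date restriction, 0 = no maximum
	Set collection = db.Search({NewDoc = "True" | ModifiedDoc = "True"}, Nothing, 0)

	Set doc = collection.GetFirstDocument
	Do While Not doc Is Nothing
		names = Evaluate("@AttachmentNames", doc)
		For k = 0 To Ubound(names)
			If names(k) <> "" Then
				Set obj = doc.GetAttachment(names(k))
				If Not obj Is Nothing Then
					If obj.Type = EMBED_ATTACHMENT Then
						' The document is new or changed, so any copy on disk is meant to be replaced
						Call obj.ExtractFile(folder & "\" & names(k))
					End If
				End If
			End If
		Next
		Set doc = collection.GetNextDocument(doc)
	Loop

	' Clear both flags in one call each so the next run skips these documents
	Call collection.StampAll("NewDoc", "")
	Call collection.StampAll("ModifiedDoc", "")
End Sub
```

Because the selection formula does the filtering, the run time depends on how many documents changed since the last run rather than on the size of the database.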

Also what is to stop more than one document from having an attachment with the same filename? Won’t one overwrite the other?
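One way to sidestep collisions, offered here as an assumption rather than something from the thread: build each file name from the document's 32-character UniversalID plus the attachment name. Two documents can then never write to the same path, and a file found on disk can only have come from that document, which also keeps a Dir-based "already extracted?" check unambiguous. The helper name TargetPath is hypothetical.

```lotusscript
' Hypothetical helper: one file per (document, attachment) pair.
' Assumes folder has no trailing backslash.
Function TargetPath(folder As String, doc As NotesDocument, fileName As String) As String
	' The UNID is unique within the database, so the combined name cannot collide
	TargetPath = folder & "\" & doc.UniversalID & "_" & fileName
End Function
```

For example, Call obj.ExtractFile(TargetPath(downloadfolder, doc, fileName)). The trade-off is that the files on disk no longer carry their original names, so anything downstream that looks them up by name would need to know the scheme.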

Subject: File extraction need a solution urgently

I don’t think you’ll find anything that will bring the full-database extraction down to a reasonable time. Even if you managed to quadruple the speed, you’d be hard-pressed to fit the agent run and a data backup into the same day, given that this agent is probably not the only thing running.

You CAN use Dir, though, to check the filepath and skip the extraction if the file is already present – it returns an empty string if the file isn’t present.
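A small helper along those lines (the name FileExists is hypothetical). Dir$ returns the matching file name, or an empty string when nothing matches; On Error Resume Next is there in case Dir$ raises an error for a missing or invalid path, in which case the function simply returns False.

```lotusscript
' Hypothetical helper: True if a file already exists at fullPath.
Function FileExists(fullPath As String) As Boolean
	' If Dir$ raises an error, the assignment is skipped and the
	' function keeps its default return value, False
	On Error Resume Next
	FileExists = (Dir$(fullPath, 0) <> "")
End Function
```

In the extraction loop it would guard the ExtractFile call, for example: If Not FileExists(downloadfolder & "\" & fileName) Then Call object.ExtractFile(downloadfolder & "\" & fileName), where fileName stands for the current attachment name.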

Subject: RE: File extraction need a solution urgently

Hi Stan, thanks for your solution. That is what I think is needed: extract only the attachments that do not already exist in the target path. But from the sample code in LN Help, I don’t know how to integrate this into my existing code. Any idea(s) on how this can be done? This would be greatly appreciated. I forgot to include my code when I wrote my question. Thanks - Dan

Here it is:

Sub Initialize
Dim session As New NotesSession
Dim db As NotesDatabase
Dim object As NotesEmbeddedObject
Dim collection As NotesDocumentCollection
Dim doc As NotesDocument
Dim ProfileDoc As NotesDocument
Dim downloadfolder As String
Dim NewLine As String
Set db = session.CurrentDatabase
Set collection = db.UnprocessedDocuments

Dim profileAttach_ServerName As NotesItem
Dim profileAttach_Directory As NotesItem
Set ProfileDoc = db.GetProfileDocument("Extraction Settings")
Set profileAttach_ServerName = ProfileDoc.GetFirstItem("Attach_Directory")
Set profileAttach_Directory = ProfileDoc.GetFirstItem("Document_SubDirectories")

' Error Handler
On Error Goto Error_Handler
NewLine = Chr(13) & Chr(10) ' CR+LF, for the error handler

' Check the profile items BEFORE using them to build the path
If profileAttach_ServerName Is Nothing Or profileAttach_Directory Is Nothing Then
	Msgbox "... one or more of the expected items in the Extraction Settings Profile is null."
	Exit Sub
End If
downloadfolder = profileAttach_ServerName.Text & profileAttach_Directory.Text

' Create the target folder once if it does not exist (16 = directory attribute)
If Dir(downloadfolder, 16) = "" Then Mkdir downloadfolder

Print "************************************************************************"
For i = 1 To collection.Count
	Set doc = collection.GetNthDocument( i )
	filen = Evaluate("@AttachmentNames", doc)
	antalfiler = Evaluate("@Attachments", doc)
	' (The folder is already created before the loop; no need to check it per document)
	Print Str(i) + " (" + Str(collection.Count) + ")"
	If antalfiler(0) > 0 Then
		For filecounter = 0 To antalfiler(0) - 1
			x = x + 1
			Print ( filen(filecounter) )
			Set object = doc.GetAttachment( filen(filecounter) )
			' EMBED_ATTACHMENT is defined in lsconst.lss (%Include "lsconst.lss" in the Options section)
			If ( object.Type = EMBED_ATTACHMENT ) Then
				fileCount = fileCount + 1
				If Dir(downloadfolder + "\" + filen(filecounter)) = "" Then
					extrachar = ""
				Else
					extrachar = Left(doc.UniversalID, 4) + "---" 'in case attachment with same name exists in several documents
				End If
				Call object.ExtractFile( downloadfolder + "\" + extrachar + filen(filecounter) )
			End If
		Next filecounter
	End If
Next
Msgbox Str(fileCount) + " Attachments were detached to the Attachments folder located in your Home directory " + downloadfolder + " on " + Format(Now(), "Long Date") + "."

Finished:
If filenum% > 0 Then
	' Close the file
	On Error Resume Next
	Close filenum%
End If
Exit Sub

Error_Handler:
ErrorString = "The following error has occurred:" & NewLine
ErrorString = ErrorString & "Line number: " & Str(Erl) & NewLine
ErrorString = ErrorString & "Error number: " & Str(Err) & NewLine
ErrorString = ErrorString & "Description: " & Error$ & NewLine & NewLine
ErrorString = ErrorString & "Would you like to continue processing?"
If Messagebox (ErrorString, 4 + 16, "An error has occurred") = 7 Then
	' The 'No' button was clicked...abort processing
	Resume Finished
End If
' The 'Yes' button was clicked...continue with the next statement
Resume Next
End Sub
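Here is a sketch of how the Dir check could be folded into the agent above. Besides skipping files that already exist, it walks the collection with GetFirstDocument/GetNextDocument (generally recommended over GetNthDocument for large collections), calls Evaluate once per document, creates the folder once outside the loop, and prints progress only every 1,000 documents. The profile item names come from the posted code; the error handler is left out for brevity and could be added back unchanged. One caveat: if two documents carry attachments with the same name, the second one is now skipped rather than saved under a prefixed name.

```lotusscript
' (Options)
Option Declare
%Include "lsconst.lss"   ' defines EMBED_ATTACHMENT

Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim profileDoc As NotesDocument
	Dim collection As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim obj As NotesEmbeddedObject
	Dim fileNames As Variant
	Dim downloadFolder As String
	Dim target As String
	Dim k As Integer
	Dim docCount As Long, extracted As Long, skipped As Long

	Set db = session.CurrentDatabase
	Set profileDoc = db.GetProfileDocument("Extraction Settings")

	' Validate the profile before building the path from it
	If Not profileDoc.HasItem("Attach_Directory") Or Not profileDoc.HasItem("Document_SubDirectories") Then
		Print "Extraction Settings profile is incomplete; nothing extracted."
		Exit Sub
	End If
	downloadFolder = profileDoc.GetItemValue("Attach_Directory")(0) & profileDoc.GetItemValue("Document_SubDirectories")(0)

	' Create the target folder once, not once per document
	If Dir$(downloadFolder, 16) = "" Then Mkdir downloadFolder

	Set collection = db.UnprocessedDocuments
	' Walk the collection sequentially instead of by index
	Set doc = collection.GetFirstDocument
	Do While Not doc Is Nothing
		docCount = docCount + 1
		If doc.HasEmbedded Then
			fileNames = Evaluate("@AttachmentNames", doc)   ' one Evaluate per document
			For k = 0 To Ubound(fileNames)
				If fileNames(k) <> "" Then
					target = downloadFolder & "\" & fileNames(k)
					If Dir$(target, 0) <> "" Then
						' A file with this name is already in the folder: skip it
						skipped = skipped + 1
					Else
						Set obj = doc.GetAttachment(fileNames(k))
						If Not obj Is Nothing Then
							If obj.Type = EMBED_ATTACHMENT Then
								Call obj.ExtractFile(target)
								extracted = extracted + 1
							End If
						End If
					End If
				End If
			Next
		End If
		' Progress output only every 1,000 documents
		If docCount Mod 1000 = 0 Then Print docCount & " documents checked"
		Set doc = collection.GetNextDocument(doc)
	Loop

	Print extracted & " attachments extracted, " & skipped & " skipped (already present) in " & downloadFolder
End Sub
```

Skipping an existing file avoids the GetAttachment and ExtractFile calls entirely, so a re-run over mostly processed documents costs little more than one Dir$ call per attachment.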

Subject: RE: File extraction need a solution urgently

Why is the agent scanning a view instead of processing only new and modified documents, since you expect most documents to have been processed already? If a document is edited and its attachment updated, don’t you still want to store the updated attachment in the directory? If so, checking only for the existence of the file may not be the best plan. You could instead check the date of the file on disk and see whether the attachment date is more recent (you might use Evaluate({@AttachmentNames + "/" + @Text(@AttachmentModifiedTimes)}, doc) to get this information).
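A sketch of that date comparison as a helper (the name AttachmentIsNewer is hypothetical). It assumes @AttachmentNames and @AttachmentModifiedTimes list the attachments in the same order, and it compares the attachment's stored time with the timestamp of the file on disk via FileDateTime. Whether that comparison is meaningful depends on what timestamp ExtractFile leaves on the extracted file, so it is worth testing on your release before relying on it.

```lotusscript
' Hypothetical helper: True if the attachment should be (re)extracted to fullPath.
' ASSUMES @AttachmentNames and @AttachmentModifiedTimes return parallel lists.
Function AttachmentIsNewer(doc As NotesDocument, fileName As String, fullPath As String) As Boolean
	Dim names As Variant
	Dim times As Variant
	Dim k As Integer

	AttachmentIsNewer = True                 ' no copy on disk yet: extract it
	If Dir$(fullPath, 0) = "" Then Exit Function

	names = Evaluate("@AttachmentNames", doc)
	times = Evaluate("@AttachmentModifiedTimes", doc)
	For k = 0 To Ubound(names)
		If names(k) = fileName And k <= Ubound(times) Then
			' Attachment newer than the file on disk: extract again
			AttachmentIsNewer = (Cdat(times(k)) > Cdat(FileDateTime(fullPath)))
			Exit Function
		End If
	Next
	AttachmentIsNewer = False                ' attachment not found on this document
End Function
```

It would replace a plain existence test, for example: If AttachmentIsNewer(doc, fileName, downloadfolder & "\" & fileName) Then Call object.ExtractFile(downloadfolder & "\" & fileName).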

What if two documents contain files with the same name? Do you care what happens then?

You might want to add a field to the document that is set when the document is edited and removed when your agent processes the attachment, so that you can quickly tell which attachments have not been processed. Initially you would assume that all existing documents were processed (or you could run an intermediate agent to process just the documents modified in the last couple of days, to make sure).
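A sketch of that intermediate agent, assuming a ModifiedDoc flag like the one suggested earlier in the thread. NotesDatabase.Search with a NotesDateTime cutoff returns only documents created or modified since that time, and StampAll flags them in a single call so a flag-driven extraction agent picks them up. The two-day window and the @Attachments > 0 selection are assumptions.

```lotusscript
' (Options)
Option Declare

Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim collection As NotesDocumentCollection
	Dim cutoff As New NotesDateTime("")

	Set db = session.CurrentDatabase

	' Cutoff = now minus two days ("the last couple of days")
	Call cutoff.SetNow
	Call cutoff.AdjustDay(-2)

	' Documents with attachments created or modified since the cutoff
	Set collection = db.Search("@Attachments > 0", cutoff, 0)

	' Flag them for the extraction agent in one call
	Call collection.StampAll("ModifiedDoc", "True")
	Print collection.Count & " documents flagged since " & cutoff.LocalTime
End Sub
```

This is meant as a one-off run when the flag is introduced: StampAll itself modifies the documents, so scheduling it repeatedly would keep re-flagging documents that were just processed.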

I’m not sure whether the LC LSX would be any faster; it would be faster for getting the file dates, I think. See this article: file attachments in LCLSX.