I have an agent that goes through a view and extracts each attachment from each document (99.9% of the time it's one attachment per doc) to a file directory. The issue I have is that the database has close to three-quarters of a million documents, and the attachments are all TIF files, typically 15KB to 30KB each (sometimes a lot more, but still only kilobytes). Processing that many documents sometimes takes the agent about 24 to 48 hours.
This is excruciating. Is there an easier (FASTER) way of doing this? What about using DECS?
The other thing is that these extractions happen frequently. If some of the TIF files already exist in the target directory, could LS check for that and extract only the attachments that do not have a matching filename in the target directory?
Thanks in advance for the prompt response(s). Any help will be greatly appreciated.
As long as people cannot delete files from the directory, could you not just save the files for new documents and for documents where the attachments have changed?
You could set a flag on new documents, say NewDoc = “True”, and then use @AttachmentNames on existing documents that have been edited to see whether a file has been added to or removed from a document, and then set another flag, say ModifiedDoc = “True”. There is nothing to stop someone from deleting an attachment and then adding an updated version with the same name, so you may wish to set this flag whenever a document is edited and saved.
You could then set the agent to only run on documents with these flags and then save or delete files as necessary.
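A rough sketch of what such an agent might look like, assuming the flag fields are named NewDoc and ModifiedDoc as suggested above and hold the text value "True". The actual extract/delete step is left as a placeholder, and db.Search still has to scan the database, so treat this as a starting point and not a tested solution:

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim flagged As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument

    Set db = session.CurrentDatabase

    ' Select only the documents carrying one of the (assumed) flag fields
    Set flagged = db.Search({NewDoc = "True" | ModifiedDoc = "True"}, Nothing, 0)

    Set doc = flagged.GetFirstDocument
    Do Until doc Is Nothing
        ' Fetch the next document first, in case saving disturbs the collection
        Set nextDoc = flagged.GetNextDocument(doc)

        ' ... save or delete the files for this document here ...

        ' Clear the flags so the document is skipped on the next run
        Call doc.RemoveItem("NewDoc")
        Call doc.RemoveItem("ModifiedDoc")
        Call doc.Save(True, False)

        Set doc = nextDoc
    Loop
End Sub
```

The flags would have to be set somewhere else (for example when the document is saved through its form); that part is not shown here.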
Also what is to stop more than one document from having an attachment with the same filename? Won’t one overwrite the other?
I don’t think you’ll find anything that will bring the full-database extraction down to a reasonable time (even if you can realise a quadrupling of speed, you’d be hard-pressed to fit the agent run and a data backup into the same day, given that it’s probably not all that’s running).
You CAN use Dir, though, to check the filepath and skip the extraction if the file is already present – it returns an empty string if the file isn’t present.
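Something along these lines might do as the check. NeedsExtract is a made-up helper name, and the return type is Integer only so that it also compiles on older releases without a Boolean type:

```lotusscript
' Hypothetical helper: returns True when no file exists yet at filePath.
Function NeedsExtract(filePath As String) As Integer
    ' Dir$ returns an empty string when nothing matches the path
    NeedsExtract = (Dir$(filePath) = "")
End Function
```

In an extraction loop it could be called just before ExtractFile, for example If NeedsExtract(folder & "\" & fileName) Then Call object.ExtractFile(folder & "\" & fileName), where folder and fileName stand for whatever variables hold the target directory and the attachment name.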
Subject: RE: File extraction need a solution urgently
Hi Stan, thanks for your solution. I think that is what is needed to extract only the attachments that do not already exist in the target filepath. But given the sample code in the LN Help, I don’t know how this can be integrated into my existing code. Any idea(s) on how this can be done? This would be greatly appreciated. I forgot to include my code when I wrote my question. Thanks - Dan
Here it is:
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim object As NotesEmbeddedObject
    Dim collection As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim ProfileDoc As NotesDocument
    Dim downloadfolder As String
    Dim NewLine As String
    Set db = session.CurrentDatabase
    Set collection = db.UnprocessedDocuments
    Dim profileAttach_ServerName As NotesItem
    Dim profileAttach_Directory As NotesItem
    Set ProfileDoc = db.GetProfileDocument("Extraction Settings")
    Set profileAttach_ServerName = ProfileDoc.GetFirstItem("Attach_Directory")
    Set profileAttach_Directory = ProfileDoc.GetFirstItem("Document_SubDirectories")
    ' Error handler
    On Error Goto Error_Handler
    NewLine = Chr(13) & Chr(10) ' CR+LF, for the error handler
    ' Check the profile items before reading their values
    If profileAttach_ServerName Is Nothing Or profileAttach_Directory Is Nothing Then
        Msgbox "... one or more of the expected items in the Extraction Settings Profile is null."
        Exit Sub
    End If
    downloadfolder = profileAttach_ServerName.Text & profileAttach_Directory.Text
    ' Create the target directory if it does not exist yet
    If Dir(downloadfolder, 16) = "" Then Mkdir downloadfolder
    Print "************************************************************************"
    ' Step through with GetFirstDocument/GetNextDocument; calling GetNthDocument
    ' in a loop is known to be slow on large collections.
    Dim i As Long
    Set doc = collection.GetFirstDocument
    Do Until doc Is Nothing
        i = i + 1
        filen = Evaluate("@AttachmentNames", doc)
        antalfiler = Evaluate("@Attachments", doc)
        ' (the target directory was already created before the loop)
        Print Str(i) + " (" + Str(collection.Count) + ")"
        If antalfiler(0) > 0 Then
            For filecounter = 0 To antalfiler(0) - 1
                x = x + 1
                Print filen(filecounter)
                Set object = doc.GetAttachment(filen(filecounter))
                If (object.Type = EMBED_ATTACHMENT) Then
                    fileCount = fileCount + 1
                    If Dir(downloadfolder + "\" + filen(filecounter)) = "" Then
                        extrachar = ""
                    Else
                        extrachar = Left(doc.UniversalID, 4) + "---" 'in case attachment with same name exists in several documents
                    End If
                    Call object.ExtractFile(downloadfolder + "\" + extrachar + filen(filecounter))
                End If
            Next filecounter
        End If
        Set doc = collection.GetNextDocument(doc)
    Loop
    Msgbox Str(fileCount) + " Attachments were detached to the Attachments folder located in your Home directory " + downloadfolder + " on " + Format(Now(), "Long Date") + "."
Finished:
    If filenum% > 0 Then
        ' Close the file
        On Error Resume Next
        Close filenum%
    End If
    Exit Sub
Error_Handler:
    ErrorString = "The following error has occurred:" & NewLine
    ErrorString = ErrorString & "Line number: " & Str(Erl) & NewLine
    ErrorString = ErrorString & "Error number: " & Str(Err) & NewLine
    ErrorString = ErrorString & "Description: " & Error$ & NewLine & NewLine
    ErrorString = ErrorString & "Would you like to continue processing?"
    If Messagebox (ErrorString, 4 + 16, "An error has occurred") = 7 Then
Subject: RE: File extraction need a solution urgently
Why is the agent scanning a view instead of processing only new and modified documents, since you expect most documents to have already been processed? If a document is edited and the attachment updated, don’t you still want to store the updated attachment in the directory? So checking for the existence of the file is maybe not the best plan. You could check the date of the file and see whether the attachment date is more recent (you might use Evaluate({@AttachmentNames + "/" + @Text(@AttachmentModifiedTimes)}, doc) to get this information).
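A rough, untested sketch of that date comparison. AttachmentIsNewer is a made-up helper name. It assumes that @AttachmentModifiedTimes returns its values in the same order as @AttachmentNames, that Evaluate hands them back as date values, and that the server and file system clocks are comparable:

```lotusscript
' Hypothetical helper: True when the file is missing or older than the attachment.
' attIndex is the zero-based position of the attachment in @AttachmentNames.
Function AttachmentIsNewer(doc As NotesDocument, attIndex As Integer, filePath As String) As Integer
    Dim times As Variant

    ' No file on disk yet: it always needs extracting
    If Dir$(filePath) = "" Then
        AttachmentIsNewer = True
        Exit Function
    End If

    times = Evaluate("@AttachmentModifiedTimes", doc)

    ' FileDateTime returns the file's last-modified time as a string
    AttachmentIsNewer = (Cdat(times(attIndex)) > Cdat(Filedatetime(filePath)))
End Function
```

If extraction stamps the file with the time it was written and not the attachment's own date, the comparison should still flag attachments changed since the last run, but that is worth verifying on a few documents first.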
What if two documents contain files with the same name? Do you care what happens then?
You might want to add a field to the document that is assigned when the file is edited and removed when your agent processes the attachment, so that you can quickly tell which attachments have not been processed. Initially we would be assuming that all documents were processed (or you could write an intermediate agent that processes just the documents modified in the last couple of days, to make sure).
I’m not sure whether the LC LSX would be any faster; it would be faster for getting the file dates, I think. See this article: file attachments in LCLSX.