DXL exporter and importer causes server to crash

hcl-bot · October 23, 2007, 8:28am

I need to replace hyperlinks in Notes documents across a large number of databases. The best method I could find was to export the document to DXL, replace the urllinks, and then import the DXL back into the document.

It worked fine on my test databases. We activated the code on the production server to process the databases. The agent runs fine, but several times a day the server crashes without any details in the log. When the agent is disabled, the server doesn’t crash, so I’m sure my agent has something to do with it.

This is how it works:

A list of databases to process is populated in a database.

The agent runs on the server and uses the list. Once a database is processed, its status is changed.

Because we suspected memory problems on the server, the agent processes at most x documents per run. The agent is scheduled to run every 15 minutes.

Every Sub/Function has error handling.

On some documents I get fatal errors like this:

Process: document: F7A - exporter log: <?xml version='1.0'?>

- importer log: <?xml version='1.0'?>

Expected whitespace

Expected equal sign

Expected an attribute name

Attribute ‘s_New_in_BusinessObjects_XI_Release_2.pdf’ is not declared for element ‘urllink’

Attribute ‘false’ is not declared for element ‘urllink’

Import operation incomplete; 0 notes(s) imported successfully

DXL importer operation failed

Sub Initialize

'declarations/initializations

…

Set dcDb2Process = luDb2Process.GetAllDocumentsByKey("open")

Set docDb2Process = dcDb2Process.GetFirstDocument

Do While Not docDb2Process Is Nothing

'open the database on the current server

Dim strFilePath As String

strFilePath = GetFilePathOnServer(strServer, docDb2Process.Server, docDb2Process.Pathname)

Set db = New NotesDatabase(strServer, strFilePath)

If db.IsOpen Then

  If ProcessDatabase(db) = True Then

  'update status after we return from the routine, this database doesn't need to be processed again

    docDb2Process.Status = "processed"

    Call docDb2Process.Save(True, True)

  Elseif intNrProcessed => intMaxNrProcDocs Then

    'Database not processed completely. Next run will continue.

    db.Close        

    Exit Do

  Else

    'Database not processed

  End If

  db.Close

End If



Set docDb2Process = dcDb2Process.GetNextDocument(docDb2Process)

Loop

End Sub

Function ProcessDatabase(db As NotesDatabase) As Boolean

On Error Goto ErrorHandler

ProcessDatabase = False

'DECLARE LOCAL VARIABLES

Dim dc As NotesDocumentCollection

Dim doc As NotesDocument

Dim domParser As NotesDOMParser

Dim exporter As NotesDXLExporter

Dim importer As NotesDXLImporter

Set dc = db.AllDocuments

Set doc = dc.GetFirstDocument

Do While Not doc Is Nothing

'skip deletion stubs    

If doc.IsDeleted Then

  Goto NextDocument

  

'skip corrupt documents

Elseif doc.UniversalID=""  Then

  Goto NextDocument

  

End If



blnChanged = False



Set exporter = session.CreateDXLExporter

exporter.ExitOnFirstFatalError = False  

Set domParser = session.CreateDOMParser 

Set importer = session.CreateDXLImporter ( domParser, db )

importer.DocumentImportOption = DXLIMPORTOPTION_REPLACE_ELSE_IGNORE



Call exporter.SetInput ( doc )

Call exporter.SetOutput ( domParser )

Call domParser.SetOutput ( importer )

On Event PostDOMParse From domParser Call ProcessDocument

Call exporter.process



If blnChanged = True Then

  Call doc.Save(True, False)

End If



intNrProcessed = intNrProcessed + 1    

If intNrProcessed => intMaxNrProcDocs Then

  'max nr of documents to process is reached => stop

  Exit Function

End If

NextDocument:

Set doc = dc.GetNextDocument(doc)

Loop

ProcessDatabase = True

Exit Function

ErrorHandler:

Resume NextDocument

End Function

Sub ProcessDocument(Source As NotesDOMParser)

'*** GET URLLINK TAGS

Dim rootElement As NotesDOMElementNode

Set rootElement = Source.Document.DocumentElement

Dim docList As NotesDOMNodeList

Set docList = rootElement.GetElementsByTagName("urllink")

If docList.NumberOfEntries > 0 Then

Dim i As Integer

For i = 1 To docList.NumberOfEntries

  

  '*** GET HREF ATTRIBUTE  

  Dim eNode As NotesDOMElementNode

  Set eNode = docList.GetItem(i)

  

  Dim strAtt As String

  strAtt = eNode.GetAttribute("href")

  Dim strAdjAtt As String

  strAdjAtt = strAtt

  

  '*** REPLACE OLD LINKS WITH NEW

  Dim y As Integer

  For y = 0 To Ubound(varOldlink)

    strAdjAtt = ReplaceSubStringCaseIns(strAdjAtt, (Trim(varOldLink(y))), (Trim(varNewLink(y))))

  Next

        

  Call ReplaceParam(strAdjAtt)

  Call walkTree(eNode)

  

  '*** SET NEW HREF ATTRIBUTE

  If Lcase(strAtt) <> Lcase(strAdjAtt) Then

    Call eNode.SetAttribute("href", strAdjAtt)

    blnChanged = True

  End If

  

Next i

End If

Source.Serialize

End Sub

Sub ReplaceParam(strUrl As String)

'contains some if contitions to replace parts of strUrl

End Sub

Sub walkTree ( node As NotesDOMNode)

If Not node.IsNull Then

Select Case node.NodeType

  

Case DOMNODETYPE_TEXT_NODE:

  Dim strNodeValue As String

  strNodeValue = node.NodeValue

  Call ReplaceKBParam(strNodeValue)

  node.NodeValue = strNodeValue

  

Case DOMNODETYPE_ELEMENT_NODE:

  Dim numChildren As Integer      

  numChildren =  node.NumberOfChildNodes

  

  Dim child As NotesDOMNode

  Set child = node.FirstChild     ' Get child

  While numChildren > 0

    Call walkTree(child)

    Set child = child.NextSibling   ' Get next child

    numChildren = numChildren - 1

  Wend      

End Select

End If

End Sub

Do you have any idea what the problem is? Or is there a better way to handle my needs?

Thanks!

Mieke

hcl-bot · October 24, 2007, 9:49am

Subject: DXL exporter and importer causes server to crash

Yes, I have had similar problems myself and have managed to find ways that (most of the time) seem to work around the issues.

The DXL classes are so full of bugs it is enough to make your hair go gray, but perhaps these suggestions will help you.

First of all, in order to get the DXL in the correct UTF-8 format, I have found that this technique does the trick.

As you see, I use a stream:

Set stream = session.CreateStream

If Not stream.Open(strFileName) Then Error 1001, "Cannot open " & strFileName

Call stream.Truncate

Set exporter = session.CreateDXLExporter(doc, stream)

exporter.OutputDOCTYPE = False

Call exporter.Process

'-- Feed the DXL stream into a parser.

Set domParser = session.CreateDOMParser(stream)

Call domParser.Process

Call stream.Close

Second, despite the fact that this shouldn’t happen, Domino may have generated incorrect values for quote characters within attributes.

So I run this code on the DXL to fix that:

Set nodeList = domParser.Document.GetElementsByTagName("*")

For lx = 1 To nodeList.NumberOfEntries

Set domNode = nodeList.GetItem(lx)

Set attribList = domNode.Attributes

For ix = 1 To attribList.NumberOfEntries 

	Set attribNode = attribList.GetItem(ix)

	If Instr(1, attribNode.AttributeValue, {'}, 5) > 0 Then

		attribNode.AttributeValue = Replace(attribNode.AttributeValue, {'}, {& # 39 ;})

	End If

Next

Next

Third, if the DXL contains [ or ] characters, you will find yourself in all sorts of problems.

I have not been able to find an easy way to fix this.

What I do is use the DOMParser to serialize the DXL back to a file, then call a function to read each line of the DXL to fix it, then re-load the file into the DOMParser.

A clumsy solution, I know, but one that has proven necessary at times.

Set stream = session.CreateStream

If Not stream.Open(strFileName) Then Error 1001, "Cannot open " & strFileName

Call stream.Truncate

Call domParser.SetOutput( stream )

Call domParser.Serialize( )

Call stream.Close

Call fixBrackets(strFileName)

Sub fixBrackets(strFileName As String)

Dim strFileNameTemp As String

Dim strLine As String

Dim fileNumIn As Integer

Dim fileNumOut As Integer

	

'-- Create a temp file with the fixed data.

fileNumIn = Freefile

Open strFileName For Input As #fileNumIn

fileNumOut = Freefile

strFileNameTemp = strFileName & ".temp"

Open strFileNameTemp For Output As #fileNumOut

	

Do While Not Eof(fileNumIn)

	Line Input #fileNumIn, strLine

	If Instr(1, strLine, "[") > 0 Then strLine = Replace(strLine, "[", "& # 91 ;")

	If Instr(1, strLine, "]") > 0 Then strLine = Replace(strLine, "]", "& # 93 ;")

	Print #fileNumOut, strLine

Loop

	

Close #fileNumIn

Close #fileNumOut

'-- Copy data back to original file.

Filecopy strFileNameTemp, strFileName

Kill strFileNameTemp

End Sub
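
The re-load step mentioned above is not shown in this post. A minimal sketch of it, assuming it simply mirrors the first snippet (open the fixed file as a stream and hand it to a fresh DOM parser), might look like the following. The function name ReloadDxl is hypothetical, and the session is passed in as a parameter rather than taken from a global.

```lotusscript
' Hypothetical helper (not from the original post): re-load the fixed
' DXL file into a new DOM parser, following the same pattern as the
' export snippet earlier in this reply.
Function ReloadDxl(session As NotesSession, strFileName As String) As NotesDOMParser
	Dim stream As NotesStream
	Dim domParser As NotesDOMParser

	Set stream = session.CreateStream
	If Not stream.Open(strFileName) Then Error 1001, "Cannot open " & strFileName

	' No Truncate here: the file already holds the fixed DXL.
	Set domParser = session.CreateDOMParser(stream)
	Call domParser.Process
	Call stream.Close

	Set ReloadDxl = domParser
End Function
```

It would be called right after fixBrackets, for example: Set domParser = ReloadDxl(session, strFileName)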

NOTE: I couldn’t write the HTML character codes ‘as-is’ in this posting, since they were automatically translated back to the characters. I have therefore inserted spaces, as in ‘& # 93 ;’, which you should of course remove.
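
With the spaces removed, the replacement strings are ordinary numeric character references. As a small sketch, the two bracket replacements could be wrapped in a helper like the one below. EscapeBrackets is a hypothetical name, and the references are assembled from two pieces only so that they survive being posted here.

```lotusscript
' Hypothetical helper (not from the original post).
Function EscapeBrackets(strIn As String) As String
	Dim strOut As String

	' "&" & "#91;" yields the numeric character reference for "["
	strOut = Replace(strIn, "[", "&" & "#91;")

	' "&" & "#93;" yields the numeric character reference for "]"
	strOut = Replace(strOut, "]", "&" & "#93;")

	EscapeBrackets = strOut
End Function
```

Inside the read loop of fixBrackets, the two If lines could then be replaced by a single call: strLine = EscapeBrackets(strLine)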

hcl-bot · May 20, 2008, 4:57am

Subject: Great! This got me out of a very deep hole

Thanks for this post, Kenneth. I had all but given up on a piece of work after running into lots of trouble. Your third fix, replacing the [ and ] characters with explicit character codes, did the trick for me. In my case, I was trying to import DXL that described a document. The document had a readers field, which contained a role like “[DbAdmin]”. When NotesDXLImporter tried to import it, it fell over with an error “missing tag ”. However, when I parsed the DXL using XMLSpy, it reported no such error.

After adding your fix to my code, the problems with NotesDXLImporter vanished.

I owe you a big favour now! Best wishes, Ian