Dealing with U+FFFD in HTML page streamed into rich text field

Good day Notes Gurus! I was hoping you could help me with a small but annoying problem I’m having.

I have been asked to copy a web page into a rich text field (the "Store contents as HTML and MIME" property is selected) so that when someone opens the document containing this field, they see the page as closely as possible to how it looks on the web.

The code I’ve posted below works great, except that the U+FFFD replacement character appears in the result (it usually shows up as ? in the rich text field), and I can’t seem to get rid of it. I suspect the problem is the encoding I pass to the SetContentFromText method of the MIME entity, but the other encodings I’ve tried (listed after the agent code) give the same result as the original code.

Am I on the wrong track? Has anybody else experienced this and found a solution? Thanks in advance for any assistance.


Agent Code:

Sub Initialize
	' Errors are recorded in the agent log.
	Dim logAgent As New NotesLog("Agent")
	Call logAgent.OpenAgentLog
	On Error Goto tagErrorBatch

	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim dc As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim docNext As NotesDocument
	Dim v As Variant
	Dim url As String
	Dim mime As NotesMIMEEntity
	Dim stream As NotesStream

	Set db = session.CurrentDatabase
	Set dc = db.UnprocessedDocuments
	Set doc = dc.GetFirstDocument

	' Keep MIME items as MIME instead of converting them to rich text.
	session.ConvertMIME = False

	' COM object used to fetch each page over HTTP.
	Set v = CreateObject("Microsoft.XMLHTTP")

	While Not doc Is Nothing
		url = doc.GetItemValue(FLD_URL)(0)

		' Synchronous GET request for the page.
		Call v.open("GET", url, False)
		Call v.send(Null)

		If v.status = 200 Then
			If Instr(1, v.responseText, HTML_NO_PAGE_PATTERN) > 1 Then
				' The server answered, but the body is the "no page" placeholder.
				If Not session.IsOnServer Then Print "Unable to retrieve " & url
				Error ERR_NUM_NO_WEB_PAGE, ERR_MSG_NO_WEB_PAGE & url
			Else
				' Drop any previous copy of the page before storing the new one.
				If doc.HasItem( FLD_WEB_PAGE ) Then
					Call doc.RemoveItem( FLD_WEB_PAGE )
				End If

				' Store the response text as a text/html MIME entity.
				Set stream = session.CreateStream
				Call stream.WriteText(v.responseText)
				Set mime = doc.CreateMIMEEntity(FLD_WEB_PAGE)
				Call mime.SetContentFromText (stream, "text/html;charset=UTF-8",ENC_NONE)
				Set mime = Nothing
				Set stream = Nothing

				Call doc.ReplaceitemValue( FLD_LAST_WEB_PAGE_UPDATE, Now )
				Call doc.Save(False, False)
			End If
		Else
			If Not session.IsOnServer Then Print "Unable to retrieve " & url
			Error ERR_NUM_NO_WEB_PAGE, ERR_MSG_NO_WEB_PAGE & url
		End If

tagGetNextDoc:
		' Fetch the next document before releasing the current one.
		Set docNext = dc.GetNextDocument( doc )
		Delete doc
		Set doc = docNext
	Wend

tagOut:
	' Clean up; ignore any errors raised while doing so.
	On Error Resume Next
	Print ""
	If Isobject(v) And Not v Is Nothing Then Set v = Nothing
	session.ConvertMIME = True
	Call logAgent.Close
	Exit Sub

tagErrorBatch:
	Select Case Err
	Case ERR_NUM_NO_WEB_PAGE
		' Page could not be retrieved: log it and move on to the next document.
		Call logAgent.LogError(Err, Error$)
		Resume tagGetNextDoc
	Case Else
		' Anything else is unexpected: log it and stop.
		If doc Is Nothing Then
			Call LogErrorEx("Error " & Cstr(Err) & " " & Error$, SEVERITY_MEDIUM, Nothing)
		Else
			Call LogErrorEx("Error " & Cstr(Err) & ": " & Error$, SEVERITY_MEDIUM, doc)
		End If
		Call logAgent.LogError(Err, Error$)
		Resume tagOut
	End Select
End Sub


'Other variations of the SetContentFromText call on the MIME entity that I’ve tried:
'Call mime.SetContentFromText (stream, {text/html;charset="ISO-8859-1"},ENC_NONE)
'Call mime.SetContentFromText (stream, "text/html;charset=iso-8859-1",ENC_IDENTITY_BINARY)
'Call mime.SetContentFromText (stream, "text/html;charset=iso-8859-1", ENC_QUOTED_PRINTABLE)
'Call mime.SetContentFromText (stream, "text/html;charset=US-ASCII",ENC_NONE)

Subject: Dealing with U+FFFD in HTML page streamed into rich text field

Well, it seems that I can remove the U+FFFD character (rendered as ?) by replacing every ? with "" (an empty string), as long as I do it later in the process than in my original attempt.
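
For anyone who wants a starting point, the replacement can be sketched roughly as follows. This is only an illustration under some assumptions: the function name StripReplacementChar is made up for this example and is not part of the agent above, and it assumes the stray character really is U+FFFD (code point 65533) rather than a literal question mark.

```lotusscript
' Hypothetical helper (not part of the agent above): returns a copy of
' source with every U+FFFD replacement character removed.
Function StripReplacementChar(source As String) As String
	' Uchr$(65533) is the U+FFFD character; Replace swaps each
	' occurrence for an empty string.
	StripReplacementChar = Replace(source, Uchr$(65533), "")
End Function
```

One possible call would be StripReplacementChar(v.responseText) before the text is written to the stream, but the sketch does not settle at which point in the process the replacement has to run for it to take effect.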

Hope this helps somebody. If not, sorry for polluting this forum.

Have a wonderful day!