Determine character set in e-mails

Hi,

I have the following problem: I want to upload e-mails to an external server via HTTP using the multipart/form-data MIME content-type. Since e-mails could be written in several different character encodings, I need to somehow determine it by inspecting the e-mail document.
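For illustration, here is a minimal Python sketch of what such an upload body could look like once the charset is known. The helper name `build_multipart`, the field name, and the fixed boundary are hypothetical and not part of any Notes API; the point is only that the text part declares its own charset and that the bytes are encoded to match.

```python
import uuid

def build_multipart(field_name, text, charset="utf-8", boundary=None):
    """Build a one-part multipart/form-data body (hypothetical helper).

    Returns the Content-Type header value and the encoded body bytes.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="' + field_name.encode("ascii") + b'"',
        # The declared charset must match the encoding used two lines below.
        b"Content-Type: text/plain; charset=" + charset.encode("ascii"),
        b"",
        text.encode(charset),
        delimiter + b"--",  # closing delimiter
        b"",
    ])
    return "multipart/form-data; boundary=" + boundary, body

content_type, body = build_multipart("body", "Grüße", boundary="XYZ")
```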

I stumbled across the GetMIMEEntity method of NotesDocument, but calling it on an e-mail body always returns Nothing. I furthermore read something about setting ConvertMime to false for a session, but that didn’t have any impact for me (I guess it’s only relevant when creating/sending e-mails, not reading from e-mails that have already been processed by Notes?)

So, how can I determine what character encoding is used to encode the Body field of an e-mail in my inbox?

Thanks,

Matt

Subject: Determine character set in e-mails

Well, ConvertMime=false should have worked – if the message was actually stored in native MIME format, that is. If the message was stored in CD format (due to the preference setting in the recipient’s person document in the Domino Directory), then the original encoding information is gone. In that case, though, you can use UTF-16. (The data is stored in LMBCS, but it is converted to UTF-16 when LotusScript reads it.)

On the other hand, for native MIME data (where the document has the $NoteHasNativeMIME item) this might actually be more difficult than you already believe it is. If the content-type is text/html, then the charset in the MIME header can be overridden by an “http-equiv” meta tag inside the HTML data. I’m not sure about this, but I don’t think the LotusScript methods pick up on that.
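As a rough sketch of that check, in Python rather than LotusScript: look for a charset declared in a meta tag near the top of the HTML and fall back to the MIME header value if none is found. The function name `effective_charset`, the 1024-byte scan window, and the `us-ascii` default are assumptions made for illustration; whether a given mail client really prefers the meta tag over the header is not something this thread confirms.

```python
import re

# Matches both <meta charset="x"> and
# <meta http-equiv="Content-Type" content="text/html; charset=x">.
_META_CHARSET = re.compile(
    r'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def effective_charset(header_charset, html_bytes, default="us-ascii"):
    """Return the meta-tag charset if present, else the header charset."""
    # latin-1 maps every byte to a character, so this decode never fails
    # and leaves the ASCII markup readable for the regex.
    head = html_bytes[:1024].decode("latin-1")
    match = _META_CHARSET.search(head)
    if match:
        return match.group(1).lower()
    if header_charset:
        return header_charset.lower()
    return default

html = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head></html>'
chosen = effective_charset("iso-8859-1", html)
```

Here `chosen` is taken from the meta tag, not from the header value passed in as the first argument.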

-rich

Subject: RE: Determine character set in e-mails

Rich,

I just came back to the problem we discussed and observed something interesting: I compose HTTP requests and send them to my server for evaluation. Since you said Lotus Notes uses UTF-16, I sent the payload (text from an e-mail) using the MIME type “text/plain; charset=UTF-16”. Interpreting the text on the server as UTF-16, however, produced only garbage strings. So I tried setting the character set to UTF-8, and that worked perfectly.

So, doesn’t that mean Lotus Notes does NOT use UTF-16 for encoding documents, but UTF-8 or some subset of it?
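The symptom itself can be reproduced outside Notes. The Python sketch below (the helper `roundtrip` and the sample text are made up for illustration) shows that text only survives when the declared charset matches the bytes actually sent. It says nothing about how Notes stores documents; it only suggests that whatever produced the payload bytes emitted UTF-8.

```python
def roundtrip(text, sent_as, declared_as):
    """Encode text with one charset and decode it with another."""
    payload = text.encode(sent_as)
    # errors="replace" mimics a lenient server that substitutes U+FFFD
    # for invalid input instead of rejecting it.
    return payload.decode(declared_as, errors="replace")

sample = "Grüße aus Köln"

matches_utf8 = roundtrip(sample, "utf-8", "utf-8") == sample
matches_utf16 = roundtrip(sample, "utf-16", "utf-16") == sample
mismatch = roundtrip(sample, "utf-8", "utf-16") == sample
```

Only the mismatched case fails to reproduce the original text, which is consistent with the garbage strings described above.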

Best,

Matt

Subject: RE: Determine character set in e-mails

Hm, that’s interesting, I didn’t know about $NoteHasNativeMIME. Actually, no e-mail in my inbox has this field set. Where did you say one can turn off the removal of MIME headers from incoming (Internet) mail?