Character encoding of rich text

I’ve got an app using the C API to parse rich text fields (composite data). I am hung up on how to figure out what character encoding method is used for text. In most cases the encoding is LMBCS, but I’ve seen examples where the encoding is codepage 437 (OEM). Is there a field in one of the composite data types that has this information? I have not been able to find any documentation on this, nor do I see any other message property that would have this.

Subject: Are you sure you don’t have MIME data there?

I’ve never heard of rich text that’s not LMBCS.

Subject: all text in Notes/Domino is LMBCS…

…so the text you see is probably LMBCS, code group 1.

Code group 1, which is very close to codepage 850, is the default code group used in Notes/Domino. CP850 is a character set designed for use in Western Europe…

Subject: rich text encoding - but but…

The guys are right to the extent that there is no intention to store any encoding other than LMBCS in a rich text item, and the contract is to interpret whatever text the RT item contains as LMBCS.

However, that does not mean it is impossible, especially if the rich text was created in pre-Unicode days on Win 95 or MacOS.

In that case there is no good way to tell from the RT properties. If you know this situation can occur in your data (the RT will show up as invalid in the Notes client today, but there may once have been a setup where it worked), you can write some code that analyzes the content, decides whether it is the “special encoding”, and decodes it accordingly.
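
As a starting point for that kind of content analysis, here is a minimal sketch in plain C. It only sorts the bytes of a text run into classes and returns a rough verdict. It rests on an assumption that is not stated in this thread: that in LMBCS, bytes below 0x20 (other than NUL, TAB, LF and CR) act as code group prefixes, so their presence suggests real LMBCS rather than a single-byte OEM code page. All names (`ByteCounts`, `count_bytes`, `guess_encoding`) are made up for illustration and are not part of the Notes C API.

```c
#include <stddef.h>

/* Rough verdicts for a run of text bytes taken from a rich text record. */
typedef enum {
    GUESS_ASCII_ONLY,     /* only 7-bit text: decodes the same either way   */
    GUESS_LMBCS_PREFIXED, /* low control bytes seen: assumed LMBCS prefixes */
    GUESS_AMBIGUOUS_HIGH  /* high bytes, no prefixes: cannot tell from here */
} EncodingGuess;

typedef struct {
    size_t plain; /* NUL, TAB, LF, CR and 0x20..0x7F        */
    size_t low;   /* any other byte below 0x20              */
    size_t high;  /* 0x80..0xFF                             */
} ByteCounts;

/* Sort every byte of the buffer into one of the three classes. */
ByteCounts count_bytes(const unsigned char *buf, size_t len)
{
    ByteCounts n = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b >= 0x80) {
            n.high++;
        } else if (b >= 0x20 || b == 0x00 || b == 0x09 ||
                   b == 0x0A || b == 0x0D) {
            n.plain++;
        } else {
            n.low++;
        }
    }
    return n;
}

/* Heuristic only. ASSUMPTION: low control bytes are LMBCS group prefixes. */
EncodingGuess guess_encoding(const unsigned char *buf, size_t len)
{
    ByteCounts n = count_bytes(buf, len);
    if (n.low > 0)
        return GUESS_LMBCS_PREFIXED;
    if (n.high > 0)
        return GUESS_AMBIGUOUS_HIGH;
    return GUESS_ASCII_ONLY;
}
```

Only the `GUESS_AMBIGUOUS_HIGH` case needs further work, since that is where LMBCS default-group text and OEM code page text look alike at the byte level. For those runs you would probably have to decode both ways and compare the results against what you expect in your data (language, known words, box-drawing characters and so on).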