XML encoding

I have an agent that exports Notes documents to XML files written to an OS folder. The problem is that the XML encoding doesn’t seem to survive. I need the XML file to be UTF-8. If I do a NotesDXLExporter.Process to a stream, the output is in UTF-8. If I send this straight to an XML file, there is no problem: the UTF-8 survives. But if I send the stream to a string variable and make any edits to it, it loses the UTF-8 encoding and reverts to something else. When I look at the content of the XML file itself, any characters that are not supported by the character set are replaced by a square. Note that these are not XML files to render on the web; these are XML files used to transmit data from one environment to the next. If anyone has more insight on this, or can recommend where to find more technical info on Lotus XML to answer this question, it would be greatly appreciated.

Thanks,

Mark

Subject: Watch out for unexpected character set conversions

String variables in LotusScript and Java are in Unicode (UTF-16). So when you read data from a UTF-8 stream into a string, I would expect that you would get a UTF-8 to UTF-16 conversion. I would not expect any data loss in that conversion, because UTF-8 and UTF-16 are both encodings of the full Unicode character set, so every character that can be represented in one can also be represented in the other.
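To illustrate the point about lossless conversion, here is a minimal Java sketch (not Notes-specific; the class name, method name, and sample text are made up for the example). It decodes UTF-8 bytes into a String, which Java holds as UTF-16, and encodes the String back to UTF-8. For valid UTF-8 input the bytes should come back unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode UTF-8 bytes into a String (UTF-16 in memory), then encode back to UTF-8.
    static byte[] roundTrip(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: Latin with an accent, Japanese, and Cyrillic characters.
        String sample = "<note>caf\u00e9 \u65e5\u672c\u8a9e \u0416</note>";
        byte[] original = sample.getBytes(StandardCharsets.UTF_8);
        byte[] result = roundTrip(original);
        // Reports whether the bytes are identical after the round trip.
        System.out.println("Bytes unchanged: " + Arrays.equals(original, result));
    }
}
```

If this holds in your environment too, the string variable itself is unlikely to be where the characters are lost; the step that writes the string back out is the more likely suspect.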

You do not say what you do with the string to write its data to a file. One common pitfall is to create a NotesStream, Open it, write data to it, close it, and not get what you expect. If you look at the doc for NotesStream.Open, you’ll see the second argument is charset and defaults to “System”. In U.S. English locales on Windows, the “System” charset is windows-1252. This charset basically only supports the Latin alphabet, so if you have non-Latin characters in the original data, they will be lost in translation. The simple workaround for this pitfall is to specify an appropriate charset, such as UTF-8, in the NotesStream.Open call.
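The effect of writing with the wrong charset can be reproduced outside Notes. The sketch below is plain Java, not the NotesStream API, and the class and method names are hypothetical. It encodes the same text once as windows-1252 and once as UTF-8, then decodes it again. Plain Java substitutes a question mark for characters the charset cannot represent; other tools may show a square or a different placeholder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetLoss {
    // Encode text with the given charset and decode it again, as a file write
    // followed by a read with the same charset would do.
    static String writeAndReadBack(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset); // unmappable characters are replaced here
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        String text = "caf\u00e9 \u0416"; // "café" plus the Cyrillic letter Zhe

        String viaWindows1252 = writeAndReadBack(text, windows1252);
        String viaUtf8 = writeAndReadBack(text, StandardCharsets.UTF_8);

        // The accented Latin letter survives windows-1252; the Cyrillic letter does not.
        System.out.println("windows-1252 preserved text: " + text.equals(viaWindows1252));
        System.out.println("UTF-8 preserved text: " + text.equals(viaUtf8));
    }
}
```

The loss happens at encoding time, so it cannot be repaired by reading the file back with a different charset afterwards; the charset has to be right when the file is written.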