Character encoding problem with REST agent

hcl-bot · July 15, 2015, 3:40pm

I have developed a web agent acting as a RESTful service. The agent accepts POST requests to receive blocks of JSON data.

The JSON is UTF-8 encoded and the HTTP request states the encoding used.

However, the (JSON) text received in the Request_content field of the agent context document is somehow encoded differently or plainly wrong.

In my case the problem shows up with the German umlauts and ß: üÜöÖäÄß

For test purposes I have created an HTTP request with a Chrome extension. I do not send JSON here, but that does not matter for the effect:

POST /SOAPGATEQ_5.NSF/REST4Documents?openagent HTTP/1.1
Host: domino.flexdomino.net
Content-Type: application/json; charset=UTF-8 (I have also used text/html, but the result is the same)
Content-Length: 7

Müller (sent as part of the body)

HTTP/1.1 200 OK
Server: Lotus-Domino
Date: Wed, 15 Jul 2015 20:31:13 GMT
Connection: close
Content-Type: application/json; charset=utf-8
Content-Length: 266

{"error":{"text":"[Error.REST4Documents] 27, CLASS:JSONREADER<PARSE: line 131> ERROR: 1000: Invalid JSON format. (Block character mismatch ASCII(77,114) M├╝ller… Context: Current character = 'M'; Previous character = 'M'; Remaining string = 'M├╝ller'."}}

Ignore the fact that the response is an error; quite naturally the JSON parser complains about the single word “Müller”.

However, please note what the context document received: “M├╝ller”

Whatever I change on the sending side, content type or charset, the result is the same or similar:

I get garbled characters instead of the German umlauts.
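The garbled text above is at least consistent with one common failure mode: the UTF-8 bytes of “ü” (0xC3 0xBC) being interpreted with a DOS/OEM code page. The following Python sketch only reproduces the symptom; that Domino actually applies CP437 or CP850 here is an assumption, not something confirmed in this thread.

```python
# Sketch: reproduce the "M├╝ller" symptom outside Domino.
# Assumption: the receiver decodes the UTF-8 body with an OEM code page.
original = "Müller"
utf8_bytes = original.encode("utf-8")  # 7 bytes, matching Content-Length: 7

# Decode the same bytes with two candidate OEM code pages.
garbled = {cp: utf8_bytes.decode(cp) for cp in ("cp437", "cp850")}
for codepage, text in garbled.items():
    print(codepage, text)

# Reversing the wrong decoding step recovers the original text.
repaired = garbled["cp437"].encode("cp437").decode("utf-8")
print(repaired)
```

If the same round trip recovers the text on the server side, the bytes arrive intact and only the interpretation of Request_content is off.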

The only way I get a result I can possibly work with is to use application/x-www-form-urlencoded,

in which case every character that takes more than one byte in UTF-8 is percent-encoded byte by byte (%hex%hex for the umlauts).

I have not yet tried whether I can work around the problem this way, using @URLDecode to get my proper UTF-8 JSON data with correct German umlauts,

but then again, that is quite some overhead for something that should work without any decoding step.
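To illustrate the percent-encoding round trip described above, here is a minimal Python sketch. It stands in for what @URLDecode would have to do on the Domino side; the form field name `data` and the sample payload are hypothetical and only serve the example.

```python
import json
from urllib.parse import parse_qs, quote

# Client side: serialise JSON and percent-encode it as a form field.
# The field name "data" is a hypothetical choice for this sketch.
payload = json.dumps({"name": "Müller"}, ensure_ascii=False)
body = "data=" + quote(payload, safe="")
print(body)  # the two UTF-8 bytes of "ü" appear as %C3%BC

# Server side: decode the form body (UTF-8 by default) and parse the JSON.
decoded = parse_qs(body)["data"][0]
document = json.loads(decoded)
print(document["name"])
```

The body is pure ASCII on the wire, which would explain why it survives whatever re-interpretation garbles the raw UTF-8 request.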

Is there something I’m missing, such as a server-side setting or a specific character set to be used other than UTF-8? Or is this simply a bug in Domino?

hcl-bot · July 22, 2015, 11:28pm

Subject: JSON snapps classes

Hi,

I see you are using the JSON LotusScript classes from OpenNTF. I just rewrote them for performance reasons; however, the rewrite might also address your problem.
The original used the standard LotusScript string handling functions, which are slow.
The rewrite gobbles up the JSON via a NotesStream and then uses Unicode code points to parse (via Uni) and to build (via UChr$) the objects.
Parsing speed is now 150,000 JSON chars/sec instead of 30,000 JSON chars/sec.
I haven’t completely finished (when an error happens, the entire stack of errors is shown instead of just the breadcrumb trail) but you’re welcome to try the code.

Oh dearie me I cannot post it here. I’ll try to find you… Or try and find me…
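This is not the rewritten class mentioned above (which was not posted), only a Python sketch of the general idea: read the body as a byte stream, decode it explicitly as UTF-8, and parse afterwards. The function name `parse_json_stream` and the use of `io.BytesIO` as a stand-in for a stream object are assumptions made for illustration.

```python
import io
import json

def parse_json_stream(stream, charset="utf-8"):
    """Read all bytes from a binary stream, decode explicitly, then parse."""
    raw = stream.read()          # raw bytes, no implicit code page applied
    text = raw.decode(charset)   # explicit decoding step
    return json.loads(text)

# io.BytesIO stands in for the incoming request body.
request_body = io.BytesIO('{"name": "Müller"}'.encode("utf-8"))
parsed = parse_json_stream(request_body)
print(parsed["name"])
```

Decoding explicitly keeps the character set decision in one visible place instead of relying on a platform default.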

hcl-bot · July 16, 2015, 5:52am

Subject: Pardon my ignorance…

I haven’t been on the IBM pages for a while… where do I find the list of SPRs and where do I create a PMR?

IGNORE. Figured it out.

hcl-bot · July 15, 2015, 4:31pm

Subject: I’d create a PMR and reference SPR PHEY9Y9HTR

SPR PHEY9Y9HTR was created within the last week and looks pretty similar to this. No resolution at this time.