XML Encoding, UTF-8 / UTF-16 Confusion

Here’s a frustrating little problem I ran into when a service I deal with (we’ll call it SystemA, for “Awesome”) suddenly changed its character encoding…
My app suddenly started throwing parse exceptions on XML messages after an upgrade to SystemA was deployed to a test environment. A peek at my logs showed that the XML response looked funky, with extra spaces all throughout it… no wonder my XML API went blooey:

< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n g = " U T F - 8 " ? >

I blinked a little, then tried to copy and paste from the log file into a bug note, and got this little gem from TextPad:

Cannot cut, copy, or drag and drop text containing null (code = 0) characters.

Sweet!
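TextPad's complaint is itself a useful clue: the log contains NUL (0x00) bytes, which a text viewer can only show as blanks. As a minimal Python sketch (the helper names `find_nul_offsets` and `hex_row`, and the command-line usage, are hypothetical), the same check can be run on the raw bytes of a log file without an editor getting in the way:

```python
import sys


def find_nul_offsets(data: bytes) -> list[int]:
    """Return the offset of every NUL (0x00) byte in data."""
    return [i for i, b in enumerate(data) if b == 0]


def hex_row(data: bytes, width: int = 16) -> str:
    """Format the first `width` bytes as space-separated hex, like one hex-editor row."""
    return " ".join(f"{b:02x}" for b in data[:width])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_nuls.py app.log
    with open(sys.argv[1], "rb") as f:  # read raw bytes; no decoding step
        raw = f.read()
    nuls = find_nul_offsets(raw)
    print(f"{len(nuls)} NUL bytes; first offsets: {nuls[:10]}")
    print(hex_row(raw))
```

A regular pattern of NUL bytes, rather than one or two stray ones, suggests an encoding mismatch rather than a corrupted file.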

I opened the file in a hex editor, and lo and behold, there were extra null characters all through it. Even though the XML declaration specified UTF-8, the content looked like it was actually encoded in UTF-16.
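The check done by eye in the hex editor can also be automated. XML 1.0 (Appendix F) describes how a parser can guess the encoding family from the first few bytes, before it trusts the encoding declaration. What follows is a simplified Python sketch of that idea, not a full implementation: `sniff_xml_encoding` is a hypothetical helper, UTF-32 and EBCDIC are ignored, and the demo assumes little-endian UTF-16 because I can't tell from the log which byte order SystemA actually used.

```python
def sniff_xml_encoding(head: bytes) -> str:
    """Guess a Python codec name for an XML document from its first bytes.

    Simplified from the autodetection table in XML 1.0, Appendix F.
    UTF-32 and EBCDIC are deliberately not handled.
    """
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"       # UTF-8 with a byte order mark
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"          # BOM present; the utf-16 codec reads and strips it
    if head.startswith(b"\x00<\x00?"):
        return "utf-16-be"       # '<?' in big-endian UTF-16, no BOM
    if head.startswith(b"<\x00?\x00"):
        return "utf-16-le"       # '<?' in little-endian UTF-16, no BOM
    # '<?xm' (or anything else): ASCII-compatible, so the encoding="..."
    # pseudo-attribute can be read; UTF-8 is the default per the spec.
    return "utf-8"


if __name__ == "__main__":
    decl = '<?xml version="1.0" encoding="UTF-8"?><a/>'
    # Assumption for illustration: a UTF-8 declaration sent as UTF-16LE bytes.
    mislabeled = decl.encode("utf-16-le")
    enc = sniff_xml_encoding(mislabeled[:4])
    print(enc)                     # utf-16-le
    print(mislabeled.decode(enc))  # the declaration, readable again
```

In this case the first four bytes disagree with the declared `encoding="UTF-8"`, which is exactly the kind of mismatch that trips up an XML parser.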
