Character encoding issue

Oct 24, 2011 at 7:40 PM

I have noticed something unexpected in the character encoding of certain character strings
when retrieved via pstsdk.  I was assuming that everything is returned in Unicode, but that
does not seem to be true for all cases.  For example, I have a .pst file with the folder
name "Smazaná pošta". In my MAPI application, if I retrieve PR_DISPLAY_NAME_W, I get the
following bytes:

53 00 6d 00 61 00 7a 00 61 00 6e 00 e1 00 20 00   S.m.a.z.a.n... .
70 00 6f 00 61 01 74 00 61 00 00                  p.o.a.t.a..

Note that the lowercase 'a' with acute accent is encoded as "E1 00", and the lowercase
's' with caron is encoded as "61 01", both of which are correct little-endian UTF-16.
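
As a quick check on the dump above, the byte pairs can be combined little-endian into
16-bit code units. This is only a sketch: utf16le_units is a hypothetical helper name,
not a MAPI or pstsdk function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: combine little-endian byte pairs into UTF-16 code units.
// A trailing odd byte (as at the end of the dump above) is ignored.
std::vector<std::uint16_t> utf16le_units(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        // Low byte comes first, high byte second.
        units.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}
```

Given the pairs "e1 00" and "61 01", it yields 0x00E1 and 0x0161, i.e. U+00E1 and U+0161.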

Using pstsdk, I retrieved the same property both with folder.get_name() and by reading
property 0x3001 directly.  In both cases, the bytes returned, as seen through c_str(),
are:

53 00 6d 00 61 00 7a 00 61 00 6e 00 e1 ff 20 00   S.m.a.z.a.n... .
70 00 6f 00 9a ff 74 00 61 00 00                  p.o...t.a..

For the two characters in question, the second byte is FF, and the first byte is the
character's single-byte code page value: E1 for 'á' and 9A for 'š'.  (9A is 'š' in
Windows-1250 and Windows-1252; in ISO Latin-1 that byte is a control character.)  So I
suppose I do have a means of correcting the problem, i.e. looking for 'FF' and then doing
a one-byte conversion, but I would like to understand why pstsdk is not supplying the
same bytes that MAPI does.  Is MAPI performing some internal conversion for applications?
Is my suggested workaround safe for all cases?  I have tested other .pst files containing
Asian character sets and have not had a problem with pstsdk.
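
For illustration, the workaround described above could look roughly like the sketch
below. Everything in it is an assumption: repair_ff_units and kExampleTable are
hypothetical names (not part of pstsdk), the text is assumed to have been copied into a
std::u16string first, and the table covers only the two bytes seen in the dump, since the
post does not establish which code page applies.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper, not part of pstsdk: replace every UTF-16 code unit whose
// high byte is 0xFF with the code point that `table` assigns to its low byte.
// Units whose low byte is not in the table are left unchanged.
std::u16string repair_ff_units(std::u16string text,
                               const std::map<std::uint8_t, char16_t>& table) {
    for (char16_t& unit : text) {
        if ((unit & 0xFF00) == 0xFF00) {
            auto it = table.find(static_cast<std::uint8_t>(unit & 0x00FF));
            if (it != table.end()) {
                unit = it->second;
            }
        }
    }
    return text;
}

// Illustrative table with only the two bytes from the dump above. A complete
// table would need every byte of the relevant code page (an open assumption).
const std::map<std::uint8_t, char16_t> kExampleTable = {
    {0xE1, 0x00E1},  // a with acute
    {0x9A, 0x0161},  // s with caron
};
```

One caution on "safe for all cases": code units in the range FF00 to FFEF are themselves
valid Unicode (the Halfwidth and Fullwidth Forms block). FFE1 is the fullwidth pound sign
and FF9A is a halfwidth katakana letter, so a blanket rule like this could corrupt
legitimate East Asian text and is probably best applied only where the data is known to
be affected.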