Body type (rtf, html, plain text)

Oct 14, 2010 at 4:49 AM


I'm new to the SDK and I'm trying to retrieve the body of a message. The only functions I see are get_body and get_html_body; there is none for RTF or MIME. So I tried reading the properties individually, as shown below, but I don't get valid contents for RTF or MIME. How do I identify the body type so that I can retrieve the appropriate body property?

if (m.get_property_bag().prop_exists(0x1000)) wstrBody = m.get_property_bag().read_prop<std::wstring>(0x1000);
if (m.get_property_bag().prop_exists(0x1009)) wstrBody = m.get_property_bag().read_prop<std::wstring>(0x1009);
if (m.get_property_bag().prop_exists(0x1013)) wstrBody = m.get_property_bag().read_prop<std::wstring>(0x1013);
if (m.get_property_bag().prop_exists(0x6659)) wstrBody = m.get_property_bag().read_prop<std::wstring>(0x6659);

Thanks in advance.

Oct 14, 2010 at 4:54 AM

It's complicated.

Read up on the "best body" algorithm described in MSDN. Basically the various body types (plain text, html, and rtf) are kept in sync as requested by the client and as the properties are set on the object by the client.
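
As a rough first pass only (this is not the MSDN best-body algorithm, which also accounts for how the bodies are kept in sync), one could check which body properties are present and pick one in a fixed order. In the sketch below, the property IDs are the ones from the question; the `body_kind` enum, the `pick_body` name, the `exists` callback, and the HTML-then-RTF-then-plain order are all assumptions made for illustration and are not part of the SDK.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result type; not part of the SDK.
enum class body_kind { none, plain, rtf, html };

// Property IDs taken from the question above.
const std::uint16_t id_body = 0x1000; // plain text body
const std::uint16_t id_rtf  = 0x1009; // RTF body (stored compressed)
const std::uint16_t id_html = 0x1013; // HTML body

// Presence check in an assumed preference order: HTML, then RTF, then plain.
// 'exists' stands in for a call such as prop_exists on the property bag.
body_kind pick_body(const std::function<bool(std::uint16_t)>& exists)
{
    if (exists(id_html)) return body_kind::html;
    if (exists(id_rtf))  return body_kind::rtf;
    if (exists(id_body)) return body_kind::plain;
    return body_kind::none;
}
```

With the SDK, the callback would presumably wrap m.get_property_bag().prop_exists(id). Note that the RTF property holds compressed data, so reading it as a string will not give readable RTF.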

Oct 14, 2010 at 5:25 AM

Thanks for the quick response, terrymah. That was helpful. I have another question, though. The original message has an HTML body. When I retrieve it with get_html_body, I get garbage. get_body does return a valid body, but the original HTML formatting is gone, so it looks like plain text and all the pictures from the original message are missing. The same happens if I read the properties individually using read_prop as shown above. Am I doing anything wrong?


Oct 16, 2010 at 3:21 AM

That's due to a naive implementation of read_prop<std::wstring> in object.h (line 329). 

Basically, it goes like this:

HTML is typically transferred in an 8-bit character encoding (ANSI, UTF-8, or some other 7- or 8-bit compatible encoding). This means one byte per character, at least for ASCII-range markup. When the property gets stored in the PST, it's stored as PT_BINARY instead of PT_STRING8. This is unfortunate, because the caller reading the stored property has no way of knowing whether that binary blob is 8-bit or 16-bit character data.

The implementation of read_prop looks at the type specified for the property. If it's PT_STRING8 (aka 0x001E), it correctly treats the data as an ANSI (8-bit) string, converts that to a wchar (16-bit) string, and returns it. If it's anything else, it simply casts the buffer to wchar via bytes_to_wstring. Unfortunately, since the data was originally stored in 8-bit ANSI format, this produces garbage: your 8-bit character data is now being interpreted as 16-bit character data.
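
To make the effect concrete, here is a small standalone sketch of the two interpretations. It does not use the SDK; the function names are made up for illustration, and char16_t is used instead of wchar_t so the 16-bit width is explicit on every platform.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Mimics the "anything else" path: treat the buffer as packed 16-bit units.
// memcpy is used here rather than a pointer cast to sidestep alignment issues.
std::u16string as_utf16_units(const std::vector<unsigned char>& bytes)
{
    std::u16string out(bytes.size() / sizeof(char16_t), u'\0');
    if (!out.empty())
        std::memcpy(&out[0], bytes.data(), out.size() * sizeof(char16_t));
    return out;
}

// Mimics the PT_STRING8 path: widen each byte to its own 16-bit unit.
std::u16string widen_each_byte(const std::vector<unsigned char>& bytes)
{
    return std::u16string(bytes.begin(), bytes.end());
}
```

Feeding the six bytes of "<html>" to widen_each_byte gives the six expected characters. as_utf16_units packs the same bytes pairwise into three unrelated 16-bit values, which is the garbage seen in the returned body.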

In the .NET wrapper we're working on, we compensated for this by just assuming that the HTML body will always be in 8-bit format, as it really doesn't make sense to store it any other way. Our current implementation is a bit of a hack: it takes the wchar string returned from get_HtmlBody() and converts it back to an ANSI string, so it's not ideal. I'm about to change that to skip get_HtmlBody() and call get_value_variable directly to get a byte vector, then cast that to an ANSI string.
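
On the C++ side, the same idea (take the raw bytes and treat them as 8-bit text) might look roughly like the sketch below. The `byte_t` typedef and the `bytes_to_narrow` name are stand-ins invented for this example; how the raw byte vector is obtained from the property bag is left out on purpose.

```cpp
#include <string>
#include <vector>

typedef unsigned char byte_t; // stand-in for the SDK's byte type

// Hypothetical helper: interpret a raw property buffer as 8-bit text,
// one char per byte, with no widening and no charset conversion.
std::string bytes_to_narrow(const std::vector<byte_t>& buffer)
{
    return std::string(buffer.begin(), buffer.end());
}
```

The result is still in whatever 8-bit encoding the message used, so a caller that needs Unicode would have to decode it separately.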

This does leave the possibility that we trash valid Unicode content if that field is ever stored as 16-bit character data (as many of the others are). I'm not sure of a solid way to detect 16-bit vs. 8-bit when faced with a binary blob of unknown character data. I guess there are plenty of heuristics that could come close, but none would be right all the time.
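
One such heuristic, as a sketch only: look for a UTF-16 byte order mark, and otherwise count how many odd-positioned bytes are zero, since mostly-ASCII markup stored as little-endian 16-bit text has a zero in every second byte. The names, the little-endian assumption, and the "more than half" threshold are illustrative choices, not anything the SDK does. It will misjudge 16-bit text that is mostly non-Latin.

```cpp
#include <cstddef>
#include <vector>

enum class text_width { eight_bit, sixteen_bit };

// Heuristic guess only. Assumes mostly ASCII-range markup and, for 16-bit
// data, little-endian byte order unless a byte order mark says otherwise.
text_width guess_text_width(const std::vector<unsigned char>& bytes)
{
    // A UTF-16 byte order mark (FF FE or FE FF) is taken as decisive.
    if (bytes.size() >= 2 &&
        ((bytes[0] == 0xFF && bytes[1] == 0xFE) ||
         (bytes[0] == 0xFE && bytes[1] == 0xFF)))
        return text_width::sixteen_bit;

    // An odd byte count cannot be a whole number of 16-bit units.
    if (bytes.size() % 2 != 0)
        return text_width::eight_bit;

    // Count zero bytes in the high (odd) position of each 16-bit unit.
    std::size_t zero_high = 0;
    for (std::size_t i = 1; i < bytes.size(); i += 2)
        if (bytes[i] == 0)
            ++zero_high;

    // Guess 16-bit when more than half of the units have a zero high byte.
    const std::size_t units = bytes.size() / 2;
    return (units > 0 && zero_high * 2 > units) ? text_width::sixteen_bit
                                                : text_width::eight_bit;
}
```

A caller could use the guess to choose between widening each byte and reinterpreting the buffer as 16-bit units, falling back to 8-bit when unsure.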

For reference, here's the implementation of read_prop that get_HtmlBody uses (from object.h)... 


inline std::wstring const_property_object::read_prop<std::wstring>(prop_id id) const
{
    std::vector<byte> buffer = get_value_variable(id);

    // PT_STRING8: widen each 8-bit char to a wchar_t
    if(get_prop_type(id) == prop_type_string)
    {
        std::string s(buffer.begin(), buffer.end());
        return std::wstring(s.begin(), s.end());
    }

    // Any other type: reinterpret the raw bytes as wchar_t data
    return bytes_to_wstring(buffer);
}


and bytes_to_wstring (util.h : line 221): 

inline std::wstring pstsdk::bytes_to_wstring(const std::vector<byte> &bytes)
{
    if(bytes.size() == 0)
        return std::wstring();

    return std::wstring(reinterpret_cast<const wchar_t *>(&bytes[0]), bytes.size()/sizeof(wchar_t));
}





Oct 16, 2010 at 6:22 AM

Very insightful. Thanks a lot Troy. I finally managed to get the correct body. What was looking like garbage was actually a valid body.