Error streaming attachments

Jan 1, 2011 at 12:16 AM

I haven't fully investigated this one yet (it's New Year's Eve and it's 4PM), but was wondering if there was any other report of this...

I'm trying to use trunk pstsdk 0_3_0 with the EID update to parse a large (800MB to 1.5GB) PST file, and it blows up when I call:


    std::ofstream imgFile(newfn, std::ios::out | std::ios::binary);
    imgFile << attch;


I'm iterating through the attachments with:

    if(msg.get_attachment_count() > 0)
        for(pstsdk::message::attachment_iterator iter = msg.attachment_begin(); iter != msg.attachment_end(); ++iter)
            saveAttachment(foldername, messageID, wEid, *iter);

where "attch" in the snippet above is the parameter which "*iter" becomes.

Looking into the code, it appears that the ofstream write method is being used to write a vector<pstsdk::byte>, but the vector is empty, so trying to write attch.length() bytes from it runs off the end of the vector and fails. I don't think the problem is in my code: it happens only on some attachments and only in some PST files, but it is repeatable on the PST files where it does fail.

I expect it'll be Wednesday before I get back to look at this, but I thought I should at least throw the floor open to suggestions...

Jan 1, 2011 at 4:17 AM

Looks like the attachment streaming code assumes that all attachments are at least 1 byte. This seems (seemed?) like a pretty safe assumption to make. Do the other properties on the attachment indicate it's zero sized? What is this attachment exactly?

By design pstsdk doesn't work around invalid data, but if it turns out that zero length attachments are valid it should handle it properly. Is there any chance I could have a PST with just one message demonstrating this behavior?

Jan 1, 2011 at 4:54 AM

What it looks like is that either the allocation for the vector failed, or the copy that is supposed to fill the vector from the attachment failed silently. The attachment size is valid (several hundred K in one case); the attachment in one failing PST file is an Excel spreadsheet, in another is an unknown DAT file.

I won't be able to get to the code again until Wednesday, likely, so I won't be able to give you many more details; I do know that it happens a few hundred messages in if it's going to happen at all, and there are, unfortunately, constraints on who I can ship these PST files to.

Jan 5, 2011 at 9:17 PM

Here's what I'm seeing - line 202 of propbag.h:

    inline std::vector<pstsdk::byte> pstsdk::property_bag::get_value_variable(prop_id id) const
    {
        heapnode_id h_id = (heapnode_id)get_value_4(id);
        std::vector<byte> buffer;

        if(is_subnode_id(h_id))
        {
            node sub(m_pbth->get_node().lookup(h_id));
            buffer.resize(sub.size());
            sub.read(buffer, 0);
        }
        else
        {
            buffer = m_pbth->get_heap_ptr()->read(h_id);
        }

        return buffer;
    }

id is 14081 (0x3701). h_id is 0. (!) Because is_subnode_id() returns false, we drop to the get_heap_ptr()->read call, which fails because h_id is 0.

So it looks like we have an attachment that is completely valid except for its content pointer...


Jan 6, 2011 at 6:21 AM

Yes, as you may have guessed 0 is a special value meaning there is a zero length allocation.

Is there anything else unique about this attachment? Try iterating over all of its properties and compare them to a "normal" attachment. Is there any other distinguishing factor common to all of the "failed" attachments versus normal ones?
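
To illustrate the "0 means a zero length allocation" convention, here is a minimal, self-contained sketch. It does not use pstsdk: `toy_heap`, `heap_id`, `byte_t` and `read_or_empty` are hypothetical stand-ins invented for this example, and the real heap reader in pstsdk may be structured quite differently. The only point is that a reader can treat ID 0 as "empty" before it attempts a lookup.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins; these are NOT the pstsdk types.
typedef std::uint32_t heap_id;
typedef std::uint8_t byte_t;

// Toy heap: maps an allocation ID to its bytes and throws on unknown IDs,
// loosely mimicking a read that fails when handed an ID it cannot resolve.
struct toy_heap
{
    std::map<heap_id, std::vector<byte_t> > allocations;

    std::vector<byte_t> read(heap_id id) const
    {
        std::map<heap_id, std::vector<byte_t> >::const_iterator it = allocations.find(id);
        if (it == allocations.end())
            throw std::out_of_range("no such heap allocation");
        return it->second;
    }
};

// Treat ID 0 as a zero length allocation instead of passing it to read().
std::vector<byte_t> read_or_empty(const toy_heap& heap, heap_id id)
{
    if (id == 0)
        return std::vector<byte_t>();
    return heap.read(id);
}
```

With this guard, a caller gets an empty vector for ID 0 and can decide for itself what an empty attachment body should mean.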

Jan 7, 2011 at 1:27 AM

All of the properties of the failing attachment seem to be normal, except its length, which is 0.

I wrote code based on the property bag iterator in another message, and came up with this for a good attachment:

MAPI3701_0102:Attachment body, length 96
MAPI370a_0102:192 bytes: 2A 86 48 86 F7 14 03 0A 04 

And this for one that fails.

MAPI3701_0102:Attachment body, length 0
MAPI370a_0102:160 bytes: 2A 86 48 86 F7 14 03 0A 04 

Not a lot of difference... The property name is MAPI, then property ID in hex, then property type in hex. Except for 3701, the attachment itself, the string following the colon is the value.
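
For anyone who wants to produce the same labels, the naming scheme described above can be sketched as follows. `format_prop_label` is a hypothetical helper written for this post, not something taken from pstsdk or from the dump code itself; it only assumes that the property ID and property type are both 16-bit values.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds a label of the form "MAPI<id>_<type>",
// with the ID and type each printed as four lowercase hex digits.
std::string format_prop_label(std::uint16_t prop_id, std::uint16_t prop_type)
{
    std::ostringstream out;
    out << "MAPI" << std::hex << std::setfill('0')
        << std::setw(4) << prop_id   // setw applies to the next value only
        << '_'
        << std::setw(4) << prop_type;
    return out.str();
}
```

Calling `format_prop_label(0x3701, 0x0102)` yields `MAPI3701_0102`, matching the first line of each dump above.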

Feb 11, 2011 at 6:48 PM

Sounds like the PST layer should handle zero length attachments. In the meantime, as a workaround, it sounds like you need to explicitly check the attachment length before trying to open a stream on it.

Feb 11, 2011 at 7:49 PM

I have managed to get it running by doing that. That PST is somewhat mangled anyway; I find that there is a message in it with invalid (truncated) compressed RTF as well. That one is particularly annoying as the RTF seems to have been truncated, then compressed, so length and CRC are correct but the resulting RTF is invalid.