Issue with iterating through all folders and messages

Dec 14, 2010 at 2:01 AM

I am trying to iterate through all messages in all the folders of a PST. I have pasted two functions that should be functionally the same.
iterate_folders_1(myfile) works fine, whereas iterate_folders_2(myfile) throws an out_of_bound exception at the very first folder.
Can someone point out where I am going wrong?

// This iterates properly through all the folders without issue.
void iterate_folders_1 (pst &myfile)
{
  wcout << "*** Iterating through folders using dereferenced message_iterator" << endl;
  for (pst::folder_iterator iter = myfile.folder_begin();
       iter != myfile.folder_end();
       ++iter) {
    folder fold = *iter;

    wcout << "Folder: "      << setw(30) << left << fold.get_name()
          << "# Messages : " << setw(6) << right << fold.get_message_count()
          << endl;

    int i = 0;
    folder::message_iterator iter_m = fold.message_begin();

    while (iter_m != fold.message_end()) {
      int num_recipients = iter_m->get_recipient_count();
      wcout << "Message # " << setw(4) << i << " num_recipients = "
           << num_recipients << endl;
      
      ++iter_m;
      ++i;
    }
  }
}

// This causes an exception as the while loop does not terminate properly
void iterate_folders_2 (pst &myfile)
{
  wcout << "*** Iterating through folders using message_iterator" << endl;
  for (pst::folder_iterator iter = myfile.folder_begin();
       iter != myfile.folder_end();
       ++iter) {

    wcout << "Folder: "      << setw(30) << left << iter->get_name()
          << "# Messages : " << setw(6) << right << iter->get_message_count()
          << endl;

    int i = 0;
    folder::message_iterator iter_m = iter->message_begin();

    while (iter_m != iter->message_end()) {
      int num_recipients = iter_m->get_recipient_count();
      wcout << "Message # " << setw(4) << i << " num_recipients = "
           << num_recipients << endl;
      
      ++iter_m;
      ++i;
    }
  }
}

int main (int argc, char *argv[])
{
  if (argc < 2) {
    std::cerr << "Usage: ./pst_reader <pst_file>" << std::endl;
    return -1;
  }

  std::string  s(argv[1]);  // use the path given on the command line
  std::wstring filename(s.begin(), s.end());

  pst myfile(filename);

  wcout << "Filename = " << filename << endl;
  
  iterate_folders_1(myfile);
  iterate_folders_2(myfile);

  return 0;
}

Coordinator
Dec 14, 2010 at 2:57 AM

The issue is that folder_iterators (and message_iterators) are what's called proxy iterators. Their operator* returns by value, constructing a new object each time it's dereferenced.

In your first example, you store off the result of *iter in a folder object. This is good.

In your second example, you're dereferencing the folder iterator multiple times (in the message_begin and message_end calls). Each dereference returns a different folder object, and iterators obtained from different folder objects do not compare equal, so you walk past the end of the message range.
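
To make that concrete, here is a minimal, self-contained sketch of the effect. Everything in it (toy_folder, toy_folder_iterator, the integer instance id used for equality, and the helper functions) is made up for illustration and is not how pstsdk is implemented. It only assumes that message iterators obtained from different folder objects compare unequal.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of this is pstsdk code.
typedef std::shared_ptr<const std::vector<int> > message_list;

class toy_folder {
public:
    class message_iterator {
    public:
        message_iterator(int owner_id, message_list msgs, std::size_t pos)
            : owner_id_(owner_id), msgs_(std::move(msgs)), pos_(pos) {}
        int operator*() const { return (*msgs_)[pos_]; }
        message_iterator& operator++() { ++pos_; return *this; }
        // Equal only when both iterators came from the same folder instance.
        bool operator==(const message_iterator& other) const {
            return owner_id_ == other.owner_id_ && pos_ == other.pos_;
        }
        bool operator!=(const message_iterator& other) const {
            return !(*this == other);
        }
    private:
        int owner_id_;
        message_list msgs_;
        std::size_t pos_;
    };

    // Every newly constructed folder gets its own instance id.
    explicit toy_folder(message_list msgs)
        : id_(next_id()), msgs_(std::move(msgs)) {}

    message_iterator message_begin() const {
        return message_iterator(id_, msgs_, 0);
    }
    message_iterator message_end() const {
        return message_iterator(id_, msgs_, msgs_->size());
    }

private:
    static int next_id() { static int counter = 0; return counter++; }
    int id_;
    message_list msgs_;
};

// A proxy-style iterator: operator* builds a fresh toy_folder on every call.
class toy_folder_iterator {
public:
    explicit toy_folder_iterator(message_list msgs) : msgs_(std::move(msgs)) {}
    toy_folder operator*() const { return toy_folder(msgs_); }
private:
    message_list msgs_;
};

// Builds an iterator that points at a toy folder holding three messages.
toy_folder_iterator make_toy_iterator() {
    std::vector<int> messages;
    messages.push_back(10);
    messages.push_back(20);
    messages.push_back(30);
    message_list msgs = std::make_shared<std::vector<int> >(messages);
    return toy_folder_iterator(msgs);
}

// Dereferencing twice yields two folder instances, so their end iterators differ.
bool ends_match_when_dereferenced_twice(const toy_folder_iterator& it) {
    return (*it).message_end() == (*it).message_end();
}

// Dereferencing once and keeping the copy gives a consistent begin/end pair.
std::size_t count_messages_with_stored_copy(const toy_folder_iterator& it) {
    toy_folder fold = *it;
    std::size_t n = 0;
    for (toy_folder::message_iterator m = fold.message_begin();
         m != fold.message_end(); ++m) {
        ++n;
    }
    return n;
}
```

With the three-message toy folder from make_toy_iterator(), ends_match_when_dereferenced_twice returns false, while count_messages_with_stored_copy visits all three messages. The second helper follows the same pattern as iterate_folders_1.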

Basically, we don't actually have a collection of folder objects or message objects in memory to iterate over, like we would if they were in an STL container. The number is potentially unbounded, and we have to do disk I/O to construct that list. To be performant, we have to construct them on demand as the iterator moves through the file.

How to do that is a design question. An alternative design would have the iterator keep a local copy of the folder object it points to and return a reference to that same copy on each dereference. That design has its own problem, though:

 

folder& f = *iter; // f is a reference to whatever iter points to now
wcout << f.get_name(); // works as you'd expect
++iter;
wcout << f.get_name(); // not the same as above!
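
The same pitfall can be reproduced in a small standalone sketch. The names here (cached_folder, caching_folder_iterator, reference_survives_increment) are hypothetical and only model the alternative design described above; pstsdk does not work this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not pstsdk code.
struct cached_folder {
    std::string name;
};

// An iterator that owns one cached object and hands out references to it.
class caching_folder_iterator {
public:
    explicit caching_folder_iterator(const std::vector<std::string>& names)
        : names_(&names), pos_(0) {
        load();
    }
    // Always returns a reference to the same cached object.
    cached_folder& operator*() { return current_; }
    // Advancing overwrites the cached object in place.
    caching_folder_iterator& operator++() {
        ++pos_;
        load();
        return *this;
    }
private:
    void load() {
        if (pos_ < names_->size()) {
            current_.name = (*names_)[pos_];
        }
    }
    const std::vector<std::string>* names_;
    std::size_t pos_;
    cached_folder current_;
};

// Returns true if a reference taken before ++ still names the first folder.
bool reference_survives_increment() {
    std::vector<std::string> names;
    names.push_back("Inbox");
    names.push_back("Sent Items");

    caching_folder_iterator it(names);
    cached_folder& f = *it;              // refers to the iterator's cache
    const std::string before = f.name;   // copy of the name seen now
    ++it;                                // cache is overwritten here
    return f.name == before;
}
```

In this sketch reference_survives_increment() returns false: after the increment, f refers to the second folder even though it was obtained while the iterator pointed at the first. Returning by value avoids that surprise, at the cost of the behaviour seen in iterate_folders_2.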

 

Dec 14, 2010 at 3:42 AM

Terry, thanks for the prompt reply. It's a bit non-intuitive for a noob, but certainly not something one can't get used to with time :) Thanks, again.

Cheers.