btree on disk

on 26.10.2007 17:47:07 by Jonathan de Boyne Pollard

WA> It's also a fair assessment, I think, to say that it's a major
WA> pain that UW-IMAP doesn't support Maildir, and that a
WA> disinterested third party, with full view of all the other Unix
WA> software which supports Maildir (mutt, etc), would agree that
WA> the reticence is a bit puzzling. On _my_ system Maildir, with
WA> 20+ users, has proven far superior to mbox. But, for various
WA> reasons, I stick with UW-IMAP and mbox (for non-shell
WA> users), notwithstanding that the users who insist on keeping
WA> 10MB attachments in their folders continue to whine incessantly.

You can make it use Maildir format mailboxes if you want to.

Re: btree on disk

on 26.10.2007 20:13:47 by Mark Crispin

This stuff about mbox/maildir is a strawman based upon misrepresentation.

The filesystem tests that I performed were over a decade ago, on FFS and
(shudder!) AdvFS. Those filesystems did indeed perform quite poorly once
you get to about 1200 files per directory. Modern filesystems do quite a
bit better, but they do not eliminate this scaling problem.

I work with mail stores with 6 (soon 7) digit message counts containing
multiple GB in a single mailbox. Neither mbox nor maildir fulfills that
requirement.

I never advocated mbox as a good mailbox format. Let's be clear: I have
always hated that format. mbox is the UW imapd default for one reason
only: mbox is the default in the default mailer on most UNIX systems. It
is otherwise a terrible format, unless you are an oldtimer in the habit of
running vi on mailbox files.

There are multiple, widely-distributed, third party implementations of
maildir for UW imapd. They are easy to get, and easy to add. They are
less than fully satisfactory. I do not believe that I can do a better job
and still comply with DJB's definition of maildir.

To address the technical problems with supporting maildir in IMAP, both
Courier (maildir++) and Dovecot extend maildir beyond DJB's definition.
To my knowledge, these extensions have not been formally published and
accepted as the replacement for DJB's definition. Should that ever
change, I will certainly revisit the question of developing and supporting
maildir in UW imapd.

UW imapd supports other formats, any of which are superior to mbox. One
format, designed in the mid 1990s, easily handles the users with 10MB
mailbox attachments on a 20 user system. Its 21st century replacement is
extensible, indexed, and directory structured (not one-file/one-message
though).

The criticism noted on the maildir Wikipedia entry, about the numerous
mutually-incompatible mail store formats used by different mail systems,
is valid. However, that does not mean that maildir is the one and only
true solution. The developers of various mail stores are all groping for
answers for the various technical problems that they face. To a
surprising degree they share their findings with each other.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Re: btree on disk

on 26.10.2007 21:03:56 by Mark Crispin

On Fri, 26 Oct 2007, Mark Crispin wrote:
> The filesystem tests that I performed were over a decade ago, on FFS and
> (shudder!) AdvFS. Those filesystems did indeed perform quite poorly once you
> get to about 1200 files per directory. Modern filesystems do quite a bit
> better, but they do not eliminate this scaling problem.

In an attempt to do a controlled experiment, I compared the times to do
the following IMAP operations
    tag SELECT mailbox
    tag THREAD REFERENCES UTF-8 ALL
on two mailboxes with 54,315 messages, one in UW imapd's mix format and
one in maildir-style one-message/one-file with Internet-style CRLF
newlines to control for newline conversion costs. Both were copied from a
common source.
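
For anyone who wants to repeat this kind of measurement, here is a minimal sketch in Python. It is not the harness used for the numbers in this post. The function name time_select_and_thread is made up for illustration, and conn is assumed to be an already-authenticated imaplib.IMAP4 connection (or anything with the same select and thread methods) talking to a server that advertises THREAD=REFERENCES.

```python
import time


def time_select_and_thread(conn, mailbox):
    """Time SELECT followed by THREAD REFERENCES UTF-8 ALL on one mailbox.

    conn is assumed to behave like a logged-in imaplib.IMAP4 object.
    Returns (elapsed_seconds, raw_thread_response).
    """
    start = time.perf_counter()

    # tag SELECT mailbox
    status, _ = conn.select(mailbox)
    if status != "OK":
        raise RuntimeError(f"SELECT {mailbox} failed: {status}")

    # tag THREAD REFERENCES UTF-8 ALL
    status, data = conn.thread("REFERENCES", "UTF-8", "ALL")
    if status != "OK":
        raise RuntimeError(f"THREAD on {mailbox} failed: {status}")

    elapsed = time.perf_counter() - start
    return elapsed, data


# Hypothetical usage (host, credentials and mailbox names are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4("imap.example.org")
#   conn.login("user", "password")
#   for name in ("mix-copy", "maildir-copy"):
#       seconds, _ = time_select_and_thread(conn, name)
#       print(name, round(seconds, 2))
```

Wall-clock time measured from the client includes network latency, so a comparison like this only means something when both mailboxes sit on the same server and are reached over the same connection path.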

du shows the mix format occupying 233368 blocks and the maildir-style
occupying 304520 blocks, about 30% more.

Creating each of these mailboxes took vastly different times. Mix took 2
minutes, while the maildir-style took 4.25 minutes. The big difference
was in system time: 5.45 seconds for mix vs. 150.43 seconds for the
maildir-style. That's all directory-manipulation time.
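
The system-time gap can be explored with a rough sketch along the following lines. It is an illustration under stated assumptions, not the test described above: the single appended file is only a stand-in for a format that packs many messages per file (it is not the mix format), the message body is a dummy, and the helper names are invented here. os.times() reports the CPU time the kernel spent on behalf of the process, which is where directory manipulation shows up.

```python
import os
import tempfile

# Dummy message with CRLF newlines; real mail would be much larger.
MESSAGE = b"From: a@example.org\r\nSubject: test\r\n\r\nbody\r\n"


def system_seconds():
    """Kernel CPU time charged to this process and its children so far."""
    t = os.times()
    return t.system + t.children_system


def write_one_file_per_message(directory, count):
    """Maildir-style layout: every message is its own directory entry."""
    for i in range(count):
        with open(os.path.join(directory, f"{i:08d}"), "wb") as f:
            f.write(MESSAGE)


def write_single_file(path, count):
    """Stand-in for a packed format: all messages appended to one file."""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(MESSAGE)


def measure(func, *args):
    """Return the system CPU seconds consumed while func runs."""
    before = system_seconds()
    func(*args)
    return system_seconds() - before


def compare(count):
    """Create both layouts in a scratch directory and report the results."""
    with tempfile.TemporaryDirectory() as root:
        per_message_dir = os.path.join(root, "per-message")
        os.mkdir(per_message_dir)
        single_path = os.path.join(root, "single")

        t_many = measure(write_one_file_per_message, per_message_dir, count)
        t_one = measure(write_single_file, single_path, count)

        return {
            "per_message_sys": t_many,
            "single_file_sys": t_one,
            "files": len(os.listdir(per_message_dir)),
            "single_size": os.path.getsize(single_path),
        }
```

Calling compare(54315) would approximate the message count used here, but the absolute numbers depend heavily on the filesystem, the page cache, and the message sizes, so treat any result as indicative only.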

This test does not access message metadata such as RFC822.SIZE. In most
UW imapd formats, metadata is calculated as part of open time, but in
some other systems this is costly.

With mix holding a thread cache, the test took two seconds on an EXT3
filesystem. Without the thread cache, the test took 10 seconds. The
maildir-style used 18 seconds.

Both of the latter tests did the same amount of disk reads and computation
for message parsing. The difference was entirely in how the data was
organized; mix did far fewer disk opens.

Don't read too much into these tests (or any others for that matter).
There's always smoke and mirrors involved, and your mileage WILL vary.
This was EXT3fs; other filesystems will deliver different times.

It's also all too easy to focus on the thing that a particular
implementation does well. Note my comments above on metadata and the
thread cache. If I had added a fetch of RFC822.SIZE and not done the
separate test without the thread cache, the difference would have been much
more extreme.

If someone wants to suggest one of the third-party maildir drivers for UW
imapd for me to try, I will repeat the test with it. However, I think
that excessive reliance upon such tests is silly; the point is not that
one is better than the other, but that there are legitimate disagreements
over the proper way to design a mail store.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.