filesystem related probs with Maildir on MacOS X
on 24.02.2005 23:50:16 by Christian Ebert
Hello,
Who has experience with maildir as storage format on MacOS X?
I am trying to switch from mbox to maildir but experience huge
performance problems with maildir folders that contain more than
approximately 5000 messages. After a lot of testing I can exclude
e.g. the MUA as the culprit, also journaling or things like
running a clamav daemon.
Basically it boils down to the following:
$ time ls ~/Maildir/test/cur | wc -l
53203
real 5m21.234s
user 0m0.300s
sys 0m6.960s
and sometimes a lot longer.
As opposed to:
$ time ls -R / 2>/dev/null | wc -l
459319
real 4m3.746s
user 0m3.610s
sys 0m16.910s
So listing about ten times as many "normal" files takes less time
than listing the messages in a maildir!
Note: the maildir is not corrupted; it can be read perfectly by
e.g. mutt -- only it takes an impossibly long time to load.
Can anybody explain this? Any ideas what might be wrong?
TIA
c
--
[...] wirklich! wie ich jetzt bin,
hab ich keinen Namen für die Dinge
und es ist mir alles ungewiß.
_HÖLDERLIN: H Y P E R I O N_
Re: filesystem related probs with Maildir on MacOS X
on 25.02.2005 01:08:04 by Mark Crispin
On Thu, 24 Feb 2005, Christian Ebert wrote:
> I am trying to switch from mbox to maildir but experience huge
> performance problems with maildir folders that contain more than
> approximately 5000 messages.
>
> So listing about ten times as many "normal" files takes less time
> than listing the messages in a maildir!
Did you do this on different systems? If it's the same system, the list
from root should take longer than the list of your maildir since the
former will include the latter.
If it is the same system, then you're seeing the effects of buffer cache,
and to do an intelligent comparison you have to do something to make sure
that the buffer cache is flushed.
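One cheap way to see how much the buffer cache matters is to run the same listing twice in a row and compare the two times. A minimal sketch, assuming a POSIX-style shell and a date command that supports +%s (true on MacOS X and GNU systems, but not required by POSIX); the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: time_listing and compare_runs are hypothetical helper names.

# Print the whole seconds it takes to list a directory without sorting.
time_listing() {
    start=$(date +%s)
    ls -f "$1" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# List the same directory twice in a row and report both times.
# The second run is normally served from the buffer cache.
compare_runs() {
    first=$(time_listing "$1")
    second=$(time_listing "$1")
    echo "first=${first}s second=${second}s"
}

# Usage (the path is only an example):
#   compare_runs "$HOME/Maildir/test/cur"
```

This does not flush the cache; it only shows how much faster a warm second run is than the first one.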
As for why your 5000 message maildir is slow, the answer is "because some
filesystems are that way." Unlike flat files, which more or less perform
similarly on all filesystems, one-file/one-message mail stores such as
maildir are very dependent upon the type of filesystem.
I don't know what filesystem is on MacOS X by default, but my experience
with older BSD type systems was that one-file/one-message was a disaster
after you had more than about 1000 messages. SVR4 is a different animal,
as is Linux; and of course many systems offer multiple filesystem type
choices.
If you can't switch your filesystem to something else on MacOS X and are
determined to use maildir, consider setting up a Linux system with an IMAP
server and access your mail via IMAP. The maildir fan community can
advise you as to what sort of Linux system and filesystem to use.
My own choice is to use a mailbox format that is not as dependent upon the
underlying filesystem to perform well.
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
Re: filesystem related probs with Maildir on MacOS X
on 25.02.2005 01:09:35 by npc
In article <2005-02-24.22-33-41@krille.blacktrash.org>,
Christian Ebert wrote:
>Hello,
>
>Who has experience with maildir as storage format on MacOS X?
I don't have direct experience with this, but I do have a lot of
experience with email performance and filesystem issues in general.
>I am trying to switch from mbox to maildir but experience huge
>performance problems with maildir folders that contain more than
>approximately 5000 messages. After a lot of testing I can exclude
>e.g. the MUA as the culprit, also journaling or things like
>running a clamav daemon.
Yeah. I presume that you're using HFS+, the default filesystem for
OS X. Lots of good information on it is available here:
http://developer.apple.com/technotes/tn/tn1150.html
>Can anybody explain this? Any ideas what might be wrong?
Based on the information you've provided, and taking it all at face value,
I'm betting it's something in the structure of the filesystem, although
at first glance I can't tell you what that would be. My first thought
was that HFS+ might be doing linear directory lookups, but it would seem
that it uses b-trees for directory storage, so that's probably not it.
This is only based on a quick glance, but it sounds like HFS+ might store
metadata for each file in a bunch of different places on each volume.
If that's the case, then the overhead associated with accessing a single
file might be large requiring several disk head seeks. If so, and I don't
know that this theory is correct, that would account for the problems
you're seeing.
In any case, you're experiencing large per-file overhead, and this is
causing you performance problems. Unless there's something unknown
going on, as I see it, you have four choices:
1) Improve your disk I/O (buying more/faster spindles) so much that you
don't notice your overhead problems. I really don't like this solution,
but you might be able to "throw money at it".
2) Replace HFS+ with a different filesystem. As I recall, OS X supports
the UFS filesystem as well. I'm guessing, but don't know, that this version
of UFS dates before directory hashing and soft updates features were added
to UFS. If it doesn't support these features, then switching filesystems
may not help you much or at all.
3) Give up maildir. If your performance is better using a one file per
mailbox format, then so be it. I don't know what it was that drove you
to switch. If you can switch back, that would seem to be worth doing.
4) Move your mail service to an OS that supports a filesystem that reacts
better to large directories. Either Linux or any BSD would qualify
(although I'll reserve my diatribe against several of the Linux filesystems
for another time).
There are probably other pathological choices you could make (such as
restricting the number of messages per mailbox), but I'm guessing you're
going to have to choose from one of these four options. If I were in
your shoes, I'd go with #3 unless I couldn't, in which case I'd go with
#4. Maybe that's just me.
Hope this helps.
--
Nick Christenson
npc@gangofone.com
Re: filesystem related probs with Maildir on MacOS X
on 25.02.2005 20:48:04 by Christian Ebert
* Mark Crispin on Fri, Feb 25, 2005:
> On Thu, 24 Feb 2005, Christian Ebert wrote:
>> I am trying to switch from mbox to maildir but experience huge
>> performance problems with maildir folders that contain more than
>> approximately 5000 messages.
>>
>> So listing about ten times as many "normal" files takes less time
>> than listing the messages in a maildir!
>
> Did you do this on different systems?
No.
> If it's the same system, the list
> from root should take longer than the list of your maildir since the
> former will include the latter.
Oops. Sorry, I forgot to mention that I packed the maildir into a
tarball and removed it before listing from root.
Sort of becomes a habit because if I try a 53000 msg maildir the
machine may almost hang for up to 20 minutes.
Strange thing: after extracting such a maildir from a tarball the
problems seem to have vanished ...
> If it is the same system, then you're seeing the effects of buffer cache,
> and to do an intelligent comparison you have to do something to make sure
> that the buffer cache is flushed.
>
> As for why your 5000 message maildir is slow, the answer is "because some
> filesystems are that way."
I see ;-)
> Unlike flat files, which more or less perform
> similarly on all filesystems, one-file/one-message mail stores such as
> maildir are very dependent upon the type of filesystem.
Hm.
> I don't know what filesystem is on MacOS X by default,
It's called HFS+.
Apparently it also has a strange way of handling atime; reading a file does not update it:
[ this time for mbox ]
$ cd ~/Mail/
$ ls -ul testmbox; cat testmbox >/dev/null; date; ls -ul testmbox
-rw------- 1 chris chris 12103547 25 Feb 10:33 testmbox
Fri Feb 25 20:40:52 CET 2005
-rw------- 1 chris chris 12103547 25 Feb 10:33 testmbox
which forced me to compile mutt with either the --enable-buffy-size or
the --enable-nfs-fix option to have it handle message flags
reliably. I am wondering whether this could also have an influence on
the maildir issue.
> but my experience
> with older BSD type systems was that one-file/one-message was a disaster
> after you had more than about 1000 messages. SVR4 is a different animal,
> as is Linux; and of course many systems offer multiple filesystem type
> choices.
Well, I won't change the fs, and am under no pressure to /have/ to
switch to maildir; I'm more wondering what exactly could cause
this slowdown.
THX for answering.
c
--
So dacht ich. Nächstens mehr.
_HÖLDERLIN: H Y P E R I O N_
Re: filesystem related probs with Maildir on MacOS X
on 25.02.2005 21:04:10 by Christian Ebert
* Nick Christenson on Fri, Feb 25, 2005:
> In article <2005-02-24.22-33-41@krille.blacktrash.org>,
> Christian Ebert wrote:
>> Who has experience with maildir as storage format on MacOS X?
>
> I don't have direct experience with this, but I do have a lot of
> experience with email performance and filesystem issues in general.
>
>> I am trying to switch from mbox to maildir but experience huge
>> performance problems with maildir folders that contain more than
>> approximately 5000 messages. After a lot of testing I can exclude
>> e.g. the MUA as the culprit, also journaling or things like
>> running a clamav daemon.
>
> Yeah. I presume that you're using HFS+, the default filesystem for
> OS X. Lots of good information on it is available here:
>
> http://developer.apple.com/technotes/tn/tn1150.html
Thanks for the link. In ten years I might actually understand ten
percent of what's written there.
The answer to my question might even be in there, but I probably
wouldn't recognize it even if it jumped right out at me.
>> Can anybody explain this? Any ideas what might be wrong?
>
> Based on the information you've provided, and taking it all at face value,
> I'm betting it's something in the structure of the filesystem, although
> at first glance I can't tell you what that would be. My first thought
> was that HFS+ might be doing linear directory lookups, but it would seem
> that it uses b-trees for directory storage, so that's probably not it.
I am quite sure that the disk diagnosis tools talk about b-trees.
> This is only based on a quick glance, but it sounds like HFS+ might store
> metadata for each file in a bunch of different places on each volume.
> If that's the case, then the overhead associated with accessing a single
> file might be large requiring several disk head seeks. If so, and I don't
> know that this theory is correct, that would account for the problems
> you're seeing.
I was only thinking that there must be other Mac users that use
maildir and was wondering whether they don't experience those
problems. -- Google brought up only very vague hints that MacOS
is not suited for maildir.
[ snipped choices ]
I am not *desperate* to switch to maildir, it all came from
experimenting with a patch for mutt that creates a header
database and lets you open your mailboxes faster -- if the fs
plays along.
I still find it strange that a listing from / takes so much less
time, and that a listing of a maildir can almost hang my
machine. To my naïve eyes this looks almost as if messages in a
maildir contain some dirty secret ;-)
c
--
Der Feind ist unsere eigene Frage als Gestalt.
- Carl Schmitt
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 00:53:33 by Sam
Christian Ebert writes:
> Sort of becomes a habit because if I try a 53000 msg maildir the
> machine may almost hang for up to 20 minutes.
>
> Strange thing: after extracting such a maildir from a tarball the
> problems seem to have vanished ...
Sounds to me like your disk is failing. It's got a bunch of
marginally-readable blocks, and it's spinning a few minutes per block before
it finally manages to read each one successfully.
Then, your extracted tarball landed elsewhere on the disk, that's still
readable without any problems.
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 01:24:37 by Mark Crispin
On Fri, 25 Feb 2005, Christian Ebert wrote:
> Strange thing: after extracting such a maildir from a tarball the
> problems seem to have vanished ...
OK, this is important. Without adding, removing, or altering any messages
in the maildir, what happens once the buffer cache no longer contains it?
The best way to be sure it's out of the buffer cache: after extracting the
maildir from the tarball, reboot the system. [There are better ways, but
this is the easiest.]
What happens afterwards? Is the maildir slow again? Or does it remain
nice and crisp?
If it's slow again, then the likely reason why it was fast was because of
the buffer cache.
Similarly, if it's still nice and crisp after it is definitely out of the
buffer cache, then the likely reason is directory related; e.g., the refresh
put all the messages in order in the directory.
> Well, I won't change the fs, and am under no pressure to /have/ to
> switch to maildir; I'm more wondering what exactly could cause
> this slowdown.
Well, I've offered some reasons. The maildir fan community can probably
offer others since they would have more experience in getting maildir to
perform well. In general, though, in order for maildir to perform well,
the filesystem must offer excellent directory search and fast metadata
lookup. Not all filesystems do this.
Also, in your ls test, note that ls sorts the results; it's possible that
ls might have a badly-implemented sort (e.g. bubble sort) that goes away
when the directory is recently refreshed and sorted. This assumes,
however, that the tar/extract step sorted during compression or
extraction; and that may not be a valid assumption.
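To separate the cost of sorting from the cost of reading the directory, the two kinds of listing can be wrapped as below and timed one after the other. A sketch only: the function names are hypothetical, and it assumes an ls whose -f option disables sorting and implies -a, as BSD and GNU ls do.

```shell
#!/bin/sh
# Sketch only: count_sorted and count_unsorted are hypothetical helper names.

# Count all entries after ls has sorted them (the default behaviour).
count_sorted() {
    ls -a "$1" | wc -l | tr -d ' '
}

# Count all entries in raw directory order. -f turns sorting off and
# implies -a, so "." and ".." are included in both counts.
count_unsorted() {
    ls -f "$1" | wc -l | tr -d ' '
}

# Usage (the path is only an example):
#   count_sorted   "$HOME/Maildir/test/cur"
#   count_unsorted "$HOME/Maildir/test/cur"
```

In a shell where time is a keyword (bash, for example), each call can be prefixed with time. If both take about equally long, the sort is not the bottleneck and the time is going into reading the directory itself.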
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 01:31:46 by Mark Crispin
On Fri, 25 Feb 2005, Sam wrote:
> Sounds to me like your disk is failing. It's got a bunch of
> marginally-readable blocks, and it's spinning a few minutes per block before
> it finally manages to read each one successfully.
> Then, your extracted tarball landed elsewhere on the disk, that's still
> readable without any problems.
That's a definite possibility and is worth investigating. But I'd check
the buffer cache possibility first.
-- Mark --
http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 03:41:09 by Christian Ebert
* Mark Crispin on Sat, Feb 26, 2005:
> On Fri, 25 Feb 2005, Christian Ebert wrote:
>> Strange thing: after extracting such a maildir from a tarball the
>> problems seem to have vanished ...
>
> OK, this is important. Without adding, removing, or altering any messages
> in the maildir, what happens once the buffer cache no longer contains it?
>
> The best way to be sure it's out of the buffer cache: after extracting the
> maildir from the tarball, reboot the system. [There are better ways, but
> this is the easiest.]
>
> What happens afterwards? Is the maildir slow again? Or does it remain
> nice and crisp?
Nice and crisp.
> If it's slow again, then the likely reason why it was fast was because of
> the buffer cache.
>
> Similarly, if it's still nice and crisp after it is definitely out of the
> buffer cache, then the likely reason is directory related; e.g., the refresh
> put all the messages in order in the directory.
>
>> Well, I won't change the fs, and am under no pressure to /have/ to
>> switch to maildir; I'm more wondering what exactly could cause
>> this slowdown.
>
> Well, I've offered some reasons. The maildir fan community can probably
> offer others since they would have more experience in getting maildir to
> perform well. In general, though, in order for maildir to perform well,
> the filesystem must offer excellent directory search and fast metadata
> lookup. Not all filesystems do this.
>
> Also, in your ls test, note that ls sorts the results; it's possible that
> ls might have a badly-implemented sort (e.g. bubble sort) that goes away
> when the directory is recently refreshed and sorted. This assumes,
> however, that the tar/extract step sorted during compression or
> extraction; and that may not be a valid assumption.
OK, I tried with ls -f (output not sorted) on a maildir that had not
been archived and extracted before:
$ cd ~/testmdir/cur ; time ls -f | wc -l
12953
real 0m25.552s
user 0m0.060s
sys 0m0.740s
$ cd ../.. ; time tar c testmdir | gzip -c >testmdir.tar.gz
real 1m31.801s
user 0m6.250s
sys 0m6.110s
$ time rm -r testmdir/
real 3m58.442s
user 0m0.120s
sys 0m11.320s
[ errmh, almost 4 minutes ]
$ time gzip -dc testmdir.tar.gz | tar xf -
real 0m21.913s
user 0m1.430s
sys 0m6.480s
reboot ... back on the chain gang:
$ cd ~/testmdir/cur ; time ls -f | wc -l
12953
real 0m0.698s
user 0m0.060s
sys 0m0.120s
$ cd ../.. ; time rm -r testmdir/
real 0m6.210s
user 0m0.090s
sys 0m2.910s
That's 6 seconds compared to 4 minutes!
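The archive, remove, extract sequence above can be wrapped into one function for repeating the experiment. This is a sketch under assumptions: rebuild_dir is a hypothetical name, it writes an uncompressed archive to a temporary file instead of piping through gzip, and it deletes the original tree, so it should only be pointed at test data. Whether rewriting the directory is itself what makes it fast again is not established here.

```shell
#!/bin/sh
# Sketch only: rebuild_dir is a hypothetical helper name.
# WARNING: removes the target directory before re-extracting it.

rebuild_dir() {
    target=$1
    parent=$(dirname "$target") || return 1
    name=$(basename "$target") || return 1
    archive=$(mktemp "${TMPDIR:-/tmp}/rebuild.XXXXXX") || return 1

    # Archive first, and check that the archive can be listed,
    # before anything is deleted.
    if ! (cd "$parent" && tar cf "$archive" "$name") ||
       ! tar tf "$archive" >/dev/null; then
        rm -f "$archive"
        return 1
    fi

    rm -r "$target" || return 1

    # Re-create the tree from the archive. If extraction fails, the
    # archive is kept so the data can be recovered by hand.
    if ! (cd "$parent" && tar xf "$archive"); then
        echo "extraction failed, archive kept at $archive" >&2
        return 1
    fi
    rm -f "$archive"
}

# Usage (the path is only an example):
#   rebuild_dir "$HOME/testmdir"
```

Timing ls -f on the directory before and after the call, with a reboot in between as above, repeats the comparison.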
c
--
So is das Leben / Eben. / Eben und flach. / Ach.
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 03:52:39 by Christian Ebert
* Sam on Fri, Feb 25, 2005:
> Christian Ebert writes:
>> Sort of becomes a habit because if I try a 53000 msg maildir the
>> machine may almost hang for up to 20 minutes.
>>
>> Strange thing: after extracting such a maildir from a tarball the
>> problems seem to have vanished ...
>
> Sounds to me like your disk is failing. It's got a bunch of
> marginally-readable blocks, and it's spinning a few minutes per block before
> it finally manages to read each one successfully.
>
> Then, your extracted tarball landed elsewhere on the disk, that's still
> readable without any problems.
Wouldn't I experience problems elsewhere then? This maildir thing
is the only issue I've met so far; what other kind of task would
also show that the disk is failing? I have no problems with loads of
video editing, LaTeX compiling ...
c
--
Wer auf sein Elend tritt, steht höher.
_HÖLDERLIN: H Y P E R I O N_
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 04:17:04 by Sam
Christian Ebert writes:
> * Sam on Fri, Feb 25, 2005:
>> Christian Ebert writes:
>>> Sort of becomes a habit because if I try a 53000 msg maildir the
>>> machine may almost hang for up to 20 minutes.
>>>
>>> Strange thing: after extracting such a maildir from a tarball the
>>> problems seem to have vanished ...
>>
>> Sounds to me like your disk is failing. It's got a bunch of
>> marginally-readable blocks, and it's spinning a few minutes per block before
>> it finally manages to read each one successfully.
>>
>> Then, your extracted tarball landed elsewhere on the disk, that's still
>> readable without any problems.
>
> Wouldn't I experience problems elsewhere then? This maildir thing
Only if that "elsewhere" also happens to use a failing portion of the disk.
Re: filesystem related probs with Maildir on MacOS X
on 26.02.2005 11:01:16 by Christian Ebert
* Sam on Sat, Feb 26, 2005:
>>> Sounds to me like your disk is failing. It's got a bunch of
>>> marginally-readable blocks, and it's spinning a few minutes per block before
>>> it finally manages to read each one successfully.
>>>
>>> Then, your extracted tarball landed elsewhere on the disk, that's still
>>> readable without any problems.
>>
>> Wouldn't I experience problems elsewhere then? This maildir thing
>
> Only if that "elsewhere" also happens to use a failing portion of the disk.
Hm. I ran this test at least ten times. That would mean that
creating maildirs with either mutt or procmail /always/ hits a
failing portion of the disk, whereas extracting etc. -- not to
speak of the rest of my work -- /always/ happens on a
"healthy" portion.
BTW, fsck -y -f runs fine.
c