btree on disk

on 26.10.2007 21:32:00 by Jonathan de Boyne Pollard

WA> The maildir analysis is specious because it assumes usage
WA> patterns that are far from universal, and in fact may be becoming
WA> less common. And it was specious back then as now because it
WA> made a universal statement based on a set of localized conditions
WA> and metrics, which even _then_ were suspect.

M. Weikusat may not comprehend what you are saying, but I do.
However, I think that you are being unjust. M. Crispin _didn't_
overgeneralize from one set of data. His analysis was a deductive
one, not an inductive one. The flaw that people pointed out was not
overgeneralization, but the erroneous axiomatic assumption that all
filesystems behaved in a certain way, from which the deductions were
then made.

WA> That maildir suffered dismal performance for the usage patterns
WA> of interest to the developers at the time they made their
WA> analysis, I don't doubt. But the totality of the argument
WA> against maildir rose to a more general level [of] argumentation.
WA> There arose a more
WA> general dispute in the IMAP community.

I remember.

WA> Thus, my point about people using "micro-benchmarks" to shoot
WA> down simple solutions. Often people generalize to the extreme,
WA> and unless one has solid data about a specific operating
WA> condition, *and* that the specific operating condition will
WA> actually be the
WA> normal condition (which is a much harder task), then such a
WA> person has failed their burden of proof. All else equal, in
WA> engineering the rule of thumb is to assume the simpler
WA> solution, layered on established mechanism, to be superior.

It's true that people generalize. But often in my experience lack of
context and lack of knowledge play a part, too. For example: There
was a discussion in alt.folklore.computers last month where one poster
was reluctant to run xyr own example program, on the grounds that it
would create over 17,000 files in a single directory. (To some of us,
that's simply a medium-sized news spool directory.) The worry about
creating that number of files all too often stems from a vague
knowledge that it is bad, gleaned thirdhand as received wisdom based
upon sources that were talking about the FAT filesystem format. It's
not necessarily that people are generalizing from FAT to everything
else. It is sometimes that they don't know that (a) the books and
articles that they may have read were only talking about FAT, because
those books and articles were written at a time and for an audience
where FAT was the only format available and so the fact that FAT was
the filesystem format was implicit and often unstated; or that (b)
other filesystem formats don't have this problem because they don't
have linear directory organizations like FAT does, and that the
problem is format-specific. They lack the context to realize that the
old article talking about how bad large directory sizes were was
written from the viewpoint of someone who only had the option of FAT,
and they lack the knowledge of filesystem design to know that not all
filesystem formats are the same in this regard.
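
To make the point concrete, here is a rough sketch (my own
illustration, not anything from the thread; it assumes Python 3 on a
Unix-like system, and the file count is simply the figure mentioned
above): it creates that many files in one directory and times
individual name lookups. On a filesystem format with tree- or
hash-indexed directories those lookups stay cheap; on a format that
scans directories linearly, such as FAT, each lookup would cost time
proportional to the size of the directory.

  #!/usr/bin/env python3
  # Sketch: populate one directory with many files and time name
  # lookups. Cheap on tree/hash-indexed directory formats; on a
  # linear-scan format like FAT, each lookup grows with directory size.
  import os
  import tempfile
  import time

  FILE_COUNT = 17000  # roughly the number mentioned above

  with tempfile.TemporaryDirectory() as spool:
      for i in range(FILE_COUNT):
          # touch an empty file, much as a news spool or maildir would
          open(os.path.join(spool, "article.%d" % i), "w").close()

      start = time.perf_counter()
      for i in range(0, FILE_COUNT, 100):
          os.stat(os.path.join(spool, "article.%d" % i))  # name lookup
      elapsed = time.perf_counter() - start
      print("%d lookups took %.4f seconds" % (FILE_COUNT // 100, elapsed))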