About word-encoding (RFC2047) design

on 09.08.2006 18:30:20 by stefano.sabatini-lala

From RFC 2047:

>While it is unfortunate that these programs do not correctly
>interpret RFC 822 headers, to "break" these programs would cause
>severe operational problems for the Internet mail system.
CUT
>The syntax of encoded-words is such that they are unlikely to
>"accidentally" appear as normal text in message headers.
>[end of quote]

Does this mean that, in some cases, text that *accidentally* contains
encoded-words is misinterpreted?

If so, then for example a subject that refers to some encoded-word will
be decoded instead of being interpreted literally.

The problem is that, in order to get a sort of semi-functionality with
brain-damaged software, we get something that *never* works perfectly
(and that sometimes breaks unexpectedly, maybe in a critical
situation).

It appears to me a very bad design choice. The only reasonable choice
would be to simply reject all brain-damaged software and design
something that works in every case, not just in *most* cases: for
example, a header specifying the encoding to use for decoding the
following headers.

I would like to hear other opinions.

Regards.

Re: About word-encoding (RFC2047) design

on 09.08.2006 19:13:49 by Mark Crispin

Trust me. As a veteran of many standards-compliance wars over a period of
3 decades, I can say with authority that it is completely unrealistic to
expect that existing widely-deployed brain-damaged software will ever get
fixed.

The only chance of winning any standards-compliance battle is to break
brain-damaged software that is not yet deployed, so that it is fixed prior
to deployment.

All other battles end with "your client is the only one that has a problem
with Microsoft Blurdybloop server", or "your server is the only one that
causes a problem with Netscape sarasoop." The names of Very Big Vendors
are considered to be trump cards that override explicit text in standards
documents.

I'm all for fighting the good fight for standards compliance; but if there
is a reasonable way to avoid a particular fight, it's better to do so, since
that saves resources for the wars that cannot be avoided.

Encoded-words have worked very well over the past 15 or so years.

If you want to see why we have such problems, just look at the thread with
subject "ESMTP AUTH PLAIN" in this newsgroup. A correct understanding of
how to do this requires thorough reading and understanding of three
different RFCs. Most people refuse to do this; instead, they guess by
empirical testing and they get it wrong.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

Re: About word-encoding (RFC2047) design

on 09.08.2006 19:57:58 by mem

In article ,
Mark Crispin wrote:
>
>If you want to see why we have such problems, just look at the thread with
>subject "ESMTP AUTH PLAIN" in this newsgroup. A correct understanding of
>how to do this requires thorough reading and understanding of three
>different RFCs. Most people refuse to do this; instead, they guess by
>empirical testing and they get it wrong.

Oh-oh, have I said something wrong?

Personally, I find the following necessary:

- a ton of reading of RFCs and documents;
- referring to same RFCs and documents continually forever after;
- often, reading of existing code;
- a bunch of empirical testing to test understanding of the readings;
- a bunch of interoperability testing to pit others' understandings
of the reading against one's own;
- passage of time to find out all the gotchas, one by one, if ever.

And then, once you think you are through all that, others to find problems
and show you just how wrong you can be :)

mm

Re: About word-encoding (RFC2047) design

on 09.08.2006 21:54:52 by stefano.sabatini-lala

Mark Crispin wrote:

> Trust me. As a veteran of many standards-compliance wars over a period of
> 3 decades, I can say with authority that it is completely unrealistic to
> expect that existing widely-deployed brain-damaged software will ever get
> fixed.
>
> The only chance of winning any standards-compliance battle is to break
> brain-damaged software that is not yet deployed, so that it is fixed prior
> to deployment.
>
> All other battles end with "your client is the only one that has a problem
> with Microsoft Blurdybloop server", or "your server is the only one that
> causes a problem with Netscape sarasoop." The names of Very Big Vendors
> are considered to be trump cards that override explicit text in standards
> documents.
>
> I'm all for fighting the good fight for standards compliance; but if there
> is a reasonable way to avoid a particular fight, it's better to do so, since
> that saves resources for the wars that cannot be avoided.
>

The RFC seems to me to build one error on top of another. Maybe the
author was trying to standardize something that was already deployed in
some software.

The original error in this case is the reordering of message headers
performed by some relays, which the author himself writes about in the
RFC. If a relay reorders or deletes some message headers, then the
information on the encoding of the following headers may be lost when
the next relay (or the MUA) reads the message.

It wouldn't have been a great problem to fix, even in brain-damaged
software and policies.

>
> Encoded-words have worked very well over the past 15 or so years.
>

Look at this:
$ mail -s"Look at this RFC 2047 encoded word: \
=?iso-8859-1?q?this=20is=20some=20text?=" someone
^D

The information that I wanted to transmit is lost for the recipient.
It's a very pathological case, I admit, but from a *standards committee*
I expected something better than a "most cases" solution.
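
A minimal sketch of the effect described above, assuming the receiving side decodes the Subject with `decode_header` from Python's standard `email.header` module. The helper name `decode_subject` is made up for illustration, and a real mail reader may differ in detail.

```python
from email.header import decode_header

# The Subject exactly as typed on the command line, with no extra encoding.
raw_subject = ("Look at this RFC 2047 encoded word: "
               "=?iso-8859-1?q?this=20is=20some=20text?=")

def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a header value (hypothetical helper)."""
    parts = []
    for data, charset in decode_header(raw):
        # decode_header yields bytes for decoded chunks; it may yield str
        # when the header contains no encoded-word at all.
        if isinstance(data, bytes):
            data = data.decode(charset or "ascii")
        parts.append(data)
    return "".join(parts)

shown = decode_subject(raw_subject)
print(shown)
```

The token that was meant literally is recognised as an encoded-word, so the reader is shown its decoded payload and the `=?...?=` text itself never reaches the recipient.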

Suppose a program parses the subject of a mail, and its behaviour
depends on the command given in the subject. The syntax of this
program's mini-language can easily grow into something that conflicts
with the word-encoding mechanism. Of course the designer of the
mini-language syntax has to know about the problem and work around it,
adding further complexity to the system (and increasing the probability
of introducing bugs).

Even in this case it's possible to work around the problem and make the
recipient receive the intended message: by encoding the literal text
with the word-encoding mechanism (an option to word-encode the header
text could even be implemented at the MUA level). But it's a very
inelegant solution. Moreover, complex solutions fool developers and
users, producing something that is usually unusable and/or buggy. It's
a limit of the human mind.

Standardizing something that doesn't work perfectly in some way
legitimizes its deployment.
It's strange to me that the author of the RFC doesn't even mention
the problem.

Sorry for the bad English.

Regards

Re: About word-encoding (RFC2047) design

on 09.08.2006 23:42:27 by Mark Crispin

On Wed, 9 Aug 2006, stefano.sabatini-lala@poste.it wrote:
> The RFC seems to me to build one error on top of another. Maybe the
> author was trying to standardize something that was already deployed in
> some software.

Correction: already deployed in almost all software.

> The original error in this case is the reordering of message headers
> performed by some relays, which the author himself writes about in the
> RFC. If a relay reorders or deletes some message headers, then the
> information on the encoding of the following headers may be lost when
> the next relay (or the MUA) reads the message.

There was no prohibition for a relay to add, change the order of, or
delete headers.

There was also a very specific statement in the previous standard that
email headers were guaranteed to be 7-bit. That presented an enormous
difficulty, and continues to irritate software developers to this day
because it is difficult, if not impossible, to repeal a guarantee once
made.

> It wouldn't have been a great problem to fix, even in brain-damaged
> software and policies.

Please look up "Monday morning quarterback" in your dictionary.

More to the point: the only people who think that solutions to these
classes of problems are "simple" are either inexperienced novices or
quacks who ignore real-world matters.

Assuming that you are an inexperienced novice, the only teacher is
experience. No matter how much wisdom more experienced individuals may
attempt to impart, wisdom won't sink in until you have experienced a few
episodes of doing "the simple thing" and suffering the terrible penalty of
having unleashed an Unintended Consequence.

The Law of Unintended Consequences is that "they happen".

Although many of the problems that were of concern to the headers and MIME
working groups seem quaint today, and some very real problems (from
today's perspective) seem to be ignored, the fact is that they did a
remarkable job which has stood the test of time. I admit to some bias,
since I was a member of these working groups.

I doubt very much that a better job can be done today without building a
completely new email architecture from the ground up, with no
compatibility (or even gatewaying) with the existing 2822/MIME/SMTP
architecture. The moment you consider any such compatibility, you have to
let in all these nasty issues from the past that bedeviled the old working
groups.

The old working groups had to deal with such nasty issues to be compatible
(and allow gatewaying) with some truly bizarre ad hoc systems. What we
came up with could have been worse. A lot worse.

> Look at this:
> $ mail -s"Look at this RFC 2047 encoded word: \
> =?iso-8859-1?q?this=20is=20some=20text?=" someone
> ^D
> The information that I wanted to transmit is lost for the recipient.
> It's a very pathological case, I admit, but from a *standards committee*
> I expected something better than a "most cases" solution.

Life's tough sometimes.

You actually can transmit the intended Subject line. You just have to
wrap it in an additional level of quoting.

> Suppose a program parses the subject of a mail, and its behaviour
> depends on the command given in the subject. The syntax of this
> program's mini-language can easily grow into something that conflicts
> with the word-encoding mechanism.

Life's tough sometimes.

Anyone who wishes to implement software that processes email messages,
must read (and thoroughly understand) a very large body of RFCs. It is not
a task for a novice, unless the novice is willing to invest the time and
effort to become an expert as a necessary part of the project.

> But it's a very
> inelegant solution.

Life's tough sometimes.

If you want elegant solutions, you have to do something completely new, in
a field that is currently unexplored. Then you have the power to make
things elegant without having to deal with the nasty realities that occur
when other people fail to share your view of elegance.

But that's the problem. Either nobody uses your product, or presently
other people will start to enter into that field. And those other people,
who lack your precise understanding of what is elegant, will begin to
change things.

Perhaps it is to conform to their competing idea of what is elegant.
Perhaps they simply don't care about what is elegant as long as it's a
quick and dirty solution to their task. Either way, your elegance will be
eroded and presently destroyed.

> It's strange to me that the author of the RFC doesn't even mention
> the problem.

To the contrary, RFC 2047 seems to discuss the problem at some length.
But, since it is unsolvable, it's mostly a matter of academic interest.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Re: About word-encoding (RFC2047) design

on 09.08.2006 23:57:41 by Mark Crispin

On Wed, 9 Aug 2006, Mark E. Mallett wrote:
>> If you want to see why we have such problems, just look at the thread with
>> subject "ESMTP AUTH PLAIN" in this newsgroup. A correct understanding of
>> how to do this requires thorough reading and understanding of three
>> different RFCs. Most people refuse to do this; instead, they guess by
>> empirical testing and they get it wrong.
> Oh-oh, have I said something wrong?

No, I was referring to the original poster's message.

Your answer, as far as it went, was pretty good. You even got your
example of "omitted initial client response in a mechanism that has no
initial server challenge" in SMTP SASL mostly correct.

There were some things that you omitted. You pointed to RFC 2595 (and the
I-D that will eventually replace it); that's needed to understand the
PLAIN SASL mechanism. However, to get it right, it's also necessary to
read RFC 4422 to understand the mechanics of challenge and response, and
RFC 2554 to understand how SASL is implemented in SMTP.

Your example had a server challenge of "334" which, although visually
correct, is not quite right. The correct server challenge is "334 ". I
don't know if you just forgot to type the space, or if you didn't realize
that it was needed.

The OP's examples of "334 Username:", "334 Password:", and "334 ?" are all
completely wrong, even for the LOGIN SASL mechanism. It's important that
implementors, especially server implementors, understand why. Otherwise
they're not going to implement SMTP SASL correctly.
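
As a rough illustration of the exchange discussed here, the sketch below builds the PLAIN client response (authorization identity, NUL, authentication identity, NUL, password, then base64) and checks for the empty server challenge, which is the code 334 followed by a single space. The function names and the sample credentials are assumptions made for this example; it is not a complete SMTP client.

```python
import base64

def plain_response(authcid, password, authzid=""):
    """Return the base64 PLAIN response: authzid NUL authcid NUL password."""
    message = "\0".join([authzid, authcid, password]).encode("utf-8")
    return base64.b64encode(message).decode("ascii")

def is_empty_challenge(reply_line):
    """True only for code 334, one space, and nothing after it."""
    return reply_line.rstrip("\r\n") == "334 "

# Hypothetical dialogue when the client sends no initial response:
#   C: AUTH PLAIN
#   S: "334 "   (code 334, one space, empty challenge)
#   C: the string returned by plain_response(...)
response = plain_response("tim", "tanstaaftanstaaf")
print(response)
```

Replies such as `334` without the space, or `334 Username:`, fail the `is_empty_challenge` check, which mirrors the distinction drawn above.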

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Re: About word-encoding (RFC2047) design

on 10.08.2006 01:42:11 by stefano.sabatini-lala

Mark Crispin wrote:

> Please look up "Monday morning quarterback" in your dictionary.
>
> More to the point: the only people who think that solutions to these
> classes of problems are "simple" are either inexperienced novices or
> quacks who ignore real-world matters.
>
> Assuming that you are an inexperienced novice, the only teacher is
> experience. No matter how much wisdom more experienced individuals may
> attempt to impart, wisdom won't sink in until you have experienced a few
> episodes of doing "the simple thing" and suffering the terrible penalty of
> having unleashed an Unintended Consequence.
>
> The Law of Unintended Consequences is that "they happen".
>
> Although many of the problems that were of concern to the headers and MIME
> working groups seem quaint today, and some very real problems (from
> today's perspective) seem to be ignored, the fact is that they did a
> remarkable job which has stood the test of time. I admit to some bias,
> since I was a member of these working groups.
>
> I doubt very much that a better job can be done today without building a
> completely new email architecture from the ground up, with no
> compatibility (or even gatewaying) with the existing 2822/MIME/SMTP
> architecture. The moment you consider any such compatibility, you have to
> let in all these nasty issues from the past that bedeviled the old working
> groups.
>
> The old working groups had to deal with such nasty issues to be compatible
> (and allow gatewaying) with some truly bizarre ad hoc systems. What we
> came up with could have been worse. A lot worse.
>
[cut]
> If you want elegant solutions, you have to do something completely new, in
> a field that is currently unexplored. Then you have the power to make
> things elegant without having to deal with the nasty realities that occur
> when other people fail to share your view of elegance.
>
> But that's the problem. Either nobody uses your product, or presently
> other people will start to enter into that field. And those other people,
> who lack your precise understanding of what is elegant, will begin to
> change things.
>
> Perhaps it is to conform to their competing idea of what is elegant.
> Perhaps they simply don't care about what is elegant as long as it's a
> quick and dirty solution to their task. Either way, your elegance will be
> eroded and presently destroyed.
>

Thank you Mark, a very exhaustive and useful reply, precisely the kind
of reply I was looking for.

It is only rather discouraging. What I was trying to understand was the
lack of consistency and the apparently unnecessary complexity of many
systems (of which the Internet mail system is quite a significant
example).

Of course I agree with you, and I believe you that when standardization
happens there are many factors to be considered, from pragmatic ones
(backward and de facto standards compatibility) to political ones, and
that in many cases it's very difficult, if not impossible, to predict
the future behaviour of the standardized technology, if not its present
behaviour.

From RFC 2045:

> HISTORICAL NOTE: Several of the mechanisms described in this set of
> documents may seem somewhat strange or even baroque at first reading.
> It is important to note that compatibility with existing standards
> AND robustness across existing practice were two of the highest
> priorities of the working group that developed this set of documents.
> In particular, compatibility was always favored over elegance.

The problem is that this level of complexity tends to grow, while the
capacity to deal with such complexity doesn't grow at the same pace.

In practical terms: the amount of things I have to know to deal with a
problem today is generally larger than the amount of information
required to solve the same class of problem in the past.

Even though the tools today are generally better and many kinds of
right solutions have already been discovered, people (developers,
administrators, users) tend to be lazy and study as little as they can:
the consequence is that errors occur more frequently, or at least this
is my impression. If this is not the case, then since we can't really
know all the consequences of our technology, it means that we are
losing control of it: that's not necessarily bad, if at least we're
gaining power.

In an ideal world computer science standards would be rewritten from
scratch every ten years, based on previous experience, but this is
clearly naive and unrealistic. In other fields of technology (e.g.
mechanics, building engineering) backward compatibility doesn't appear
to be a big problem, since for example cars and buildings don't have to
be compatible with old ones (which is only partially true).

Finally, if standardization committees decide to sacrifice elegance
for compatibility, I can easily believe this choice is based on
wise considerations (in the end they know better than me), and it takes
no effort to imagine that it all could have been *a lot* worse.

Regards

Re: About word-encoding (RFC2047) design

on 10.08.2006 04:12:58 by DFS

stefano.sabatini-lala@poste.it wrote:

> It appears to me a very bad design choice. The only reasonable choice
> would be to simply reject all brain-damaged software and design
> something that could work in each case, and not simply in *most* cases,
> for example a header specifying the encoding to use for decoding the
> following headers.

Headers can be reordered, alas.

> I would like to hear other opinions.

Well, if you want to put a literal encoded-word in the headers, just
encode it (so that when it's decoded, you get what you want.)
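
One way to sketch this suggestion, assuming the sender Q-encodes only the offending token as a `us-ascii` encoded-word and the receiver decodes with `decode_header` from Python's standard `email.header` module. The names `q_encode_word` and `decode_subject` are hypothetical, and the sketch does not split long input to respect the 75-character limit that RFC 2047 places on a single encoded-word.

```python
from email.header import decode_header

def q_encode_word(token, charset="us-ascii"):
    """Wrap token in one Q-encoded encoded-word (hypothetical helper)."""
    out = []
    for byte in token.encode(charset):
        ch = chr(byte)
        # Keep ASCII letters, digits and "-" as they are; hex-escape the rest,
        # including "=", "?" and space.
        if byte < 128 and (ch.isalnum() or ch == "-"):
            out.append(ch)
        else:
            out.append("=%02X" % byte)
    return "=?%s?q?%s?=" % (charset, "".join(out))

def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a header value (hypothetical helper)."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            data = data.decode(charset or "ascii")
        parts.append(data)
    return "".join(parts)

prefix = "Look at this RFC 2047 encoded word: "
literal = "=?iso-8859-1?q?this=20is=20some=20text?="

wire_subject = prefix + q_encode_word(literal)
print(wire_subject)                  # what travels in the header
print(decode_subject(wire_subject))  # what the reader displays
```

Because the decoder makes a single pass, the escaped `=` and `?` characters come back as literal text and are not decoded a second time.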

If you think RFC 2047 is bad, take a look at RFC 2231. You may need
an air-sickness bag nearby...

Regards,

David.

Re: About word-encoding (RFC2047) design

on 10.08.2006 09:40:13 by gtaylor

On 08/09/06 21:12, David F. Skoll wrote:
> If you think RFC 2047 is bad, take a look at RFC 2231. You may need
> an air-sickness bag nearby...

Thanks David, I needed a good laugh, which I got when I read the title of
the RFC.

Grant. . . .

Re: About word-encoding (RFC2047) design

on 10.08.2006 21:43:48 by mem

In article ,
Mark Crispin wrote:
>
>Your example had a server challenge of "334" which, although visually
>correct, is not quite right. The correct server challenge is "334 ". I
>don't know if you just forgot to type the space, or if you didn't realize
>that it was needed.

Realize it indeed, but then again, it was just an informal outline, not
a spec, for somebody who was asking about conducting a telnet query.
Unless I missed the thrust of the question, which is certainly possible.
I wish I had put the space in for cleverness' sake though, for anybody
who looked; so I suppose you could say I forgot :-)

-mm-

Re: About word-encoding (RFC2047) design

on 11.08.2006 06:54:46 by Kari Hurtta

stefano.sabatini-lala@poste.it writes:

> Look at this:
> $ mail -s"Look at this RFC 2047 encoded word: \
> =?iso-8859-1?q?this=20is=20some=20text?=" someone
> ^D
>
> The information that I wanted to transmit is lost for the recipient.
> It's a very pathological case, I admit, but from a *standards committee*
> I expected something better than a "most cases" solution.

Is that really a problem?

Script started on Fri Aug 11 07:49:07 2006
[hurtta@attruh hurtta]$ elm -s"Look at this RFC 2047 encoded word: \
> =?iso-8859-1?q?this=20is=20some=20text?=" hurtta@localhost < /dev/null
Looking up localhost ...
Looking up localhost ... OK
Connecting to localhost [127.0.0.1]... (0)
Mail to hurtta@localhost...
Mail sent!
[hurtta@attruh hurtta]$
Script done on Fri Aug 11 07:49:37 2006

(Control characters edited out of the typescript.)

This produces the following in the mailbox (viewed with less, so no decoding is done):

Return-Path:
Delivered-To: hurtta@localhost.keh.iki.fi
Received: from attruh.keh.iki.fi (localhost.localdomain [127.0.0.1])
by attruh.keh.iki.fi (Postfix) with ESMTP id 9410330860
for ; Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
Subject: Look at this RFC 2047 encoded word:
=?US-ASCII?Q?=?iso-8859-1=3Fq=3Fthis=3D20is=3D20some=3D20text?=?=
To: hurtta@localhost.keh.iki.fi
Date: Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
Sender: hurtta@attruh.keh.iki.fi

and this looks as intended when viewed with a mail reader:

Delivered-To: hurtta@localhost.keh.iki.fi
Subject: Look at this RFC 2047 encoded word:
=?iso-8859-1?q?this=20is=20some=20text?=
To: hurtta@localhost.keh.iki.fi
Date: Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
Sender: hurtta@attruh.keh.iki.fi

Re: About word-encoding (RFC2047) design

on 11.08.2006 15:17:44 by stefano.sabatini-lala

Kari Hurtta wrote:

> Script started on Fri Aug 11 07:49:07 2006
> [hurtta@attruh hurtta]$ elm -s"Look at this RFC 2047 encoded word: \
> > =?iso-8859-1?q?this=20is=20some=20text?=" hurtta@localhost < /dev/null
> Looking up localhost ...
> Looking up localhost ... OK
> Connecting to localhost [127.0.0.1]... (0)
> Mail to hurtta@localhost...
> Mail sent!
> [hurtta@attruh hurtta]$
> Script done on Fri Aug 11 07:49:37 2006
>
> (Control characters edited out of the typescript.)
>
> This produces the following in the mailbox (viewed with less, so no decoding is done):
>
> Return-Path:
> Delivered-To: hurtta@localhost.keh.iki.fi
> Received: from attruh.keh.iki.fi (localhost.localdomain [127.0.0.1])
> by attruh.keh.iki.fi (Postfix) with ESMTP id 9410330860
> for ; Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
> Subject: Look at this RFC 2047 encoded word:
> =?US-ASCII?Q?=?iso-8859-1=3Fq=3Fthis=3D20is=3D20some=3D20text?=?=
> To: hurtta@localhost.keh.iki.fi
> Date: Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
> Sender: hurtta@attruh.keh.iki.fi
>
> and this looks as intended when viewed with a mail reader:
>
> Delivered-To: hurtta@localhost.keh.iki.fi
> Subject: Look at this RFC 2047 encoded word:
> =?iso-8859-1?q?this=20is=20some=20text?=
> To: hurtta@localhost.keh.iki.fi
> Date: Fri, 11 Aug 2006 07:49:35 +0300 (EEST)
> Sender: hurtta@attruh.keh.iki.fi

Yes, in the end I think this is the correct solution, or even better,
the correct way to look at the problem.

It's the MUA's (Mail User Agent's) job to translate what the user wants
to express literally into something that decodes back to the same text.
Elm and Mutt (and I think every sophisticated MUA) act in this way.

On the other hand, a program that has to parse header content doesn't
need to decode it, since the header content is assumed to be
interpreted literally.
The only limitation on the language of such a program is that it must
use the ASCII charset.

If this is a problem, then the same logic discussed above can be
applied: if the text to deliver contains non-ASCII characters, it's
encoded; if the text contains words that could be interpreted as
encoded-words, they're encoded; and the program reading the text has to
embed the decoding mechanism to interpret the message correctly.
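
The rule in the last paragraph can be sketched as a simple predicate. The name `needs_encoding` and the regular expression are assumptions chosen for illustration: the pattern only approximates the RFC 2047 encoded-word grammar, and deciding how to encode the text is left to the caller.

```python
import re

# Approximate shape of an encoded-word: =?charset?B-or-Q?payload?=
# (an assumption for this sketch, looser than the RFC 2047 grammar).
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

def needs_encoding(text):
    """Return True if a sending program should word-encode text first."""
    if not text.isascii():
        # Non-ASCII characters cannot appear raw in a 7-bit header.
        return True
    # ASCII text that merely looks like an encoded-word would be decoded
    # by the recipient, so it has to be protected as well.
    return ENCODED_WORD.search(text) is not None

for sample in ("Look at this",
               "caff\u00e8",
               "see =?iso-8859-1?q?this=20is=20some=20text?="):
    print(repr(sample), needs_encoding(sample))
```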

Thank you all for the replies.

Best regards.