Proposal for on-the-wire message body scanning

on 29.10.2007 10:16:03 by Jan Gutter

My company is interested in implementing an "on-the-wire"
functionality for Sendmail milters. I'm in charge of developing this
patch, and I'm interested in some feedback from the community before I
submit anything.

So, here's the proposal:

Background:

The current method that Sendmail employs with milter is to send the
envelope information to the milter while the message is still "on-the-
wire". Once the envelope information is received, Sendmail calls the
"collect()" function to transfer the message body to a file. After the
message body has been collected, the milter is called blockwise with
the data in this file.

The obvious side-effect is that envelope information can be used "on-
the-wire", but message body (including message headers, etc.) can only
be processed after the whole mail has been collected. We want to
implement an option that allows the body to be sent to the milter "on-
the-fly".

Proposed architectural change:

A flag in the sendmail.mc INPUT_MAIL_FILTER function setting the
milter's mf_flags to allow on-the-fly scanning (I propose 'I', for "in-
line" scanning).

smtp_data(), collect(), milter_data() and milter_body() need to be
modified to be interleaved, instead of pipelined.

How it should work:

Everything happens the same until right before the body needs to be
collected.

All the milters with the `F=I' flag will be called with the body
chunks as they are being received. The current copy-to-file mechanism
will still be used (i.e. collect() will copy the data to a file, but
the file will be forwarded on a chunk-by-chunk basis through to the
milters).

Then all the milters without the `F=I' flag will be called on the
collected data.
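
A minimal sketch of the flow described above, written as a toy Python model
rather than as Sendmail code. The names collect_interleaved, inline_milters,
deferred_milters and block_size are hypothetical, and the milters are plain
callables. The order in which the deferred milters are fed (one milter at a
time over the whole spool) is an assumption of the sketch.

```python
import io


def collect_interleaved(chunks, inline_milters, deferred_milters, spool,
                        block_size=4096):
    """Toy model of the proposed flow; not Sendmail's actual code."""
    for chunk in chunks:
        # The body is still copied to the spool file, as collect() does today.
        spool.write(chunk)
        # Milters flagged for inline scanning see each chunk as it arrives.
        for milter in inline_milters:
            milter(chunk)

    # Milters without the flag only run once the whole body is spooled.
    for milter in deferred_milters:
        spool.seek(0)
        while True:
            block = spool.read(block_size)
            if not block:
                break
            milter(block)


if __name__ == "__main__":
    seen = []
    collect_interleaved(
        [b"Subject: hi\r\n", b"\r\n", b"body\r\n"],
        inline_milters=[lambda data: seen.append(("inline", data))],
        deferred_milters=[lambda data: seen.append(("deferred", data))],
        spool=io.BytesIO(),
    )
    for entry in seen:
        print(entry)
```

The inline milter is called once per received chunk, while the deferred
milter only sees data after the last chunk has been written to the spool.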

Optional:

milter_data has a comment:

XXX: Should actually send body chunks to each filter a chunk at a
time ....

I'm perfectly willing to implement this: in actual fact it would unify
the way milters are called: in this case we just need to decide what
to do in the case that a milter changes/rejects the body. Reject might
be simple, change(s) can be queued until all the milters have finished
processing.

I've also seen comments in the code about SuperSafe and collect() in
smtp_data, I can look into implementing them along with this patch.

Pro:

Processing of body data can be done at a much earlier stage: the
message doesn't have to be received before it is scanned. It might
actually be possible to reject the mail before the rest of the payload
is received. Most milters should work with no modification in the
inline mode.

Con:

More processing. A slow milter might cause SMTP timeouts. There
shouldn't be any extra overhead added to current systems, though.

And now, the Queries:

1) Is this the right forum for me to talk about this? Anywhere else
Sendmail developers congregate?
2) I've looked at RFC2821 (page 17 in particular). I'm pretty sure it
*should* be OK to reject messages while they are being received.
(Otherwise SMTP is vulnerable to a particularly bad DoS attack right
here.) Some certainty would be nice, though...

So, I'm on a bit of a deadline here.... If I don't see any "Please,
for Heaven's sake, NO!" messages, I'm going to start coding.

-- Jan Gutter

Re: Proposal for on-the-wire message body scanning

on 29.10.2007 22:38:26 by per

In article <1193649363.988722.316600@v3g2000hsg.googlegroups.com> Jan
Gutter writes:
>2) I've looked at RFC2821 (page 17 in particular). I'm pretty sure it
>*should* be OK to reject messages while they are being received.

What gave you that impression? It is wrong, the response to the actual
body should only be sent after the final dot has been received. There
may not be any explicit text that forbids a server to send a reply
earlier, but there is also surely no text that even suggests that a
client should look for a reply earlier - including the special case that
the client is unable to send the final dot due to the server abruptly
closing the connection. See also the sequence diagram on page 48.

>(Otherwise SMTP is vulnerable to a particularly bad DoS attack right
>here.)

Of course nothing prevents the server from simply closing the
connection. This will however be treated as a temporary failure by a
client operating per the RFC, i.e. it will queue the message and retry
later.

--Per Hedeland
per@hedeland.org

Re: Proposal for on-the-wire message body scanning

on 30.10.2007 11:05:19 by Jan Gutter

On Oct 29, 11:38 pm, p...@hedeland.org (Per Hedeland) wrote:
> In article <1193649363.988722.316...@v3g2000hsg.googlegroups.com> Jan
> Gutter writes:
> >2) I've looked at RFC2821 (page 17 in particular). I'm pretty sure it
> >*should* be OK to reject messages while they are being received.
>
> What gave you that impression? It is wrong, the response to the actual
> body should only be sent after the final dot has been received.

Yes, you're right. Serves me right for (mis)reading the code rather than
the whole standard. After looking at collect(), I see what you mean: once
collect() gets called, there's very little you can do to stop the remote
side from sending, errors just get queued.

> >(Otherwise SMTP is vulnerable to a particularly bad DoS attack right
> >here.)
>
> Of course nothing prevents the server from simply closing the
> connection. This will however be treated as a temporary failure by a
> client operating per the RFC, i.e. it will queue the message and retry
> later.

So there's nothing stopping a misbehaving SMTP server to just open up
a connection, send valid envelope headers and send an arbitrarily large
amount of data? At least in collect() the data isn't spooled to disk if
MS_DISCARD is set, but that doesn't prevent the line from being used by
the remote baddy?

Anyway, even if reject-on-the-wire isn't feasible, we actually still want
the feature of mail parsing on-the-wire. This is to present a nice view of
mails that are actually in transit for one of our applications. Putting the
parsing code for message headers in the milter would only work with this
type of scanning: otherwise you can only show the envelope while the body
is in transit.

Proposal for on-the-wire message body scanning

on 30.10.2007 16:55:57 by Joseph Brennan

On Oct 30, 6:05 am, Jan Gutter wrote:

> So there's nothing stopping a misbehaving SMTP server to just open up
> a connection, send valid envelope headers and send an arbitrarily
> large amount of data?

Well-behaved servers would honor your maximum message size as shown in
the esmtp response, but as we know some servers still do old smtp. It
seems like it would be useful to detect when the maximum length has
been exceeded and stop writing data to the df file-- since the message
won't be accepted anyway. You still have to wait for them to stop
sending data before you can give them a 550, so you have to pay enough
attention to see when the dot comes, but you don't need to write any
more to disk.
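
A rough sketch of that idea in Python, not taken from Sendmail: keep reading
until the final dot, but stop spooling once the size limit is passed. The
names read_data, spool and max_size are hypothetical, and the input is
assumed to be already split into CRLF-terminated lines.

```python
import io


def read_data(lines, spool, max_size):
    """Consume DATA-phase lines up to the terminating "." line.

    Returns (saw_final_dot, too_large). Nothing is written to the spool
    once the limit has been exceeded, but reading continues so that the
    reply can be sent after the final dot.
    """
    written = 0
    too_large = False
    for line in lines:
        if line == b".\r\n":
            return True, too_large
        if line.startswith(b"."):
            line = line[1:]  # undo SMTP dot-stuffing
        if not too_large and written + len(line) > max_size:
            too_large = True  # stop spooling from here on, keep reading
        if not too_large:
            spool.write(line)
            written += len(line)
    # Input ended without the final dot (for example, a dropped connection).
    return False, too_large


if __name__ == "__main__":
    spool = io.BytesIO()
    result = read_data([b"hello\r\n", b"world\r\n", b".\r\n"], spool, 10)
    print(result, spool.getvalue())
```

A caller would send its rejection only when saw_final_dot is true and
too_large is set; if the dot never arrives there is no point at which a
reply to the body can be given.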

Generalizing, there might be other cases where you can make the
decision to reject a message while it is still coming in. While you
have to wait for them to stop sending to give them a 550, in the
meantime you don't need to write or parse any more of the data.

Joe Brennan

Re: Proposal for on-the-wire message body scanning

on 31.10.2007 02:10:34 by per

In article <1193759757.828962.175580@o80g2000hse.googlegroups.com> Joe
Brennan writes:
>On Oct 30, 6:05 am, Jan Gutter wrote:
>
>> So there's nothing stopping a misbehaving SMTP server to just open up
>> a connection, send valid envelope headers and send an arbitrarily
>> large amount of data?

Uh, s/server/client/ I guess. And as I wrote, the server *could* just
close the connection - or stop reading from it, which would block the
client - but sendmail doesn't do either.

>Well-behaved servers would honor your maximum message size as shown in
>the esmtp response, but as we know some servers still do old smtp. It
>seems like it would be useful to detect when the maximum length has
>been exceeded and stop writing data to the df file-- since the message
>won't be accepted anyway. You still have to wait for them to stop
>sending data before you can give them a 550, so you have to pay enough
>attention to see when the dot comes, but you don't need to write any
>more to disk.

Sendmail already does this, see collect.c:collect().

>Generalizing, there might be other cases where you can make the
>decision to reject a message while it is still coming in. While you
>have to wait for them to stop sending to give them a 550, in the
>meantime you don't need to write or parse any more of the data.

Good point. Then again, it may not be worth the extra complexity in
practice, as most spams aren't big enough that it would matter. (Yes
there may be other reasons to reject, but in comparison the frequency of
their occurrence is probably negligible.)

--Per Hedeland
per@hedeland.org

Re: Proposal for on-the-wire message body scanning

on 01.11.2007 03:13:07 by DFS

Jan Gutter wrote:

> My company is interested in implementing an "on-the-wire"
> functionality for Sendmail milters.

I believe Claus Aßmann's "MeTA1" (formerly known as Sendmail-X)
MTA does on-the-wire filtering. Depending on how tied into Sendmail 8.x
you are, it might be easier just to look at MeTA1.

http://www.meta1.org/

It's still in "pre-alpha" development, however.

Regards,

David.

Re: Proposal for on-the-wire message body scanning

on 01.11.2007 03:15:46 by DFS

Per Hedeland wrote:

> Good point. Then again, it may not be worth the extra complexity in
> practice, as most spams aren't big enough that it would matter. (Yes
> there may be other reasons to reject, but in comparison the frequency of
> their occurrence is probably negligible.)

I agree. I think most messages can either be rejected before any body
information (based on sender/recipients/IP address/HELO information)
or only after a complete body scan that might involve decoding MIME parts.
In particular, it's very awkward to deal with partial MIME messages safely.

Regards,

David.

Re: Proposal for on-the-wire message body scanning

on 01.11.2007 19:07:40 by gtaylor

On 10/31/07 21:15, David F. Skoll wrote:
> I agree. I think most messages can either be rejected before any
> body information (based on sender/recipients/IP address/HELO
> information) or only after a complete body scan that might involve
> decoding MIME parts. In particular, it's very awkward to deal with
> partial MIME messages safely.

Um, I may be wrong here, but if you have a partial / incomplete MIME
message without the final trailing "." isn't it safe to treat this as
an incomplete message and disconnect the session and / or return a 4xy
error inviting the sending server to re-transmit the complete message at
a later point in time?



Grant. . . .

Re: Proposal for on-the-wire message body scanning

on 02.11.2007 02:29:46 by DFS

Grant Taylor wrote:

> Um, I may be wrong here, but if you have a partial / incomplete MIME
> message without the final trailing "." isn't it safe to treat this as
> an incomplete message and disconnect the session and / or return a 4xy
> error inviting the sending server to re-transmit the complete message at
> a later point in time?

You misunderstood my point. My point was that if you do "on-the-fly"
processing of mail, you might not have the entire message body yet.
It's very awkward to do anything intelligent with partial MIME messages.
(Not impossible, but probably very difficult to do correctly and safely.)
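
One small illustration of why partial data is awkward, as a hedged Python
sketch: a MIME boundary can straddle two body chunks, so a scanner that
looks at each chunk in isolation misses it. The function names and the
boundary "--XYZ" are made up for the example, and this covers only one of
the difficulties (nested parts and encoded content are not handled at all).

```python
def naive_count(chunks, marker):
    """Count marker occurrences chunk by chunk, with no memory between chunks."""
    return sum(chunk.count(marker) for chunk in chunks)


def streaming_count(chunks, marker):
    """Count marker occurrences while carrying a short tail between chunks.

    The tail is len(marker) - 1 bytes long, so it can never hold a complete
    marker on its own and no occurrence is counted twice.
    """
    count = 0
    tail = b""
    keep = len(marker) - 1
    for chunk in chunks:
        window = tail + chunk
        count += window.count(marker)
        tail = window[-keep:] if keep > 0 else b""
    return count


if __name__ == "__main__":
    # The first boundary is split across the two chunks.
    chunks = [b"text\r\n--XY", b"Z\r\nmore\r\n--XYZ--\r\n"]
    print(naive_count(chunks, b"--XYZ"), streaming_count(chunks, b"--XYZ"))
```

Even this simple case needs state carried from one chunk to the next, which
is the kind of bookkeeping a chunk-at-a-time scanner has to get right.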

-- David.