POP3 email link breaks

on 03.01.2007 23:24:32 by Rob

I've written a POP3 and Internet Email set of modules for VB.NET and they're
working pretty well. I've got a reasonable understanding of the various RFCs
around the subject. However, I'm trying to parse some data submitted by
email and I'm having trouble with line breaks, i.e. line breaks are being
added around the 78 character limit as described in RFC 2822.

What I can't work out is that when I send the same email and let Outlook
2003 download it, it doesn't have the same line breaks - it is somehow
putting the lines back together. I've telnet'd direct to the POP3 server and
this is the output:

----cut here----
top 1 6
+OK
Return-Path:
From: "Rob Nicholson"
To:
Subject: Macc Scores.txt
Date: Wed, 3 Jan 2007 19:19:45 -0000
Message-ID: <003a01c72f6c$220ae020$0400a8c0@middleearth.co.uk>
MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
Thread-Index: AccvbCCXEYsOZd5OQ161aTscrR1Q4Q==

Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky
Warehouse. Awful.
Score: "Miners Arms" "Adlington" 13/04/2006 3 # Theakston Best Bitter - =
Very
busy food focussed pub in the middle of nowhere. (Not in Out Inn =
Cheshire
..
----cut here----

I know about the "=" character on the end which is part of the
quoted-printable encoding system. It's that first Score line that's got the
problem:

Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky
Warehouse. Awful.

That was actually sent as one single line. However, the raw POP3 feed is
putting in a CRLF in there when it's read back. I just cannot work out how
Outlook is glueing it back together! I thought that it might have been the
email sender putting in the breaks but even if it was, how does Outlook know
when to glue the lines back together???

Any ideas?

Thanks, Rob.
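
A note on the "=" at the end of a line: in quoted-printable it marks a soft line break, and a decoder removes it together with the CRLF that follows. A bare CRLF is a hard break and survives decoding. The sketch below uses Python's standard quopri module (not the VB.NET modules from the post) on a few body lines copied from the POP3 output above; the variable names are only illustrative.

```python
import quopri

# Body lines as shown in the POP3 output above, with CRLF line endings.
raw_body = (
    b'Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky\r\n'
    b'Warehouse. Awful.\r\n'
    b'Score: "Miners Arms" "Adlington" 13/04/2006 3 # Theakston Best Bitter - =\r\n'
    b'Very\r\n'
)

# Decoding drops every "=" + CRLF pair (soft break) and keeps plain CRLFs.
decoded = quopri.decodestring(raw_body).decode("iso-8859-1")
lines = decoded.splitlines()

for number, line in enumerate(lines, start=1):
    print(number, repr(line))
```

Decoded this way, the second Score line is rejoined with "Very", but the break after "Wacky" is still there. That suggests it is a real CRLF in the message as stored, not something the quoted-printable encoding introduced.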

Re: POP3 email link breaks

on 03.01.2007 23:28:19 by Rob

> That was actually sent as one single line. However, the raw POP3 feed is
> putting in a CRLF in there when it's read back. I just cannot work out how
> Outlook is glueing it back together! I thought that it might have been the
> email sender putting in the breaks but even if it was, how does Outlook
> know when to glue the lines back together???

Later...

Hmm, display the same email in Google and the same line breaks are there so
it suffers from the same problem.

So Outlook is somehow doing something very clever in putting them back
together.

Rob.

Re: POP3 email link breaks

on 03.01.2007 23:37:44 by Rob

> So Outlook is somehow doing something very clever in putting them back
> together.

Hmm, it really is clever :-) Even hotmail.com has the line breaks in there.

Wonder if I can get in touch with an Outlook programmer!

Cheers, Rob.

Re: POP3 email link breaks

on 03.01.2007 23:58:32 by Rob

> Wonder if I can get in touch with an Outlook programmer!

I know I'm talking to myself but Outlook has an autoclean feature:

>
> Cheers, Rob.
>

Re: POP3 email link breaks

on 03.01.2007 23:59:20 by Rob

> Wonder if I can get in touch with an Outlook programmer!

Try again - Outlook autoclean:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;q295014

Rob.

Re: POP3 email link breaks

on 04.01.2007 00:11:57 by Sam

Rob writes:

> I've written a POP3 and Internet Email set of modules for VB.NET and they're
> working pretty well. I've got a reasonable understanding of the various RFC
> around the subject. However, I'm trying to parse some data submitted by
> email and I'm having trouble with line breaks, i.e. line breaks are being
> added around the 78 character limit as describe in RFC2822.
>
> What I can't work out is that when I send the same email and let Outlook
> 2003 download it, it doesn't have the same line breaks - it is somehow
> putting the lines back together. I've telnet'd direct to the POP3 server and
> this is the output:
>
> ----cut here----
> top 1 6
> +OK
> Return-Path:
> From: "Rob Nicholson"
> To:
> Subject: Macc Scores.txt
> Date: Wed, 3 Jan 2007 19:19:45 -0000
> Message-ID: <003a01c72f6c$220ae020$0400a8c0@middleearth.co.uk>
> MIME-Version: 1.0
> Content-Type: text/plain;
> charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable

Check the specification for quoted-printable line-encoding, paying special
attention to line breaks. That will answer your question.

Re: POP3 email link breaks

on 04.01.2007 00:50:58 by Rob

"Sam" wrote in message
news:cone.1167865917.687681.25471.500@commodore.email-scan.com...

Hi Sam - as I said, I'm already handling the quoted-printable parts of the
message body. These line breaks unfortunately aren't anything to do with
that. They have been added by (old) email software somewhere on the internet
whereby hard line breaks are being added to keep the lines short.

It's one of those really old bits of internet history :-) About the same
level as emails still being in 7 bit ASCII in places.

After reading further, I've come to the conclusion there isn't a solution to
this. Outlook has special coding in there that very cleverly removes these
extra line breaks. As a human reading it, I can re-glue the lines together
but getting a program to do the same is reasonably tricky.

The "auto clean" feature sometimes removes line breaks incorrectly so there
are many web pages on how to turn the feature off.

So it's nothing I'm doing wrong as such in reading via POP3 or parsing the
resultant internet format email.

A possible solution is to also parse the HTML that's often included in email
messages. When you send an email in HTML format, it's sent twice: one chunk
is text/plain, the other is text/html.

I was ignoring the HTML part hoping that I could get the information I
needed out of the text/plain section. But unfortunately I can't because of
this line break "feature" of the internet :-)

Cheers, Rob.
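
As a rough illustration of the text/html idea, here is a minimal Python sketch using the standard email package. The sample message, its boundary string and the TextExtractor helper are made up for the example; a real message would come from the POP3 RETR output, and not every sender includes an HTML part (the message shown at the top of the thread is text/plain only).

```python
from email import message_from_string
from email.policy import default
from html.parser import HTMLParser

# Hypothetical multipart/alternative message carrying the same text twice.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    "\n"
    'Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky\n'
    "Warehouse. Awful.\n"
    "--b1\n"
    'Content-Type: text/html; charset="iso-8859-1"\n'
    "\n"
    '<html><body><p>Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage '
    "Directors - Wacky Warehouse. Awful.</p></body></html>\n"
    "--b1--\n"
)


class TextExtractor(HTMLParser):
    """Collect the character data of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_text(raw_message):
    """Return the text of the text/html part, or None if there is none."""
    msg = message_from_string(raw_message, policy=default)
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return None
    extractor = TextExtractor()
    extractor.feed(part.get_content())
    extractor.close()
    return "".join(extractor.chunks).strip()


print(html_text(SAMPLE))
```

In the HTML part the paragraph is delimited by markup, not by CRLFs, so the wrapped text/plain line comes back as one line. The price is having to deal with HTML, and a fallback is still needed for plain-text-only messages.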

Re: POP3 email link breaks

on 04.01.2007 01:52:25 by Sam

Rob writes:

> "Sam" wrote in message
> news:cone.1167865917.687681.25471.500@commodore.email-scan.com...
>
> Hi Sam - as I said, I'm already handling the quoted-printable parts of the
> message body. These line breaks unfortunately aren't anything to do with
> that. They have been added by (old) email software somewhere on the internet
> whereby hard line breaks are being added to keep the lines short.
>
> It's one of those really old bits of internet history :-) About the same
> level as emails still being in 7 bit ASCII in places.

I've seen a lot of Internet mail. I've not seen anything like you describe.

Re: POP3 email link breaks

on 04.01.2007 09:41:10 by Kari Hurtta

"Rob" writes:

> I've written a POP3 and Internet Email set of modules for VB.NET and they're
> working pretty well. I've got a reasonable understanding of the various RFC
> around the subject. However, I'm trying to parse some data submitted by
> email and I'm having trouble with line breaks, i.e. line breaks are being
> added around the 78 character limit as describe in RFC2822.
>
> What I can't work out is that when I send the same email and let Outlook
> 2003 download it, it doesn't have the same line breaks - it is somehow
> putting the lines back together. I've telnet'd direct to the POP3 server and
> this is the output:
>
> ----cut here----
> top 1 6
> +OK
> Return-Path:
> From: "Rob Nicholson"
> To:
> Subject: Macc Scores.txt
> Date: Wed, 3 Jan 2007 19:19:45 -0000
> Message-ID: <003a01c72f6c$220ae020$0400a8c0@middleearth.co.uk>
> MIME-Version: 1.0
> Content-Type: text/plain;
> charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
> X-Mailer: Microsoft Office Outlook 11
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
> Thread-Index: AccvbCCXEYsOZd5OQ161aTscrR1Q4Q==
>
> Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky
> Warehouse. Awful.
> Score: "Miners Arms" "Adlington" 13/04/2006 3 # Theakston Best Bitter - =
> Very
> busy food focussed pub in the middle of nowhere. (Not in Out Inn =
> Cheshire
> .
> ----cut here----
>
> I know about the "=" character on the end which is part of the
> quoted-printable encoding system. It's that first Score line that's got the
> problem:
>
> Score: "Legh Arms" "Adlington" 13/04/2006 3 # Courage Directors - Wacky
> Warehouse. Awful.
>
> That was actually sent as one single line. However, the raw POP3 feed is
> putting in a CRLF in there when it's read back. I just cannot work out how
> Outlook is glueing it back together! I thought that it might have been the
> email sender putting in the breaks but even if it was, how does Outlook know
> when to glue the lines back together???

IMHO Outlook should not have glued it back together.

If the message had
Content-Type: text/plain; format=flowed;
charset="iso-8859-1"

and there were a space after 'Wacky', then the lines should have been glued
together and re-wrapped for display, but there is no such space. I guess that
Outlook is using some heuristics of its own.

> Any ideas?
>
> Thanks, Rob.

/ Kari Hurtta
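
The format=flowed rule described above can be sketched in Python roughly as follows. This is a simplified reading of RFC 3676, not a complete implementation: it ignores quote depth (leading ">") and the DelSp parameter, and the function name unflow is just an illustrative choice.

```python
from typing import List


def unflow(text: str) -> List[str]:
    """Join soft-wrapped format=flowed lines into paragraphs (simplified)."""
    paragraphs = []
    current = ""
    for line in text.splitlines():
        if line.startswith(" "):
            line = line[1:]  # undo space-stuffing
        if line == "-- ":
            # Signature separator: ends in a space but is never flowed.
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(line)
            continue
        current += line
        if not line.endswith(" "):
            # No trailing space: this is a hard break, the paragraph ends.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


# With a trailing space after "Wacky" the two lines are one paragraph.
flowed = 'Score: "Legh Arms" - Wacky \r\nWarehouse. Awful.\r\n'
# Without it (as in the message in this thread) they stay separate.
fixed = 'Score: "Legh Arms" - Wacky\r\nWarehouse. Awful.\r\n'

print(unflow(flowed))
print(unflow(fixed))
```

Applied to the message in this thread, which has no trailing space after "Wacky" and no format=flowed parameter, this rule would leave the two lines apart, which is the point being made above.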

Re: POP3 email link breaks

on 06.01.2007 17:59:22 by Rob

> IMHO Outlook should not have glued it back together.

I was hoping there was a space at the end of that line but alas not.

Outlook has an option to turn this feature off - it's labelled "Remove extra
line breaks in plain text messages" so you can decide whether you want it or
not.

I'm amazed email works at all sometimes with all the historical baggage in
there :-)

Cheers, Rob.

Re: POP3 email link breaks

on 07.01.2007 08:15:49 by Kari Hurtta

"Rob" writes in comp.mail.misc:

> > IMHO Outlook should not have glued it back together.
>
> I was hoping there was a space at the end of that line but alas not.

And reformatting applies only to format=flowed text anyway.

There is RFC 3676 (The Text/Plain Format and DelSp Parameters).

(obsoletes RFC 2646)

| If Format is not specified, or if the value is not recognized, a
| value of Fixed is assumed. The semantics of the Fixed value are the
| usual associated with Text/Plain [MIME-IMT].

> Outlook has an option to turn this feature off - it's labelled "Remove extra
> line breaks in plain text messages" so you can decide whether you want it or
> not.

I see.

> I'm amazed email works at all sometimes with all the historical baggage in
> there :-)
>
> Cheers, Rob.

/ Kari Hurtta
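
To decide whether a text/plain part may be reflowed at all, a parser can look at the Format parameter and fall back to Fixed, as the RFC text quoted above says. Below is a minimal Python sketch using the standard email package; the function name is_flowed and the two sample header blocks (modelled on headers seen in this thread) are illustrative assumptions.

```python
from email import message_from_string
from email.policy import default


def is_flowed(raw_message: str) -> bool:
    """True only for text/plain with format=flowed; anything else is Fixed."""
    msg = message_from_string(raw_message, policy=default)
    if msg.get_content_type() != "text/plain":
        return False
    # A missing or unrecognized value is treated as Fixed.
    fmt = msg.get_param("format", "fixed")
    return str(fmt).lower() == "flowed"


# Headers like those of the Outlook message at the top of the thread.
outlook_style = (
    "Content-Type: text/plain;\n"
    '\tcharset="iso-8859-1"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "body\n"
)
# Headers that do declare format=flowed.
flowed_style = (
    'Content-Type: text/plain; format=flowed; charset="US-ASCII"\n'
    "\n"
    "body\n"
)

print(is_flowed(outlook_style))
print(is_flowed(flowed_style))
```

Only when this check succeeds do the trailing-space rules of format=flowed apply; for Fixed text every CRLF is meant literally, so any re-gluing is guesswork on the client's part.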