SAX-style mail parsing library
on 27.12.2005 13:55:53 by Przemyslaw Wegrzyn
Hi!
Can anyone here recommend a good e-mail parsing library?
I have to write a program that scans an e-mail message piped to it on STDIN.
Most libraries I've found parse e-mail from a file, building a message part
tree in RAM, or even load the whole message into RAM. I'd like to avoid such
excessive memory usage, so I'm looking for a library with a
"SAX-like" (by analogy with XML parsers) API, that is, a library that would
call my functions as it goes through the stream and reaches headers, part
beginnings/ends, etc.
Any hints?
Regards,
PW
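To make the request concrete, here is a minimal sketch of what such a callback-driven ("SAX-like") interface could look like in Python. It is not the API of any existing library: the names `MailHandler`, `scan`, and the `on_*` callbacks are invented for illustration. The sketch assumes text input, handles only the top-level header block and the body, and does not detect MIME part boundaries or mbox "From " lines. Only one header is held in memory at a time.

```python
class MailHandler:
    """Hypothetical callback interface; override only the methods you need."""

    def on_header(self, name, value):
        pass

    def on_headers_end(self):
        pass

    def on_body_line(self, line):
        pass

    def on_end(self):
        pass


def scan(stream, handler):
    """Read a message line by line, firing callbacks as each piece is reached."""
    pending = None    # header currently being assembled (it may be folded)
    in_body = False

    def flush():
        # Report the assembled header, if any, and forget it.
        nonlocal pending
        if pending is not None:
            name, _, value = pending.partition(":")
            handler.on_header(name.strip(), value.strip())
            pending = None

    for raw in stream:
        line = raw.rstrip("\r\n")
        if in_body:
            handler.on_body_line(line)
        elif line == "":
            # The first empty line ends the header block.
            flush()
            handler.on_headers_end()
            in_body = True
        elif line[0] in " \t" and pending is not None:
            # Leading whitespace marks the continuation of a folded header.
            pending += " " + line.strip()
        else:
            flush()
            pending = line

    if not in_body:
        # Message ended without a body; still close the header block.
        flush()
        handler.on_headers_end()
    handler.on_end()
```

A program reading from a pipe would subclass `MailHandler`, override the callbacks it cares about, and call `scan(sys.stdin, handler)`. Memory use stays proportional to the longest single header or body line, not to the size of the message.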
Re: SAX-style mail parsing library
on 27.12.2005 14:32:17 by Trevor.Jenkins
On Tue, 27 Dec 2005 13:55:53 +0100, Przemyslaw Wegrzyn wrote:
> Can anyone here recommend a good e-mail parsing library?
> I have to write a program that scans an e-mail message piped to it on STDIN.
Rather than "parse" messages, why not use, say, formail (from procmail) to
extract the pieces you need?
> Most libraries I've found parse e-mail from a file, building a message part
> tree in RAM, or even load the whole message into RAM. I'd like to avoid such
> excessive memory usage, so I'm looking for a library with a
> "SAX-like" (by analogy with XML parsers) API, that is, a library that would
> call my functions as it goes through the stream and reaches headers, part
> beginnings/ends, etc.
It depends upon what you are going to do with the messages once you've
"parsed" them.
Regards, Trevor
<>< Re: deemed!
Re: SAX-style mail parsing library
on 27.12.2005 17:34:28 by AK
Przemyslaw Wegrzyn wrote:
> Hi!
>
> Can anyone here recommend a good e-mail parsing library?
> I have to write a program that scans an e-mail message piped to it on STDIN.
> Most libraries I've found parse e-mail from a file, building a message part
> tree in RAM, or even load the whole message into RAM. I'd like to avoid such
> excessive memory usage, so I'm looking for a library with a
> "SAX-like" (by analogy with XML parsers) API, that is, a library that would
> call my functions as it goes through the stream and reaches headers, part
> beginnings/ends, etc.
>
> Any hints?
>
> Regards,
> PW
PW,
On what platform will your application/functions be running?
What information are you trying to extract from these messages?
There is no uniformity in the formatting of an email message beyond the
fact that the first part is the header and the rest is the body. A
single empty line separates the body from the header.
Beyond that, you have to contend with MIME (base64/quoted-printable),
uuencoded, HTML-encoded, XML-encoded, and plain-text content, plus various
other encoding options, with and without attachments.
AK
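The encodings AK lists do not by themselves force whole-message buffering. As one illustration, base64 content can be decoded incrementally as it arrives from the stream. The sketch below assumes well-formed base64 text with padding only at the very end. The class name `Base64StreamDecoder` and its `feed`/`close` methods are hypothetical, not taken from any library, and only `base64.b64decode` from the Python standard library is used.

```python
import base64


class Base64StreamDecoder:
    """Decode base64 text fed in arbitrary pieces, carrying over at most 3 chars."""

    def __init__(self):
        self._carry = ""

    def feed(self, text):
        # Drop line breaks and other whitespace, then decode whole 4-char groups.
        data = self._carry + "".join(text.split())
        usable = len(data) - len(data) % 4
        self._carry = data[usable:]
        return base64.b64decode(data[:usable])

    def close(self):
        # Leftover characters mean the input was truncated or malformed.
        if self._carry:
            raise ValueError("incomplete base64 input")
        return b""
```

Each call to `feed` returns the bytes that can be decoded so far, so the caller can write them to disk or hash them immediately instead of accumulating the attachment in RAM.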