procmail filters not working

on 26.05.2006 19:11:26 by nooneinparticular314159

I've had a number of spam get through of late, even though I use a set
of procmail filters that should be catching them. Note that I always
filter on the raw source of the e-mail, not on what is normally
displayed in my mail reader. Perhaps I am making a mistake. Can you
tell me what I am doing wrong?

Examples:

P R O z & C

M e R / D / A

A m B / E N

should have been filtered by one of:
:0
* a.m.b.?.e.n
/dev/null

:0
* a m b
/dev/null

But was not deleted.

Likewise, an e-mail containing:
This offer is free and WE DONT CARE ABOUT YOUR CREDIT!

should have been filtered out by:
:0
* your credit
/dev/null

but was not filtered.

What am I doing wrong?

Thanks!

Re: procmail filters not working

on 27.05.2006 02:10:14 by Alan Connor

On comp.mail.misc, in <1148663486.081039.258480@u72g2000cwu.googlegroups.com>, "nooneinparticular314159@yahoo.com" wrote:
> Path: newsspool1.news.pas.earthlink.net!stamper.news.pas.earthlink.net!elnk-nf2-pas!newsfeed.earthlink.net!newshub.sdsu.edu!postnews.google.com!u72g2000cwu.googlegroups.com!not-for-mail
> From: "nooneinparticular314159@yahoo.com"

Throwaway email address and no name, just an email address.

> Newsgroups: comp.mail.misc
> Subject: procmail filters not working
> Date: 26 May 2006 10:11:26 -0700
> Organization: http://groups.google.com
> Lines: 37
> Message-ID: <1148663486.081039.258480@u72g2000cwu.googlegroups.com>
> NNTP-Posting-Host: 192.80.55.74

$ host 192.80.55.74
Name: webproxy4x.mitre.org
Address: 192.80.55.74

But that header can easily be forged by anyone using googlegroups
to post through.

> Mime-Version: 1.0
> Content-Type: text/plain; charset="iso-8859-1"
> X-Trace: posting.google.com 1148663491 17210 127.0.0.1 (26 May 2006 17:11:31 GMT)
> X-Complaints-To: groups-abuse@google.com
> NNTP-Posting-Date: Fri, 26 May 2006 17:11:31 +0000 (UTC)
> User-Agent: G2/0.2
> X-HTTP-UserAgent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3,gzip(gfe),gzip(gfe)
> X-HTTP-Via: 1.1 proxy-wash2.mitre.org:80 (squid/2.5.STABLE10)
> Complaints-To: groups-abuse@google.com
> Injection-Info: u72g2000cwu.googlegroups.com; posting-host=192.80.55.74; posting-account=A1f6RQ0AAADldr2edLdMLvZ0G2s8oVeu
> Xref: news.earthlink.net comp.mail.misc:77525
> X-Received-Date: Fri, 26 May 2006 10:11:31 PDT (newsspool1.news.pas.earthlink.net)

<http://slrn.sourceforge.net/docs/README.offline>

Someone working so hard at anonymity is always suspected of being a spammer on these groups.

Is it that _your_ filter isn't working, or is it that someone
else's _is_ working and you can't get through it?

Procmail knowhow works in both directions.

I think I will give you noanswerinparticular.

Why hide if you are just an ordinary guy? No, the Government
and the Corporations are not out to get you. If they were,
you'd have been gotten long ago.

[Note: I don't read the articles of "Sam" or his numerous
sockpuppets or his 'friends', nor any responses to them, and
haven't for years. He follows me all over the Usenet, and I
still don't read his articles. This _really_ pisses him off.]

Alan

--
http://home.earthlink.net/~alanconnor/contact.html
Other URLs of possible interest in my headers.

Re: procmail filters not working

on 27.05.2006 02:45:04 by Garen Erdoisa

nooneinparticular314159@yahoo.com wrote:
> I've had a number of spam get through of late, even though I use a set
> of procmail filters that should be catching them. Note that I always
> filter on the raw source of the e-mail, not on what is normally
> displayed in my mail reader. Perhaps I am making a mistake. Can you
> tell me what I am doing wrong?
>
> Examples:
>
> P R O z & C

> M e R / D / A

> A m B / E N

>
> should have been filtered by one of:
> :0
> * a.m.b.?.e.n
> /dev/null

procmail defaults to filtering on the headers only unless you
specifically tell it to look at the body of the message.
To do that use one of the following forms:

:0 B
* a.m.b.?.e.n
/dev/null

:0
* B ?? a.m.b.?.e.n
/dev/null

>
> :0
> * a m b
> /dev/null

Same thing. You are expecting procmail to look at the message body but
not telling it to do so specifically, so it's looking for that string
only in the headers.

>
> But was not deleted.
>
> Likewise, an e-mail containing:
> This offer is free and WE DONT CARE ABOUT YOUR CREDIT!
>
> should have been filtered out by:
> :0
> * your credit
> /dev/null

Same thing again.
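
As a rough sketch of how the flags combine (not a recipe from the original poster's setup), the H and B flags can be given together so that the pattern is tried against both the headers and the body. Procmail matches case-insensitively unless the D flag is set, so the lowercase pattern should also catch the upper-case text in the example spam:

```
# Search the headers and the body for the phrase.
# Matching ignores case by default, so this also covers "YOUR CREDIT".
:0 HB
* your credit
/dev/null
```

While a new recipe is still being tested, it may be safer to deliver to a folder instead of /dev/null, so that a pattern that matches too much does not silently destroy mail.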

>
> but was not filtered.
>
> What am I doing wrong?
>
> Thanks!
>

--
Garen

Re: procmail filters not working

on 27.05.2006 16:06:15 by nooneinparticular314159

Thank you, Garen! That was very helpful! Hopefully, that will stop
the latest onslaught of spam.

(To the other poster: I post from a throwaway account because I don't
like getting spam. That account gets around 100 spam per week because
I use it to post on newsgroups. Why would I want to give myself more
spam? The object here is to get less.)

Re: procmail filters not working

on 27.05.2006 19:19:17 by AK

nooneinparticular314159@yahoo.com wrote:

> Thank you, Garen! That was very helpful! Hopefully, that will stop
> the latest onslaught of spam.
>
> (To the other poster: I post from a throwaway account because I don't
> like getting spam. That account gets around 100 spam per week because
> I use it to post on newsgroups. Why would I want to give myself more
> spam? The object here is to get less.)
>

Have a look at http://bogofilter.org. This Bayesian filter can be
taught to catch much of the inbound spam without the need to continually
come up with new procmail rules.

If you do start using bogofilter, make sure you do not run it in a
learning configuration.

AK

Re: procmail filters not working

on 28.05.2006 03:07:03 by nooneinparticular314159

Well, another spam got through, even though I had updated my filters to
0B to search the message bodies. I really don't understand why my
filters are failing.

I don't want to use a Bayesian filter because I do not want to risk
losing valid e-mail. Every once in a while, I get something valid
from someone unexpected, and I don't want to have that get deleted.
That's why I write my own filters. My filters work very well, except
on this latest round of spam.

Re: procmail filters not working

on 28.05.2006 03:08:07 by nooneinparticular314159

By any chance, is the space required between the 0 and the B? Or
should (can) it be :0B?

Thanks.

Re: procmail filters not working

on 28.05.2006 07:02:48 by Garen Erdoisa

nooneinparticular314159@yahoo.com wrote:
> By any chance, is the space required between the 0 and the B? Or
> should (can) it be :0B?
>

The recipe flags line can be written without the spaces. I usually
include the spaces for readability. Personal preference.

Note that procmail regular expressions will ignore leading spaces or
tabs, but will include trailing spaces or tabs. Something to keep in
mind as you are writing the regular expressions used by your procmail
recipes.

--
Garen

Re: procmail filters not working

on 28.05.2006 07:52:31 by Garen Erdoisa

nooneinparticular314159@yahoo.com wrote:
> Well, another spam got through, even though I had updated my filters to
> 0B to search the message bodies. I really don't understand why my
> filters are failing.

It's kind of hard to troubleshoot without data to look at. :)
However if you want to test a recipe you can use the following technique:

Save a raw email you want to use for testing in a file called
spam.txt

Save the following recipe in a file called test.rc
-=-=-=-test.rc-=-=-=-

NL="
"

VERBOSE=yes
:0
* B ?? some regular expression to look for in the message body
{
VERBOSE=no
LOG="[$$]$_: Test passed${NL}"
}
VERBOSE=no

# else
:0 E
{
LOG="[$$]$_: Test failed${NL}"
}

LOGABSTRACT=no
:0
/dev/null

-=-=-=-
Then run something like the following command from the command line:

cat spam.txt |procmail /path/to/test.rc
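
For illustration, here is a filled-in variant of the test file above for the "your credit" case. It is only a sketch: the log path is an assumption, and LOGFILE is set so that the result lands in a file instead of on stderr.

```
# Hypothetical test file for the "your credit" pattern.
# Send log output to a file (the path is an assumption).
LOGFILE=/tmp/procmail-test.log
VERBOSE=yes

:0
* B ?? your credit
{
  LOG="Test passed: body matched
"
}

# else
:0 E
{
  LOG="Test failed: body did not match
"
}

# Discard the test message so nothing reaches a real mailbox.
:0
/dev/null
```

After feeding spam.txt through it in the same way as above, the log file should show which branch ran. With VERBOSE=yes it should also show procmail's own "Match on" or "No match on" lines for the condition.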

>
> I don't want to use a Bayesian filter because I do not want to risk
> losing valid e-mail. Every once in a while, I get something valid
> from someone unexpected, and I don't want to have that get deleted.
> That's why I write my own filters. My filters work very well, except
> on this latest round of spam.
>

I use bogofilter (a bayesian filter) http://bogofilter.sourceforge.net/
from inside procmail. You don't have to have a bayesian filter delete
anything, you can use it to sort your email into different folders if
you want.

example of a very simple procmail recipe that uses bogofilter:

:0 HB
* ? bogofilter -u
spamfolder

:0 E
${DEFAULT}

The main problem with bayesian filters is that over time the database(s)
can become very bloated with "gibberish" data if you run it in a self
teaching mode as the -u switch above will do.
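
A minimal sketch of the same recipe without self-teaching, assuming the database has already been trained by hand: leaving out -u means bogofilter only classifies the message and does not update its word lists. The condition relies on bogofilter exiting with 0 when it judges the message to be spam. The trailing colon asks procmail to use a lock file while writing to the folder.

```
# Classify only; no -u, so the database is not updated on every message.
:0 HB:
* ? bogofilter
spamfolder
```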

In my case after letting it learn on just one account for about a year
the database grew to be about 57meg in size. I chose to write a script
to prune the database of tokens that had not been seen in over 6 months
and that cut it down to only 30meg. That script now runs once a month on
a cron job to keep the filter's database pruned.

Another minor problem is that you need to train the filter. They don't
become very accurate until after they've seen at least 4000 spam and
4000 ham (non-spam) messages.

Training the filter is easy, but you do need that fairly large
collection of spam and ham messages to train it with initially.

If you were to allow every user on a large site to use their own
bayesian database it could end up consuming large amounts of disk space
just for that one purpose. If however you chose to run it on a system
wide basis as a bayesian pre-filter, then allow each user to have their
own smaller database this problem can be somewhat alleviated at the
sacrifice of some accuracy. Like any good filter you need to fiddle with
it a bit to get it working to your liking.

For a small system with only a handful of users the database size isn't
much of an issue and allowing each user to run their own bayesian filter
will be the most accurate. The only issue there is that to run a
bayesian filter, you really need to study and understand the theory of
how such filters work, and you need to periodically prune the bayesian
databases of old irrelevant data so they don't become excessively bloated.

There is a good FAQ up on the sourceforge site about how bayesian
filters work.

In my case I use bogofilter to tag messages with a bayesian score, and
to set procmail folder names and variables based on the bayesian score.
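
The tagging step could look roughly like the sketch below. This is not the actual configuration described here, only an assumption-laden outline: bogofilter's -p option passes the message through with an added X-Bogosity header, and -e makes it exit with 0 for both spam and ham so that procmail does not treat a classification as a filter failure. The folder name is made up, and the exact header value ("Spam" here) depends on the bogofilter version and configuration.

```
# Filter the message through bogofilter, which adds an X-Bogosity header.
# f = treat the pipe as a filter, w = wait for its exit code.
:0 fw
| bogofilter -p -e

# Sort on the tag; spam-bogofilter is a hypothetical folder name.
:0:
* ^X-Bogosity: Spam
spam-bogofilter
```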

That is followed up by custom procmail recipes that whitelist email from
known good sources, and finally to feed the remaining email to a spam
filter. (SpamBouncer in my case).

If the bayesian filter says a message isn't spam, then depending on
other factors the spamfilter step may be skipped.

This combination of techniques has proven to work very well in my case
for keeping email sorted to the point where I see false negatives in my
inbox once or twice a year, and false positives in my spam folder less
than once a month usually because of something I've subscribed to that
changed email providers.

You can improve the accuracy of your whitelists by issuing unique email
aliases to each website you sign up for, and to your various
friends/relatives etc. Map those aliases to your real account and use
procmail to sort them out for filter bypass delivery. If one of the
aliases you give out becomes compromised, it's easy enough to set up an
access list entry on your mail server to deny further delivery for that
alias, then issue a new alias for that web site etc.
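
As a rough sketch of the filter-bypass delivery, assuming a made-up alias shop-alias@example.com that was given to a single web site: a recipe like this, placed ahead of the spam filtering recipes, delivers mail addressed to that alias straight to the default mailbox. ^TO_ is procmail's built-in macro for matching an address in the usual destination headers.

```
# Mail addressed to a known-good alias skips the spam filters.
# shop-alias@example.com is a hypothetical address.
:0:
* ^TO_shop-alias@example\.com
$DEFAULT
```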

The accuracy of this system is good enough for my purposes. Even so,
opinions vary on bayesian filtering and with other filtering methods, so
what works for me may not work to your satisfaction.

In the end you pretty much have to try stuff and decide for yourself
what filtering techniques will suit your needs.

--
Garen

Re: procmail filters not working

on 28.05.2006 14:48:46 by AK

I would agree with Garen that the options are not all or nothing. But
using bogofilter to continuously learn unhindered leads to the enormous
file space issues.

The process I followed was to train it with several messages and then
let it do its work. If a message was misclassified, I would retrain
the filter. My word file is about 2MB and catches. The continual
training could run into deadlocks and at times, if it exceeds the TIMEOUT
for procmail, the message will get delivered anyway.

AK

Re: procmail filters not working

on 29.05.2006 00:02:07 by Allodoxaphobia

On 27 May 2006 18:07:03 -0700, nooneinparticular314159@yahoo.com wrote:
> Well, another spam got through, even though I had updated my filters to
> 0B to search the message bodies. I really don't understand why my
> filters are failing.

If you are reading your mail with a text MUA, what you see is not what
you got -- vis-a-vis html-constructed bodies. Most spam comes either as a
weirdly formatted html payload _only_, or with both a text and an html
component. Dump, [E]xport (pine), or whatever the full email body to a
file that you can examine without any 'helpful' html rendering.

I declare "spam!" with email having 'too many' [...], too many [...],
too many [...] tags, etc. It usually takes them 6 or more [...]

Re: procmail filters not working

on 30.05.2006 16:10:08 by nooneinparticular314159

I implemented the :0B suggestion made earlier, and that solved my
problem. Procmail simply wasn't applying some of my filters properly,
because I was using :0 instead of :0B. I'm now catching 100% of spam.
:-)

I do use pine, however I view spam in headers on mode, which shows you
the complete header and all HTML. So I know what to write filters
against.

Allodoxaphobia: Do you really still use OS/2? If so, what do you use
it for? I've seen it being run at Bank of America in the last few
years, but nowhere else since the mid-90's.

Re: [OT] OS/2 -- was: procmail filters not working

on 30.05.2006 20:17:18 by Allodoxaphobia

On 30 May 2006 07:10:08 -0700, nooneinparticular314159@yahoo.com wrote:
>
> Allodoxaphobia: Do you really still use OS/2? If so, what do you use
> it for? I've seen it being run at Bank of America in the last few
> years, but nowhere else since the mid-90's.

I still have a copy running on an old, screamin' 185MHz PC here. I keep
it around because it is a fond reminder of That Time In The Past when I
threw all my M$ bloatware overboard. Since it's been with me for way
more than a decade, it has a lot of files on it that I have occasion to
retrieve from time-to-time. At one time I was running OS/2 on three
PC's and a laptop here. Those days still stand out as the most fun, the
most rewarding, and the least frustrating in my life with Personal
Computers (the first being a used PDP-8/L in 1976.) And, we had no Alan
Connor in the comp.os.os2... ng's back then. :-\

I've been wrasslin' with linux now - for 2 1/2 years. I'm not yet at
the techie level with it that I had achieved in my first 6 months with
OS/2. But, it's coming.

Jonesy
--
Marvin L Jones | jonz | W3DHJ | linux
38.24N 104.55W | @ config.com | Jonesy | OS/2
*** Killfiling google posts:

Re: procmail filters not working

on 30.05.2006 21:50:05 by nooneinparticular314159

Another question: In some of my filters, I have something of the form:

:0H
* ^Subject: Thing I want to filter on

Is the ^Subject: necessary, or can I just tell it to look in the whole
header, ie:

:0H
* Thing I want to filter on

Thanks!

Re: procmail filters not working

on 30.05.2006 23:44:25 by Garen Erdoisa

nooneinparticular314159@yahoo.com wrote:
> Another question: In some of my filters, I have something of the form:
>
> :0H
> * ^Subject: Thing I want to filter on
>
> Is the ^Subject: necessary, or can I just tell it to look in the whole
> header, ie:
>
> :0H
> * Thing I want to filter on
>
> Thanks!
>

Procmail follows the rules for procmail regular expressions in either case.

Note that procmail regular expressions do have a few minor syntax
differences from other programs that support regular expressions so you
might have to fiddle a bit with the expressions to get them to work
properly in procmail.

If you begin with an ^ symbol, then it will anchor the search starting
at the beginning of the line. If you leave that symbol out, the search
string can be potentially embedded anywhere in the line.

Because of that your first example above will only match a subject
header line and must contain the string you gave, while the 2nd example
will match any header line containing that string anywhere in the line.
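
To make the difference concrete, here is a small sketch with a made-up phrase. The first recipe only fires when the phrase appears somewhere on the Subject line. The second fires when it appears on any header line, for example in a display name on the From line:

```
# Anchored: the line must start with "Subject:"; .* allows other text
# between the header name and the phrase.
:0 H
* ^Subject:.*cheap pills
/dev/null

# Unanchored: the phrase may appear in any header line.
:0 H
* cheap pills
/dev/null
```

The unanchored form casts a wider net, so it is also more likely to catch legitimate mail by accident.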

I suggest that you study the following procmail man pages if you haven't
done so already. For examples of procmail usage, I recommend that you
study SpamBouncer procmail recipes. SpamBouncer is a spam filter by
Catherine Hampton that is written almost entirely in procmail so is a
good resource for examples. It's also GNU GPL licenced.

man procmail
man procmailrc    man page for procmail run control file
man procmailsc    man page for procmail scoring
man procmailex    man page with some limited procmail examples

Other resources and FAQS can be found via links from
http://www.procmail.org/

--
Garen