[spamassassin] sa-learn the same messages

on 28.06.2005 03:00:59 by Troy Piggins

Been using spamassassin in combination with my own procmail recipes for
some time now - my recipes first, then spamassassin, then some other
recipes to filter based on score. I have not yet used the bayes ability
and sa-learn. Was surfing the other night and came across some mutt
macros that I thought would make training the bayes database simpler, so
thought I would implement it for a while.

Got me thinking, however - the way it is set up is to just run a cron
job on a particular mbox once a week:

sa-learn --mbox --spam "$SPAM"

There is a similar one for a $HAM mbox.
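
For reference, the weekly job described above could be wrapped in a small script along these lines. This is only a sketch: the function name `learn_mbox`, the `SA_LEARN` override variable, the script path, and the mbox paths in the comments are all assumptions and not part of the original setup. Only the `--mbox`, `--spam`, and `--ham` options come from sa-learn itself.

```shell
#!/bin/sh
# Hypothetical training wrapper; names and paths here are illustrative.
# SA_LEARN can be overridden for a dry run, e.g. SA_LEARN=echo.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_mbox KIND FILE
#   KIND is "spam" or "ham"; FILE is an mbox.
#   Missing or empty files are skipped with a note on stderr.
learn_mbox() {
    kind="$1"
    mbox="$2"
    if [ -s "$mbox" ]; then
        "$SA_LEARN" --mbox "--$kind" "$mbox"
    else
        echo "skipping $kind: $mbox is missing or empty" >&2
    fi
}

# To use it from cron, save this file as (for example)
# $HOME/bin/train-bayes.sh, append the two calls below, and add a
# crontab entry such as "0 3 * * 0 $HOME/bin/train-bayes.sh"
# (03:00 every Sunday):
#   learn_mbox spam "$HOME/mail/spam"
#   learn_mbox ham  "$HOME/mail/ham"
```

Setting `SA_LEARN=echo` before calling `learn_mbox` prints the command line that would be run instead of touching the Bayes database.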

I do not get hundreds of spam emails per week, and the $SPAM mbox (at
the moment) does not get emptied after the above cron task is run.

So my questions:

- Does it matter if the same messages are run through sa-learn
again?
- Or should the mbox be emptied upon completion, leaving say 20
messages to process every week? I understand you need about 200
messages for the bayes to be effective, but is that total, or every
time sa-learn is run?
- For my volume of mail, perhaps once a week is too regular?

Thanks for any help.

--
T R O Y P I G G I N S
e : usenet@piggo.com

Re: [spamassassin] sa-learn the same messages

on 28.06.2005 18:31:09 by Neil Woods

>>>>> Troy Piggins writes:

> I do not get hundreds of spam emails per week, and the $SPAM mbox (at
> the moment) does not get emptied after the above cron task is run.

> So my questions:

> - Does it matter if the same messages are run through sa-learn
> again?

No. Specifically, from the sa-learn(1p) man page:

SpamAssassin remembers which mail messages it has learnt already, and
will not re-learn those messages again, unless you use the --forget
option. Messages learnt as spam will have SpamAssassin markup removed,
on the fly.
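
One way to see this for yourself is to compare the learned-message counters before and after a second pass over the same mbox. The sketch below assumes that `sa-learn --dump magic` prints one counter per line, with the value in the third whitespace-separated column and the counter name (`nspam` or `nham`) as the last word. The helper name `bayes_count` is made up for illustration.

```shell
# bayes_count NAME
#   Reads `sa-learn --dump magic` output on stdin and prints the value
#   of the counter called NAME ("nspam" or "nham").
#   Assumption: value is in column 3, counter name is the last field.
bayes_count() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# Intended use (not executed here, since it needs a Bayes database):
#   before=$(sa-learn --dump magic | bayes_count nspam)
#   sa-learn --mbox --spam "$SPAM"    # second pass over the same mbox
#   after=$(sa-learn --dump magic | bayes_count nspam)
#   echo "nspam before: $before, after: $after"
```

If the man page text above holds, the two numbers should match when the mbox contains no new messages.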

> - Or should the mbox be emptied upon completion, leaving say 20
> messages to process every week? I understand you need about 200
> messages for the bayes to be effective, but is that total, or every
> time sa-learn is run?

I very much think it's total. However, I read the total was much higher
for bayes to be effective - in the order of 2000+ messages.
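
On the "200" figure: the SpamAssassin configuration documents `bayes_min_spam_num` and `bayes_min_ham_num`, both defaulting to 200, as the totals needed before the Bayes rules are used at all, so it is a running total and not a per-run quota. A rough check against those thresholds might look like the following. The function name `bayes_ready` is hypothetical, and the counts are passed in by hand; they would normally come from the `nspam` and `nham` values in the Bayes database.

```shell
# bayes_ready NSPAM NHAM [MIN]
#   Succeeds (exit status 0) when both totals have reached MIN.
#   MIN defaults to 200, the documented default of bayes_min_spam_num
#   and bayes_min_ham_num; pass a third argument if you changed them.
bayes_ready() {
    min="${3:-200}"
    [ "$1" -ge "$min" ] && [ "$2" -ge "$min" ]
}

# Example: 20 spam and 500 ham learned so far.
if bayes_ready 20 500; then
    echo "bayes active"
else
    echo "still training"    # this branch runs: 20 is below 200
fi
```

Reaching the minimum only switches Bayes on; a larger corpus, as suggested above, should still improve accuracy.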

> - For my volume of mail, perhaps once a week is too regular?

I really don't think it matters.

> Thanks for any help.

--
Neil

Re: [spamassassin] sa-learn the same messages

on 28.06.2005 23:05:01 by Troy Piggins

* Neil Woods wrote:
>>>>>> Troy Piggins writes:
>
>> I do not get hundreds of spam emails per week, and the $SPAM mbox (at
>> the moment) does not get emptied after the above cron task is run.
>
>> So my questions:
>
>> - Does it matter if the same messages are run through sa-learn again?
>
> No. Specifically, from the sa-learn(1p) man page:
>
> SpamAssassin remembers which mail messages it has learnt already,
> and will not re-learn those messages again, unless you use the
> --forget option. Messages learnt as spam will have SpamAssassin
> markup removed, on the fly.

Doh. Thanks for not RTFMing me!

>> - Or should the mbox be emptied upon completion, leaving say 20
>> messages to process every week? I understand you need about 200
>> messages for the bayes to be effective, but is that total, or every
>> time sa-learn is run?
>
> I very much think it's total. However, I read the total was much
> higher for bayes to be effective - in the order of 2000+ messages.

Ok, shouldn't take too long to get to 2000+ anyway.

>> - For my volume of mail, perhaps once a week is too regular?
>
> I really don't think it matters.

Thought so. Thanks Neil.

Peace.
--
T R O Y P I G G I N S
e : usenet@piggo.com