[Procmail] body tests (:0B) , 2 questions

on 20.07.2005 02:10:07 by Allodoxaphobia

In poking around in Google all I can dig up to answer my (1st) question
is:
"Procmail uses multi line matches by default."
So, for my first question, I want to catch (for example) all body phrases
of the ilk:

|make money fast
,
| make money
| fast
,
| make
| money fast
, and
| make
| money
| fast

So, does the following (simplistic) recipe cover each of the above 'flavors'?

|:0B
| * make.*money.*fast

It doesn't seem to. If I am wrong and it does I'll go back to my
recipe and (re)try to beat it into submission.
(Actually, I'm trying to detect a 'common' spammer html quirk -- but I
see the sob's are spanning lines at various points for the text sequence
I'm looking for.)

Question 2.

In the same vein: Do multiple tests (AND's) in the body re-start from the
top of the body -- or does each test start from the end point of the
previous test? For example,
in:
|Dear sir,
| Yaa-daa yaa-daa yaa-daa
|yaa-daa yaa-daa yaa-daa
|yaa-daa Nigeria yaa-daa
:
, is
|:0B
| * Dear sir
| * Nigeria
identical to:
|:0B
| * Nigeria
| * Dear sir
or, am I required to do:
|:0B
| * Nigeria
| {
| :0B
| * Dear sir

to catch a 'pairing' of words|phrases that may occur in any order?
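
As far as I know, each condition line in a procmail recipe is tested on its own against the whole search area, so the order of the two conditions should not change the result. The sketch below is only a shell analogy, not procmail itself: `has_all` is a hypothetical helper, and each `grep` rescans the sample body from the top, the way separate conditions would.

```shell
# Sample body taken from the question above.
body='Dear sir,
 Yaa-daa yaa-daa yaa-daa
yaa-daa yaa-daa yaa-daa
yaa-daa Nigeria yaa-daa'

# Hypothetical helper: succeed only if every pattern given as an
# argument occurs somewhere in $body. Each grep starts again at the
# top of the text, so earlier patterns do not move a "cursor".
has_all() {
    for pattern in "$@"; do
        printf '%s\n' "$body" | grep -q -i -e "$pattern" || return 1
    done
    return 0
}

has_all 'Dear sir' 'Nigeria' && echo "order 1: match"
has_all 'Nigeria' 'Dear sir' && echo "order 2: match"
```

Both calls report a match here, which is the behaviour I would expect from the two flat recipes above; the nested form should not be needed just to make the pairing order-independent.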

Thanks,
Jonesy
--
Marvin L Jones | jonz | W3DHJ | linux
Pueblo, Colorado | @ | Jonesy | OS/2 __
38.24N 104.55W | config.com | DM78rf | SK

Re: [Procmail] body tests (:0B) , 2 questions

on 20.07.2005 17:42:58 by unknown

Post removed (X-No-Archive: yes)

Re: [Procmail] body tests (:0B) , 2 questions

on 21.07.2005 20:08:18 by Allodoxaphobia

On Wed, 20 Jul 2005 15:42:58 GMT, s. keeling wrote:
> Allodoxaphobia :
>> In poking around in Google all I can dig up to answer my (1st) question
>> is:
>> "Procmail uses multi line matches by default."
>> So, for my first question, I want to catch (for example) all body phrases
>> of the ilk:
>>
>> |make money fast
>> ,
>> | make money
>> | fast
>> ,
>> | make
>> | money fast
>> , and
>> | make
>> | money
>> | fast
>>
>> So, does the following (simplistic) recipe cover each of the above 'flavors'?
>>
>> |:0B
>> | * make.*money.*fast
>
> That will find those words, in that order, separated by any number of
> _any_ characters. You can limit it to spaces with "[ ]*".

Maybe you did not understand my question (or, my question was posed
awkwardly) -- and, maybe I do not understand your answer.
But, with extensive testing today I determined that procmail does not
include new-lines in its pattern matching.
What I'm trying to do is find emails with an 'unreasonable' number of
<img> tags. The email I'm testing on has 12 (yes, twelve), with the first 2
occurrences of The third (and subsequent)
(For testing I went KISS and just looked for the "words" 'img'.)
This test works:

:0B
* ()\/\<img\>.*\<img\>
{
LOG=">> $MATCH $NL"
:0:
ZZ_img
}

procmail: Match on "()\/\<img\>.*\<img\>"
procmail: Assigning "LOG=>> ^^^- VERY long line with 2 (two) 'img' "words" therein.
procmail: Locking "ZZ_img.lock"
procmail: Assigning "LASTFOLDER=ZZ_img"
procmail: Opening "ZZ_img"
procmail: Acquiring kernel-lock
procmail: Unlocking "ZZ_img.lock"
procmail: Notified comsat: "jonz@4302:/home/jonz/mail/ZZ_img"
From Bender@example.com Mon Jul 18 08:58:03 2005
Subject: Top brand new products directly from the manufactor!
Folder: ZZ_img


This test does _not_ work:

:0B
* ()\/\<img\>.*\<img\>.*\<img\>
{
LOG=">> $MATCH $NL"
:0:
ZZ_img
}

procmail: No match on "()\/\<img\>.*\<img\>.*\<img\>"
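
The difference between the two tests can be reproduced outside procmail. This is a rough shell analogy under my own assumptions: the sample body is invented, and `grep` stands in for procmail's matcher only in the sense that `.` does not cross a newline. A pattern that needs three occurrences on one line finds nothing, while counting occurrences over the whole body finds all three.

```shell
# Invented sample: two <img tags on the first line, one more later.
body='<html><img src="a.gif"><img src="b.gif">
<p>text</p>
<img src="c.gif">'

# Line-oriented search: how many lines hold three "img" words at once?
# grep -c exits non-zero when the count is 0, hence the "|| true".
same_line=$(printf '%s\n' "$body" | grep -c -i 'img.*img.*img' || true)

# Whole-body count: print every "<img" on its own line, then count lines.
total=$(printf '%s\n' "$body" | grep -o -i '<img' | wc -l | tr -d '[:space:]')

echo "lines with three on one line: $same_line"
echo "occurrences in whole body: $total"
```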


> For multi-line tests, try something like:
>
> :0 B
> * ()(make money fast\
> |make money\
> |fast\
> )

hmmmmmm... Your example above would get a 'hit' on every email
with simply the word "fast" in it -- in any context. Nicht wahr?

> [The leading () is a null string intended to un-confuse procmail; it
> may not be necessary.]

Seems to be necessary when trying to _start_ a pattern with an html tag:

* * ()

Regards,
Jonesy

Re: [Procmail] body tests (:0B) , 2 questions

on 22.07.2005 17:40:02 by unknown

Post removed (X-No-Archive: yes)

Re: [Procmail] body tests (:0B) , 2 questions

on 22.07.2005 18:48:52 by Allodoxaphobia

On Fri, 22 Jul 2005 15:40:02 GMT, s. keeling wrote:
> Allodoxaphobia :
>> On Wed, 20 Jul 2005 15:42:58 GMT, s. keeling wrote:
>> > Allodoxaphobia :
>> >> In poking around in Google all I can dig up to answer my (1st) question
>> >> is:
>> >> "Procmail uses multi line matches by default."
>> >> So, for my first question, I want to catch (for example) all body phrases
>> >> of the ilk:
>> >>
>> >> |make money fast
>> >> ,
>> >> | make money
>> >> | fast
>>
>> Maybe you did not understand my question (or, my question was posed
>
> Ah, pardon me.

No apologies necessary. It was no doubt my awkward composition.

> You should try scoring:
>
> :0B
> * -2^0
> * 1^0 ()\\ > ...
>
> That'll initialize the value to -2, then add 1 to the score for every
> ", and you need to
> escape the \ (I think). Correct me if I'm wrong, but once the value
> hits 0, the action will be performed.

Ahhh! Now I _should have_ thought of that!
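
The arithmetic behind the scoring idea can be sketched in shell. Two points here are my reading of procmailsc(5) and worth checking against the man page: a recipe fires only when the final score is positive (greater than 0, not equal to 0), and an exponent of 0 counts a regexp at most once, so counting every occurrence would need `1^1` rather than `1^0`. `score_imgs` is a hypothetical helper that emulates "start at -2, add 1 per `<img`".

```shell
# Hypothetical helper: read a body on stdin and print the emulated
# score, i.e. (number of "<img" occurrences) minus 2.
score_imgs() {
    count=$(grep -o -i '<img' | wc -l | tr -d '[:space:]')
    echo $(( count - 2 ))
}

# Two tags give a score of 0, which is not positive: no delivery.
two=$(printf '<img a>\n<IMG b>\n' | score_imgs)
# Three tags give a score of 1: the action would be taken.
three=$(printf '<img a>\n<IMG b>\nx <img c>\n' | score_imgs)

[ "$two" -gt 0 ] && echo "two tags: match" || echo "two tags: no match"
[ "$three" -gt 0 ] && echo "three tags: match" || echo "three tags: no match"
```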

>> * ()\/\<img\>.*\<img\>
>> {
>> LOG=">> $MATCH $NL"
>> :0:
>> ZZ_img
>> }
>>
>> procmail: Match on "()\/\<img\>.*\<img\>"
>> procmail: Assigning "LOG=>> >> procmail: No match on "()\/\.*\.*\"
>
> That's expecting "\.*\", which isn't there. What may be
> there is "\
I am using "\<" and "\>" as 'word-break' anchors. Very confusing when
the subject text concerns html tags with leading <'s. In my dead-simple
examples|tests, I was just trying to get a 'hit' on the 'words' "img" --
which should not fail in render-able html text.
And, the 2-word test for "img" did work - there being 1 line with 2 of'em.
The 3-word test for "img" did not work - there being no such line in the
html payload -- which had 12 "<img>" tags in total.
(A sure sign of spam -- in _my_ universe.)

>> > For multi-line tests, try something like:
>> >
>> > :0 B
>> > * ()(make money fast\
>> > |make money\
>> > |fast\
>> > )
>>
>> hmmmmmm... Your example above would get a 'hit' on every email
>> with simply the word "fast" in it -- in any context. Nicht wahr?
>
> True, sorry. So go with scoring.
>
> :0 B
> * -2^0
> * 1^0 ()make
> * 1^0 ()money
> * 1^0 ()fast
> /dev/null

That, too, would be problematic -- since 'make' and 'money' and 'fast'
could occur innocently spread out over a large text body. But, I
understand -- it's all in the heat of the moment, thinking on your feet,
while pecking away at the keyboard, trying to help some Phule, when you'd
really rather be moving on down usenet. :-)

I guess it would be very involved to catch 'in context' occurrences of
make+money+fast. You'd probably need to pipe the body through `tr` to
strip \n, convert \t and blanks to single spaces, and _then_ check _that_
text stream for the offending phrase(s).
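
A minimal sketch of that `tr` idea, assuming the body arrives on stdin. `has_phrase` is a hypothetical helper name; how it would be hooked into a recipe (for instance through an exit-code condition) is left open here.

```shell
# Hypothetical helper: read text on stdin and succeed if the phrase in
# $1 occurs once tabs and newlines are turned into spaces and runs of
# spaces are squeezed down to a single space.
has_phrase() {
    tr '\t\n' '  ' | tr -s ' ' | grep -q -i -e "$1"
}

# The phrase split over three lines, with ragged indentation.
printf ' make\n money\n   fast\n' | has_phrase 'make money fast' \
    && echo "split phrase: match"

# The same words spread through innocent text do not match.
printf 'make some\nmoney, not too fast\n' | has_phrase 'make money fast' \
    || echo "scattered words: no match"
```

Because the words must be adjacent after normalisation, this avoids the false hits that the three separate `1^0` conditions would give on a long, innocent body.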

> man procmailsc

Yep. Time to (re)visit that.

Many thanks. You've been a great help.
Jonesy