String Matching Question

on 11.07.2005 18:56:56 by ralphNOSPAM

I'm trying to write a Perl statement that can tell me if all the text
characters in a string are English characters from the ASCII table,
space (dec 20) through tilde ~ (dec 126).

This is what I have:

    if ($hSubject =~ m/[\x20-\x7E]/)

Will this check ALL the characters in the Subject string against the
pattern?

What I am trying to do is detect when I get those perky foreign
character spam emails.

Thanks...

Re: String Matching Question

on 11.07.2005 19:48:45 by Gunnar Hjalmarsson

ralphNOSPAM@primemail.com wrote:
> I'm trying to write a Perl statement that can tell me if all the text
> characters in a string are English characters from the ASCII table,
> space (dec 20) through tilde ~ (dec 126).

20 is the hex number for space; the decimal number is 32.
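
This is easy to check from Perl itself. A small sketch using only the built-in `ord` and `printf` prints the codes of the space and the tilde in both bases:

```perl
use strict;
use warnings;

# ord() returns a character's numeric code; %d prints it in decimal,
# %X in hexadecimal.
for my $ch (' ', '~') {
    printf "'%s' = decimal %d = hex 0x%X\n", $ch, ord($ch), ord($ch);
}
```

It prints `' ' = decimal 32 = hex 0x20` and `'~' = decimal 126 = hex 0x7E`, which is why the range is written `\x20-\x7E` in a pattern.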

> This is what I have:
>
>     if ($hSubject =~ m/[\x20-\x7E]/)
>
> Will this check ALL the characters in the Subject string against the
> pattern?

Yes - and it will return true as long as it finds *any* character within
the range in $hSubject, which is probably not what you want. This is one
way to do the reverse:

    $hSubject =~ /[^\x20-\x7E]/
-------------------^
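
To see the difference between the two patterns, here is a minimal sketch; the helper names `has_any_in_range` and `has_any_outside` and the sample strings are made up for illustration. Note that `\x` takes at most two hex digits, so the space is written `\x20`; `\x020` would be read as `\x02` followed by a literal `0`, which leaves the space and `!` through `/` out of the range.

```perl
use strict;
use warnings;

# True if at least ONE character lies in \x20-\x7E (the original test).
sub has_any_in_range {
    my ($s) = @_;
    return $s =~ /[\x20-\x7E]/ ? 1 : 0;
}

# True if at least ONE character lies OUTSIDE \x20-\x7E (negated class).
sub has_any_outside {
    my ($s) = @_;
    return $s =~ /[^\x20-\x7E]/ ? 1 : 0;
}

# Hypothetical samples; "\x{E4}" is a-umlaut.
my %samples = (
    'plain ASCII'   => "Hello world",
    'with a-umlaut' => "H\x{E4}llo world",
);

for my $label (sort keys %samples) {
    printf "%-14s in-range=%d outside=%d\n", $label,
        has_any_in_range($samples{$label}),
        has_any_outside($samples{$label});
}
```

Both samples pass the original test, so it cannot tell them apart; only the negated class flags the accented one.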

OTOH, it should be noted that since you are testing for single
characters, the tr/// operator is better suited for the task than the
m// operator:

    $hSubject =~ tr/\x20-\x7E]//c

See "perldoc perlop".
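
A minimal sketch of the counting form, assuming a helper named `count_outside` (a made-up name): with an empty replacement list, tr/// returns the number of matching characters and leaves the string unchanged, and /c makes it count the characters *not* in `\x20-\x7E`.

```perl
use strict;
use warnings;

# Count the characters NOT in \x20-\x7E. /c complements the SEARCHLIST;
# the empty REPLACEMENTLIST means nothing is actually changed.
sub count_outside {
    my ($s) = @_;
    return ($s =~ tr/\x20-\x7E//c);
}

my $subject = "Caf\x{E9} \x{E0} la carte";   # hypothetical; two accented letters
my $n = count_outside($subject);
print "characters outside the range: $n\n";  # prints 2
```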

> What I am trying to do is detect when I get those perky foreign
> character spam emails.

Where I live, the characters åÅäÄöÖ are not foreign at all since they
are included in our alphabet, and emails with such characters are indeed
not "spam" per definition. ;-)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: String Matching Question

on 11.07.2005 19:51:48 by Gunnar Hjalmarsson

Gunnar Hjalmarsson wrote:
>
>     $hSubject =~ tr/\x20-\x7E]//c
-------------------------------^
included by mistake; should be

    $hSubject =~ tr/\x20-\x7E//c

Re: String Matching Question

on 11.07.2005 22:19:19 by ralphNOSPAM

Hmmmm...maybe I'm not being very clear about what I want to do, or maybe
there is a better way. I want to detect if ALL the chars in the Subject
string fall within the \x20-\x7E range.

My assumption is that if any of the chars fall outside of this ASCII
range, it is spam.

Thanks...

>Yes - and it will return true as long as it finds *any* character within
>the range in $hSubject, which is probably not what you want. This is one
>way to do the reverse:
>
>    $hSubject =~ /[^\x20-\x7E]/
>-------------------^
>
>OTOH, it should be noted that since you are testing for single
>characters, the tr/// operator is better suited for the task than the
>m// operator:
>
>    $hSubject =~ tr/\x20-\x7E]//c
>
>See "perldoc perlop".
>
>> What I am trying to do is detect when I get those perky foreign
>> character spam emails.
>
>Where I live, the characters åÅäÄöÖ are not foreign at all since they
>are included in our alphabet, and emails with such characters are indeed
>not "spam" per definition. ;-)

Re: String Matching Question

on 11.07.2005 23:56:12 by Gunnar Hjalmarsson

[ Please do not top-post! ]

ralphNOSPAM@primemail.com wrote:
> Gunnar Hjalmarsson wrote:
>> ralphNOSPAM@primemail.com wrote:
>>> I'm trying to write a Perl statement that can tell me if all the text
>>> characters in a string are English characters from the ASCII table,
>>> space (dec 20) through tilde ~ (dec 126).



>>>     if ($hSubject =~ m/[\x20-\x7E]/)
>>>
>>> Will this check ALL the characters in the Subject string against the
>>> pattern?
>>
>> Yes - and it will return true as long as it finds *any* character within
>> the range in $hSubject, which is probably not what you want. This is one
>> way to do the reverse:
>>
>>     $hSubject =~ /[^\x20-\x7E]/
>> -------------------^
>>
>> OTOH, it should be noted that since you are testing for single
>> characters, the tr/// operator is better suited for the task than the
>> m// operator:
>>
>>     $hSubject =~ tr/\x20-\x7E//c
>
> Hmmmm...maybe I'm not being very clear about what I want to do, or maybe
> there is a better way.

Think you were clear enough; I just chose to not spoon-feed you with a
complete expression. Did you read about the tr/// operator in "perldoc
perlop" as I suggested?

My code suggestions presuppose that you look at it from another angle.

    unless ( $hSubject =~ tr/\x20-\x7E//c ) {
        # all clear
    }
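
Wrapped in a subroutine, that test might look like the sketch below; the name `all_clear` and the sample subjects are hypothetical. A count of zero means every character is in range, so an empty subject also counts as clear, while a tab or newline does not.

```perl
use strict;
use warnings;

# Returns 1 when every character of the subject is in \x20-\x7E.
sub all_clear {
    my ($hSubject) = @_;
    unless ( $hSubject =~ tr/\x20-\x7E//c ) {
        return 1;    # tr/// counted no out-of-range characters
    }
    return 0;
}

# Hypothetical subjects: "\x{FC}" is u-umlaut, "\x{DF}" is sharp s.
for my $hSubject ("Meeting at 10:00", "Gr\x{FC}\x{DF}e", "") {
    my $verdict = all_clear($hSubject) ? "clear" : "flagged";
    print "$verdict\n";
}
```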

>>> What I am trying to do is detect when I get those perky foreign
>>> character spam emails.
>>
>> Where I live, the characters åÅäÄöÖ are not foreign at all since they
>> are included in our alphabet, and emails with such characters are indeed
>> not "spam" per definition. ;-)
>
> My assumption is that if any of the chars fall outside of this ASCII
> range, it is spam.

Yeah, yeah. Or it could be e.g. Jürgen Exner, one of the regular
answerers in the Perl newsgroups, who wants to help you with a Perl problem.

Re: String Matching Question

on 12.07.2005 00:23:00 by ralphNOSPAM

On Mon, 11 Jul 2005 23:56:12 +0200, Gunnar Hjalmarsson
wrote:

>[ Please do not top-post! ]
>
>ralphNOSPAM@primemail.com wrote:
>> Gunnar Hjalmarsson wrote:
>>> ralphNOSPAM@primemail.com wrote:
>>>> I'm trying to write a Perl statement that can tell me if all the text
>>>> characters in a string are English characters from the ASCII table,
>>>> space (dec 20) through tilde ~ (dec 126).
>
>
>
>>>>     if ($hSubject =~ m/[\x20-\x7E]/)
>>>>
>>>> Will this check ALL the characters in the Subject string against the
>>>> pattern?
>>>
>>> Yes - and it will return true as long as it finds *any* character within
>>> the range in $hSubject, which is probably not what you want. This is one
>>> way to do the reverse:
>>>
>>>     $hSubject =~ /[^\x20-\x7E]/
>>> -------------------^
>>>
>>> OTOH, it should be noted that since you are testing for single
>>> characters, the tr/// operator is better suited for the task than the
>>> m// operator:
>>>
>>>     $hSubject =~ tr/\x20-\x7E//c
>>
>> Hmmmm...maybe I'm not being very clear about what I want to do, or maybe
>> there is a better way.
>
>Think you were clear enough; I just chose to not spoon-feed you with a
>complete expression. Did you read about the tr/// operator in "perldoc
>perlop" as I suggested?
>
>My code suggestions presuppose that you look at it from another angle.
>
>    unless ( $hSubject =~ tr/\x20-\x7E//c ) {
>        # all clear
>    }
>
>>>> What I am trying to do is detect when I get those perky foreign
>>>> character spam emails.
>>>
>>> Where I live, the characters åÅäÄöÖ are not foreign at all since they
>>> are included in our alphabet, and emails with such characters are indeed
>>> not "spam" per definition. ;-)
>>
>> My assumption is that if any of the chars fall outside of this ASCII
>> range, it is spam.
>
>Yeah, yeah. Or it could be e.g. Jürgen Exner, one of the regular
>answerers in the Perl newsgroups, who wants to help you with a Perl problem.


Yes, I read about the tr// operator; however, I'm not trying to
translate any characters, so I stayed away from it.

Good point re the Jurgen Exner name example - maybe I should just
check for, say, more than 5 chars within x20 and x7E. That would tell
me enough that it's an email I want to see and still allow some other
chars to be there. Any of the typical ones that contain all of these
'other' chars would not pass.

Thanks again...I'll work on it some more...

Re: String Matching Question

on 12.07.2005 01:25:09 by Gunnar Hjalmarsson

ralphNOSPAM@primemail.com wrote:
> I read about the tr// operator; however, I'm not trying to
> translate any characters, so I stayed away from it.

Rash decision. From "perldoc perlop":

"If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This
latter is useful for counting characters in a class ..."

> Good point re the Jurgen Exner name example - maybe I should just
> check for, say, more than 5 chars within x20 and x7E.

    unless ( $hSubject =~ tr/\x20-\x7E//c > 5 ) {
        # all clear
    }
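
A sketch of that threshold version as a reusable check, assuming a hypothetical `acceptable_subject` that takes the limit as an optional second argument (defaulting to the 5 used here). Because `=~` binds more tightly than `>`, the tr/// count is computed first and then compared with the limit.

```perl
use strict;
use warnings;

# Returns 1 unless MORE than $max characters fall outside \x20-\x7E.
sub acceptable_subject {
    my ($hSubject, $max) = @_;
    $max = 5 unless defined $max;
    unless ( $hSubject =~ tr/\x20-\x7E//c > $max ) {
        return 1;
    }
    return 0;
}

# Hypothetical subjects: one accented letter vs. six CJK characters.
for my $hSubject ("Re: question from J\x{FC}rgen", "\x{4F60}\x{597D}" x 3) {
    my $verdict = acceptable_subject($hSubject) ? "ok" : "flagged";
    print "$verdict\n";
}
```

The first subject has a single out-of-range character and is accepted; the second has six and is flagged.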

Re: String Matching Question

on 12.07.2005 08:26:42 by Joe Smith

ralphNOSPAM@primemail.com wrote:

> Yes, I read about the tr// operator; however, I'm not trying to
> translate any characters, so I stayed away from it.

You're not paying attention.

The tr/// operator has two functions:
1) It can be used to translate and/or delete characters, and
2) It can be used to count characters.

In particular, it can be used to obtain a nonzero value if
there are *any* characters outside a given character range.
That is exactly what you want, isn't it?
-Joe
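
The two uses side by side, as a small sketch on a made-up string: the first two lines translate and delete (on copies, so the original is kept), and the last one only counts.

```perl
use strict;
use warnings;

my $s = "abc-123";

(my $upper = $s) =~ tr/a-z/A-Z/;    # 1) translate: "ABC-123"
(my $nodig = $s) =~ tr/0-9//d;      # 1) delete digits: "abc-"
my $count = ($s =~ tr/0-9//);       # 2) count digits: 3

print "$upper $nodig $count $s\n";  # $s itself is still "abc-123"
```

Adding /c to the counting form makes it count the characters *outside* the listed range instead, which is the test suggested earlier in the thread.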

Re: String Matching Question

on 12.07.2005 17:09:01 by ralphNOSPAM

On Mon, 11 Jul 2005 23:26:42 -0700, Joe Smith wrote:

>ralphNOSPAM@primemail.com wrote:
>
>> Yes, I read about the tr// operator; however, I'm not trying to
>> translate any characters, so I stayed away from it.
>
>You're not paying attention.
>
>The tr/// operator has two functions:
> 1) It can be used to translate and/or delete characters, and
> 2) It can be used to count characters.
>
>In particular, it can be used to obtain a nonzero value if
>there are *any* characters outside a given character range.
>That is exactly what you want, isn't it?
> -Joe

Yes, I see. Thank you for the clarification.