SubStr routine

on 28.10.2004 16:15:53 by Mailing Lists

Hello,

Can anybody think of a more elegant way of doing the following:

my $longartist = qq~rolling stones, the (mick jagger and ronnie wood)~;
my $shortartist;
my $x = 0;
foreach my $character (split(//, $longartist)) {
    if ($character =~ /[A-Za-z0-9]/) {
        $x++;
    }
    $shortartist .= $character;
    last if ($x == 30);
}


What I'm trying to achieve is a string that is truncated to the point where
30 [A-Za-z0-9] characters have been counted.

So in the above example the result would be:

rolling stones, the (mick jagger and r



Thanks in advance for any ideas!

Martyn

Re: SubStr routine

on 28.10.2004 17:47:13 by Gunnar Hjalmarsson

Martyn Wendon wrote:
> Can anybody think of a more elegant way of doing the following:
>
> my $longartist = qq~rolling stones, the (mick jagger and ronnie
> wood)~;
> my $shortartist;
> my $x = 0;
> foreach my $character (split(//, $longartist)) {
> if ($character =~ /[A-Za-z0-9]/) {
> $x++;
> }
> $shortartist .= $character;
> last if ($x == 30);
> }
>
>
> What I'm trying to achieve is a string that is truncated to the point where
> 30 [A-Za-z0-9] characters have been counted.
>
> So in the above example the result would be:
>
> rolling stones, the (mick jagger and r

perldoc -f substr

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: SubStr routine

on 28.10.2004 19:31:25 by Mailing Lists

thanks, I know how to use substr, but that doesn't achieve what I want


"Gunnar Hjalmarsson" wrote in message
news:2uciqmF277iccU3@uni-berlin.de...
Martyn Wendon wrote:
> Can anybody think of a more elegant way of doing the following:
>
> my $longartist = qq~rolling stones, the (mick jagger and ronnie
> wood)~;
> my $shortartist;
> my $x = 0;
> foreach my $character (split(//, $longartist)) {
> if ($character =~ /[A-Za-z0-9]/) {
> $x++;
> }
> $shortartist .= $character;
> last if ($x == 30);
> }
>
>
> What I'm trying to achieve is a string that is truncated to the point where
> 30 [A-Za-z0-9] characters have been counted.
>
> So in the above example the result would be:
>
> rolling stones, the (mick jagger and r

perldoc -f substr

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: SubStr routine

on 28.10.2004 20:17:45 by Gunnar Hjalmarsson

Martyn Wendon wrote:
> thanks, I know how to use substr, but that doesn't achieve what I
> want

Sorry, didn't read your question carefully enough...

Then I have nothing better to suggest.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: SubStr routine

on 28.10.2004 20:35:07 by dha

On 2004-10-28, Gunnar Hjalmarsson wrote:
> Martyn Wendon wrote:
>> thanks, I know how to use substr, but that doesn't achieve what I
>> want
>
> Sorry, didn't read your question carefully enough...

Well, if I've read it correctly, what you want is the string up to the
30th non-whitespace character. In which case, substr probably *does*
achieve an important part of what you want. You just need to figure out
how much of the string you need.

Off the top of my head and completely untested:

$whitecount = ($string =~ tr/a-zA-Z0-9//c);
$newstring = substr($string, 0, 30+$whitecount);

checking again to make sure that none of the characters from the 30th
character on aren't whitespace is left as an exercise for the reader...
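
To see why that extra check matters, here is a small sketch that runs the two lines above on the OP's sample string (written on one line; $kept is a name added purely for illustration). The count is taken over the whole string, so any non-alphanumerics that sit after the cut point push the cut too far to the right:

```perl
use strict;
use warnings;

my $string = 'rolling stones, the (mick jagger and ronnie wood)';

# tr with /c and an empty replacement only counts; $string is not modified.
my $whitecount = ($string =~ tr/a-zA-Z0-9//c);          # non-alphanumerics in the WHOLE string
my $newstring  = substr($string, 0, 30 + $whitecount);  # first 30 + $whitecount characters

# Count how many alphanumerics actually ended up in the result.
my $kept = ($newstring =~ tr/a-zA-Z0-9//);

print "$whitecount\n";   # 10
print "$newstring\n";    # rolling stones, the (mick jagger and ron
print "$kept\n";         # 32 rather than the 30 that were asked for
```

The two extra characters come from the space and the closing parenthesis that lie beyond the intended cut.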

dha

--
David H. Adler - - http://www.panix.com/~dha/
"Perl Porters, Inc. today announced the release of version .006 of
their popular Perl5 compiler suite, codenamed `Rabid Rat'."
- Nathan Torkington on p5p (this was a *joke*)

Re: SubStr routine

on 28.10.2004 22:01:24 by dwall

David H. Adler wrote:

> On 2004-10-28, Gunnar Hjalmarsson wrote:
>> Martyn Wendon wrote:
>>> thanks, I know how to use substr, but that doesn't achieve what
>>> I want
>>
>> Sorry, didn't read your question carefully enough...
>
> Well, if I've read it correctly, what you want is the string up to
> the 30th non-whitespace character. In which case, substr probably

Um, that's "string up to the 30th character that matches
/[a-zA-Z0-9]/", like the OP said. Slightly different, but I'm sure
you would have caught that.

> *does* achieve an important part of what you want. You just need
> to figure out how much of the string you need.
>
> Off the top of my head and completely untested:
>
> $whitecount = ($string =~ tr/a-zA-Z0-9//c);
> $newstring = substr($string, 0, 30+$whitecount);
>
> checking again to make sure that none of the characters from the
> 30th character on aren't whitespace is left as an exercise for the
> reader...

I'm glad you added that last sentence... :-)


How about this? It's vaguely like the OP's solution, and not terribly
elegant, but at least it doesn't start testing at the very first
character. And it has some checks on the number of desired
characters.


my $longartist = qq~rolling stones, the (mick jagger and ronnie wood)~;

my $short = my_substr($longartist, 30);
print $short;

sub my_substr {
    my ($string, $n) = @_;
    return $string if $n >= length $string; # short string
    my $len = $n;
    while ( $len < length($string) ) {
        my $str = substr($string, 0, $len);
        my $chars = ($str =~ tr/a-zA-Z0-9//);
        return $str if $n == $chars;
        $len++;
    }
    return $string; # number of preferred characters < n
}
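
A few quick checks of the three cases the sub distinguishes, as a sketch only. The sub is restated so the snippet runs on its own, and the short test strings are invented for illustration:

```perl
use strict;
use warnings;

sub my_substr {
    my ($string, $n) = @_;
    return $string if $n >= length $string;       # string shorter than n
    my $len = $n;
    while ( $len < length($string) ) {
        my $str   = substr($string, 0, $len);
        my $chars = ($str =~ tr/a-zA-Z0-9//);     # alphanumerics seen so far
        return $str if $n == $chars;
        $len++;
    }
    return $string;                               # fewer than n alphanumerics
}

# Normal case: cut right after the 30th alphanumeric.
print my_substr('rolling stones, the (mick jagger and ronnie wood)', 30), "\n";
# rolling stones, the (mick jagger and r

# String shorter than n: returned unchanged.
print my_substr('abc', 30), "\n";        # abc

# Small n with punctuation between the counted characters.
print my_substr('a-b-c-d-e', 3), "\n";   # a-b-c

# Longer than n but with too few alphanumerics: returned unchanged.
print my_substr('a---------', 3), "\n";  # a---------
```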

Re: SubStr routine

on 29.10.2004 00:42:21 by someone

Martyn Wendon wrote:
>
> Can anybody think of a more elegant way of doing the following:
>
> my $longartist = qq~rolling stones, the (mick jagger and ronnie
> wood)~;
> my $shortartist;
> my $x = 0;
> foreach my $character (split(//, $longartist)) {
> if ($character =~ /[A-Za-z0-9]/) {
> $x++;
> }
> $shortartist .= $character;
> last if ($x == 30);
> }
>
>
> What I'm trying to achieve is a string that is truncated to the point where
> 30 [A-Za-z0-9] characters have been counted.
>
> So in the above example the result would be:
>
> rolling stones, the (mick jagger and r

Here is one way to do it:

$ perl -e'
my $shortartist = my $longartist = q{rolling stones, the (mick jagger and ronnie wood)};

my $min = 30;
my $max = length( $longartist ) - 1;

for my $len ( $min .. $max ) {
    $shortartist = substr $longartist, 0, $len;
    last if $min == $shortartist =~ tr/A-Za-z0-9//;
}

print "$longartist\n$shortartist\n";
'
rolling stones, the (mick jagger and ronnie wood)
rolling stones, the (mick jagger and r




John
--
use Perl;
program
fulfillment

Re: SubStr routine

on 29.10.2004 03:02:30 by Uri Guttman

>>>>> "JWK" == John W Krahn writes:

JWK> my $min = 30;
JWK> my $max = length( $longartist ) - 1;

JWK> for my $len ( $min .. $max ) {
JWK> $shortartist = substr $longartist, 0, $len;
JWK> last if $min == $shortartist =~ tr/A-Za-z0-9//;
JWK> }

JWK> print "$longartist\n$shortartist\n";
JWK> '
JWK> rolling stones, the (mick jagger and ronnie wood)
JWK> rolling stones, the (mick jagger and r

no one seems to grasp how easy this is in a regex. (untested)

my ($str) = $text =~ /^((?:\w\s*){1,30})/ ;

it uses \w which adds _ to the allowed chars. easy to change to a char
class.

that matches 1-30 alphanums and each can have optional following
whitespace.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: SubStr routine

on 29.10.2004 05:23:58 by someone

Uri Guttman wrote:
>>>>>>"JWK" == John W Krahn writes:
>
> JWK> my $min = 30;
> JWK> my $max = length( $longartist ) - 1;
>
> JWK> for my $len ( $min .. $max ) {
> JWK> $shortartist = substr $longartist, 0, $len;
> JWK> last if $min == $shortartist =~ tr/A-Za-z0-9//;
> JWK> }
>
> JWK> print "$longartist\n$shortartist\n";
> JWK> '
> JWK> rolling stones, the (mick jagger and ronnie wood)
> JWK> rolling stones, the (mick jagger and r
>
> no one seems to grasp how easy this is in a regex. (untested)
>
> my ($str) = $text =~ /^((?:\w\s*){1,30})/ ;
>
> i uses \w which adds _ to the allowed chars. easy to change to a char
> class.
>
> that matches 1-30 alphanums and each can have optional following
> whitespace.

Yes but you also have punctuation in there as well as whitespace.

my ($shortartist) = $longartist =~ /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;

That will work. :-)
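
A quick side-by-side on the sample string, as a sketch (the string is written on one line, and the names $with_space and $with_any are made up for the comparison):

```perl
use strict;
use warnings;

my $longartist = 'rolling stones, the (mick jagger and ronnie wood)';

# \w\s* only allows whitespace between the counted characters,
# so the match ends at the first comma.
my ($with_space) = $longartist =~ /^((?:\w\s*){1,30})/;

# Allowing any run of non-alphanumerics lets the match carry on
# through punctuation until 30 alphanumerics have been consumed.
my ($with_any) = $longartist =~ /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;

print "$with_space\n";   # rolling stones
print "$with_any\n";     # rolling stones, the (mick jagger and r
```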


John
--
use Perl;
program
fulfillment

Re: SubStr routine

on 29.10.2004 05:40:35 by Uri Guttman

>>>>> "JWK" == John W Krahn writes:

>> no one seems to grasp how easy this is in a regex. (untested)
>> my ($str) = $text =~ /^((?:\w\s*){1,30})/ ;
>> i uses \w which adds _ to the allowed chars. easy to change to a char
>> class.
>> that matches 1-30 alphanums and each can have optional following
>> whitespace.

JWK> Yes but you also have punctuation in there as well as whitespace.

JWK> my ($shortartist) = $longartist =~ /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;

JWK> That will work. :-)

i did say untested! :) and it was an easy fix because the solution is a
good one :)

and given that _ will most likely never be in those strings, i would use
\w and \W. or at least drop a-z and use /i.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Re: SubStr routine

on 29.10.2004 14:58:50 by someone

Uri Guttman wrote:
>>>>>>"JWK" == John W Krahn writes:
>
> JWK> Yes but you also have punctuation in there as well as whitespace.
>
> JWK> my ($shortartist) = $longartist =~ /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;
>
> JWK> That will work. :-)
>
> i did say untested! :) and it was an easy fix because the solution is a
> good one :)
>
> and given that _ will most likely never be in those strings, i would use
> \w and \W. or at least drop a-z and use /i.

Yes, of course _ will never be there ... and you work for Microsoft, right?

;-}

How about this for completeness:

/^((?:[[:alnum:]][[:^alnum:]]*){1,30})/;



John
--
use Perl;
program
fulfillment

Re: SubStr routine

on 29.10.2004 17:08:05 by dwall

{OP: What I'm trying to achieve is a string that is truncated to
the point where 30 [A-Za-z0-9] characters have been counted.}

John W. Krahn wrote:

> Uri Guttman wrote:
>>
>> no one seems to grasp how easy this is in a regex. (untested)
>>
>> my ($str) = $text =~ /^((?:\w\s*){1,30})/ ;
>>
>> i uses \w which adds _ to the allowed chars. easy to change to a
>> char class.
>>
>> that matches 1-30 alphanums and each can have optional following
>> whitespace.
>
> Yes but you also have punctuation in there as well as whitespace.
>
> my ($shortartist) = $longartist =~
> /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;
>
> That will work. :-)

Well, it barfs if the first character in $longartist doesn't
match /[A-Za-z0-9]/, but as Uri says, that's an easy fix.

my ($shortartist) = $longartist
=~ /^(
(?:
(?: [A-Za-z0-9] [^A-Za-z0-9]* )
|
(?: [^A-Za-z0-9]* [A-Za-z0-9] )
){1,30}
)
/x;

Re: SubStr routine

on 30.10.2004 04:38:43 by someone

David K. Wall wrote:
> {OP: What I'm trying to achieve is a string that is truncated to
> the point where 30 [A-Za-z0-9] characters have been counted.}
>
> John W. Krahn wrote:
>
>>Uri Guttman wrote:
>>
>>>no one seems to grasp how easy this is in a regex. (untested)
>>>
>>> my ($str) = $text =~ /^((?:\w\s*){1,30})/ ;
>>>
>>>i uses \w which adds _ to the allowed chars. easy to change to a
>>>char class.
>>>
>>>that matches 1-30 alphanums and each can have optional following
>>>whitespace.
>>
>>Yes but you also have punctuation in there as well as whitespace.
>>
>>my ($shortartist) = $longartist =~
>>/^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/;
>>
>>That will work. :-)
>
> Well, it barfs if the first character in $longartist doesn't
> match /[A-Za-z0-9]/, but as Uri says, that's an easy fix.
>
> my ($shortartist) = $longartist
> =~ /^(
> (?:
> (?: [A-Za-z0-9] [^A-Za-z0-9]* )
> |
> (?: [^A-Za-z0-9]* [A-Za-z0-9] )
> ){1,30}
> )
> /x;

Or just add a non-alphanumeric character class at the front.

my ($shortartist) = $longartist =~
/^((?:[[:^alnum:]]*[[:alnum:]][[:^alnum:]]*){1,30})/;
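
One thing to watch, shown here as a sketch with hypothetical helper names (cut_trailing, cut_exact) and an invented test string: because each repetition may also swallow trailing non-alphanumerics, the match can end with punctuation or whitespace after the last counted character, which the OP's loop would not include. Dropping the trailing class gives a variant that stops right at the nth alphanumeric; whether that is the behaviour wanted is an assumption about the OP's intent.

```perl
use strict;
use warnings;

# The pattern above: non-alphanumerics allowed on both sides of each counted character.
sub cut_trailing {
    my ($s, $n) = @_;
    my ($r) = $s =~ /^((?:[[:^alnum:]]*[[:alnum:]][[:^alnum:]]*){1,$n})/;
    return $r;
}

# Variant: leading non-alphanumerics only, so the match ends on the nth alphanumeric.
sub cut_exact {
    my ($s, $n) = @_;
    my ($r) = $s =~ /^((?:[[:^alnum:]]*[[:alnum:]]){1,$n})/;
    return $r;
}

# Brackets make the trailing comma and space visible.
print '[', cut_trailing('(the) who, live', 6), "]\n";   # [(the) who, ]
print '[', cut_exact('(the) who, live', 6), "]\n";      # [(the) who]

# Both agree on the original sample, where a letter follows the 30th alphanumeric.
my $sample = 'rolling stones, the (mick jagger and ronnie wood)';
print cut_exact($sample, 30), "\n";   # rolling stones, the (mick jagger and r
```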


John
--
use Perl;
program
fulfillment

Re: SubStr routine

on 31.10.2004 13:16:47 by Mailing Lists

"John W. Krahn" wrote in message
news:e8rgd.28548$df2.322@edtnps89...
>
>Yes, of course _ will never be there ... and you work for Microsoft, right?
>
>;-}

>How about this for completeness:

>/^((?:[[:alnum:]][[:^alnum:]]*){1,30})/;


Thanks to all that responded,

John would you mind breaking this down and explaining how it works? I'd
like to understand what the regex is doing!


Martyn

Re: SubStr routine

on 31.10.2004 20:19:40 by Joe Smith

Martyn Wendon wrote:

> "John W. Krahn" wrote in message
> /^((?:[[:alnum:]][[:^alnum:]]*){1,30})/;
>
> Thanks to all that responded,
>
> John would you mind breaking this down and explaining how it works? I'd
> like to understand what the regex is doing!

If you understand how /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/ works, then
you can understand how /^((?:[[:alnum:]][[:^alnum:]]*){1,30})/ works.
They are two ways of saying the same thing.
-Joe

Re: SubStr routine

on 01.11.2004 14:05:44 by Mailing Lists

"Joe Smith" wrote in message
news:gVahd.40123$R05.30151@attbi_s53...

>If you understand how /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/ works, then
>you can understand how /^((?:[[:alnum:]][[:^alnum:]]*){1,30})/ works.
>They are two ways of saying the same thing.
>Joe

Sorry I don't understand either one of them, I've only just started learning
about regex's

Thanks,

Martyn

Re: SubStr routine

on 01.11.2004 14:54:16 by Matt Garrish

"Martyn Wendon" wrote in message
news:cm5cb8$ht$1@sparta.btinternet.com...
>
> "Joe Smith" wrote in message
> news:gVahd.40123$R05.30151@attbi_s53...
>
>>If you understand how /^((?:[A-Za-z0-9][^A-Za-z0-9]*){1,30})/ works, then
>>you can understand how /^((?:[[:alnum:]][[:^alnum:]]*){1,30})/ works.
>>They are two ways of saying the same thing.
>>Joe
>
> Sorry I don't understand either one of them, I've only just started
> learning
> about regex's
>

Take a look at the perlre doc page for the complete details, but to break it
down from the outside in:

/^()/ is just the outer capturing group. This will put the result in $1.

(?: ) {1,30} means we're using a non-capturing cluster to wrap the pattern
we're looking for (and it must match 1 to 30 times (30 alphanumerics being
the most you're looking for)). You could just wrap the pattern in (), but
see perlre for more info on why this syntax is better (under the Extended
Patterns section).

[[:alnum:]] is a posix character class for all alphanumerics.
[[:^alnum:]] is the negated version of the above (i.e., all
non-alphanumerics)

[[:alnum:]][[:^alnum:]]* means we're looking for an alphanumeric character
followed by 0 or more non-alphanumeric characters.

Put it all together and you have your regex that finds 1 to 30 alphanumerics
in a string interspersed by any amount of non-alphanumeric characters.
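
The same breakdown can be written into the pattern itself with the /x modifier, which lets you spread a regex over several lines and comment each piece. This is only a sketch on the sample string from the original post (written on one line), using m{ } as the delimiter:

```perl
use strict;
use warnings;

my $longartist = 'rolling stones, the (mick jagger and ronnie wood)';

my ($shortartist) = $longartist =~ m{
    ^                       # start matching at the beginning of the string
    (                       # capture everything matched inside these parentheses
        (?:                 # group the next two pieces without capturing them
            [[:alnum:]]     # one alphanumeric character ...
            [[:^alnum:]]*   # ... followed by zero or more non-alphanumerics
        ){1,30}             # repeat that group between 1 and 30 times
    )
}x;

print "$shortartist\n";   # rolling stones, the (mick jagger and r
```

In list context the match returns the captured text directly, which is why the result lands in $shortartist without having to go through $1.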

Matt

Re: SubStr routine

on 01.11.2004 17:37:49 by Mailing Lists

"Matt Garrish" wrote in message
news:7erhd.498$dj2.113153@news20.bellglobal.com...

>Take a look at the perlre doc page for the complete details, but to break it
>down from the outside in:

>/^()/ is just the outer capturing group. This will put the result in $1.

>(?: ) {1,30} means we're using a non-capturing cluster to wrap the pattern
>we're looking for (and it must match 1 to 30 times (30 alphanumerics being
>the most you're looking for)). You could just wrap the pattern in (), but
>see perlre for more info on why this syntax is better (under the Extended
>Patterns section).

>[[:alnum:]] is a posix character class for all alphanumerics.
>[[:^alnum:]] is the negated version of the above (i.e., all
>non-alphanumerics)

>[[:alnum:]][[:^alnum:]]* means we're looking for an alphanumeric character
>followed by 0 or more non-alphanumeric characters.

>Put it all together and you have your regex that finds 1 to 30 alphanumerics
>in a string interspersed by any amount of non-alphanumeric characters.

>Matt


Thanks Matt, that helps a lot.

Martyn