regex dingbat dodge - single char as string to repeatable single
on 25.01.2008 20:16:00 by John
I have text that might have had a star character in the proprietary
originating system. The character is used in ratings boxes: a three-
star movie, a four-star restaurant, etc.
By the time it's exported and available to me, it's represented by a
string: "".
I want to surround consecutive stars with font coding and replace each
instance of the string with a single character that, in conjunction
with the font change, will eventually print as a star.
To set up this substitution, I change the strings back to a unique
character, one that I reckon would never occur in nature.
When I try to surround any repetitions of this invented character, I
instead match everything.
===
#!/usr/bin/perl -w
use strict;
my $text = "Cuisine: Urban deli";
$text .= "Overall: <1/2> (very good to excellent)";
$text .= "Food: <1/2>";
$text =~ s/\/_STAR_/ig; # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
$text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; # bracket groups in more pseudocode
print $text;
====
If I limit the search to five consecutive stars, the match works as I
intended:
===
#!/usr/bin/perl -w
use strict;
my $text = "Cuisine: Urban deli";
$text .= "Overall: <1/2> (very good to excellent)";
$text .= "Food: <1/2>";
$text =~ s/\/_STAR_/ig; # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
$text =~ s/(\xbc{1,5})/_STARFONT_$1_ENDSTAR/g; # bracket groups of stars in more pseudocode
print $text;
===
So what am I missing when it comes to the first search?
Certainly, I am missing some superior technique for matching repeated
instances of such a string, so I am open to suggestions there.
John Campbell
Haddonfield, NJ 08033
Re: regex dingbat dodge - single char as string to repeatable single char.
on 25.01.2008 20:39:58 by someone
In the first regular expression you are matching '\xbc*' and in the
second you are matching '\xbc{1,5}'. The '*' modifier matches *zero* or
more times and there are *zero* '\xbc' characters everywhere in the
string. The second one has to match at least *one* character. Change
'\xbc*' to '\xbc+'.
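
The difference can be seen in a minimal sketch. It assumes '#' as a printable stand-in for the '\xbc' character and square brackets as stand-ins for the _STARFONT_ and _ENDSTAR markers; the sub names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: '#' stands in for the \xbc star character, and square
# brackets stand in for the _STARFONT_ / _ENDSTAR markers.

# Wrap runs using '*': zero-length matches get wrapped as well.
sub wrap_star {
    my ($text) = @_;
    $text =~ s/(#*)/[$1]/g;
    return $text;
}

# Wrap runs using '+': only runs of one or more characters are wrapped.
sub wrap_plus {
    my ($text) = @_;
    $text =~ s/(#+)/[$1]/g;
    return $text;
}

my $sample = "a##b";
print "star: ", wrap_star($sample), "\n";   # output contains empty "[]" pairs
print "plus: ", wrap_plus($sample), "\n";   # prints "plus: a[##]b"
```

With '*', the pattern succeeds at positions where no star is present, because an empty string satisfies "zero or more". With '+', those positions are skipped.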
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Re: regex dingbat dodge - single char as string to repeatable single
on 25.01.2008 21:08:17 by John
On Jan 25, 2:39 pm, "John W. Krahn" wrote:
> John wrote:
> The '*' modifier matches *zero* or
> more times and there are *zero* '\xbc' characters everywhere in the
> string. The second one has to match at least *one* character. Change
> '\xbc*' to '\xbc+'.
That does the trick. It ought to come in handy.
Just realized that this snippet prints what on some systems is an
unprintable character.
I just see question marks. Hope I didn't cause any problems with that.
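
One possible way to keep such output readable when posting it is to show non-ASCII bytes as escapes. This is only a sketch; show_escaped is a hypothetical helper name, and the sample string is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render every character outside printable ASCII
# (other than newline) as a visible \xNN escape.
sub show_escaped {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e\n])/sprintf('\\x%02x', ord $1)/ge;
    return $text;
}

# Made-up sample containing three of the stand-in star characters.
print show_escaped("Food: \xbc\xbc\xbc\n");
```

The real data is left untouched; only the copy that gets printed is escaped.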
Re: regex dingbat dodge - single char as string to repeatable single
on 25.01.2008 23:46:46 by Ben Morrow
Quoth John :
> I have text that might have had a star character in the proprietary
> originating system. The character is used in ratings boxes: a three-
> star movie, a four-star restaurant, etc.
>
> By the time it's exported and available to me, it's represented by a
> string: "".
>
> I want to surround consecutive stars with font coding and replace each
> instance of the string with a single character that, in conjunction
> with the font change, will eventually print as a star.
>
> To set up this substitution, I change the strings back to a unique
> character, one that I reckon would never occur in nature.
>
> When I try to surround any repetitions of this invented character, I
> instead match everything.
>
> #!/usr/bin/perl -w
You want
use warnings;
rather than -w, nowadays.
> use strict;
>
> my $text = "Cuisine: Urban deli";
> $text .= "Overall: <1/2> (very good to excellent)";
> $text .= "Food: <1/2>";
>
> $text =~ s/\/_STAR_/ig; # uscores easier in regex than angle brackets.
No they're not. Angles don't need escaping inside regexen.
> $text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
> $text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; # bracket groups in more pseudocode
I don't know what the point of that is, unless you have some intervening
code that processes char-by-char.
s/( (?: )+ )/_STARFONT_$1_ENDSTAR/gx;
will work perfectly well. Notice the difference between () and (?: )
(capturing vs. grouping) and my use of /x to make the regex more
comprehensible. Needing + instead of * has already been covered :).
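
As a rough sketch of that one-step approach: the marker text "<STAR>" below is an assumption made for illustration (the real exported string may differ), and wrap_star_runs is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the exported marker is the literal text "<STAR>".
sub wrap_star_runs {
    my ($text) = @_;
    # (?: ... )+ groups one marker for repetition without capturing it;
    # the outer ( ... ) captures the whole run as $1.
    # /x lets the pattern contain whitespace for readability.
    $text =~ s/( (?: <STAR> )+ )/_STARFONT_${1}_ENDSTAR/gx;
    return $text;
}

my $out = wrap_star_runs("Food: <STAR><STAR><STAR> Service: <STAR>");
print "$out\n";
# prints "Food: _STARFONT_<STAR><STAR><STAR>_ENDSTAR Service: _STARFONT_<STAR>_ENDSTAR"
```

Writing ${1} rather than $1 in the replacement is just a precaution that makes the end of the variable name explicit next to the underscore.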
Ben