utf-8
on 31.12.2007 20:33:00 by julia_2683
I run perl v5.8.7 and my regular expression is ($txt =~ m/(\w+|é\w+)/g)
which does not match every utf-8 word. How can I make this regular
expression match every utf-8 word?
julia_2683@hotmail.com writes:
> I run perl v5.8.7 and my regular expression is ($txt =~ m/(\w+|é\w+)/g)
> which does not match every utf-8 word. How can I make this regular
> expression match every utf-8 word?
Just \w should work, provided you're handling your encodings correctly *and*
your $txt is actually utf-8 encoded internally. The same characters held in a
latin-1 encoded string will not match, which is IMO a bug.
Note that if your script itself is utf8 encoded you need to "use utf8"
somewhere at the top of your script.
For instance:
#!/usr/bin/perl -w
use strict;
# set output stream as utf-8 encoded (I have a utf-8 enabled terminal)
binmode STDOUT,":utf8";
my $str="\x{e9}"; # "é", not necessarily as utf-8 - very likely latin-1
utf8::upgrade($str); # force utf-8 encoding
print "$str was ",($str =~ /\w+/ ? "" : "not "),"matched\n";
Joost.
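
To collect every word, as the original question asks, the input usually has to be decoded from utf-8 bytes into a Perl character string before matching. The following is only a minimal sketch under that assumption: the helper name utf8_words and the sample text are made up for illustration, and it assumes $txt arrives as raw utf-8 octets (for example, read from a file without an encoding layer).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): takes raw utf-8 octets
# and returns the list of words found in them.
sub utf8_words {
    my ($bytes) = @_;
    # Decode the octets into a character string; the result is held
    # internally as utf-8, so \w also matches accented letters.
    my $txt = decode('UTF-8', $bytes);
    # In list context, /g returns every captured match.
    return $txt =~ /(\w+)/g;
}

# Encode characters as utf-8 on output (assumes a utf-8 terminal).
binmode STDOUT, ':encoding(UTF-8)';

# Sample input written as explicit utf-8 bytes: "café été naïve".
my @words = utf8_words("caf\xc3\xa9 \xc3\xa9t\xc3\xa9 na\xc3\xafve");
print "$_\n" for @words;
```

With the string decoded, plain \w+ is enough and the |é\w+ alternative from the original pattern should no longer be needed. Call the helper in list context; in scalar context the match only reports success or failure.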