[mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

on 19.03.2008 08:32:40 by Rob French

I have recently started converting one of our webapps to make it fully
UTF-8 compliant. All input/output from the webapp will be encoded as
UTF-8. As such, I am trying to use the PERL_UNICODE env variable to
enable UTF-8 flagging on all input/output streams. This works with
standalone Perl scripts like the one below (the /tmp/utf8.txt file
contains a single character, U+00E6 LATIN SMALL LETTER AE):

#!/usr/bin/perl -w

use strict;
use Encode;

print "PERL_UNICODE Value: ${^UNICODE}\n";
open(FH, "</tmp/utf8.txt");
undef $/;
my $var = <FH>;
close(FH);

print "Flagged as UTF8? " . Encode::is_utf8($var) . "\n";
exit;

The resulting output after setting my PERL_UNICODE env var to SDA is:

PERL_UNICODE Value: 63
Flagged as UTF8? 1

Which is correct. Perl processed the input stream (open) as UTF-8 and
flagged it accordingly.
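
For readers wondering where the 63 comes from: the letters in PERL_UNICODE map to bit values listed in perlrun, and SDA is their bitwise OR. The following is a minimal sketch; the helper name unicode_flags_value is made up for illustration and the table covers only a subset of the documented letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Flag values for the -C switch / PERL_UNICODE, as listed in perlrun
# (subset only).
my %FLAG = (
    I => 1,     # STDIN is assumed to be UTF-8
    O => 2,     # STDOUT will be UTF-8
    E => 4,     # STDERR will be UTF-8
    S => 7,     # shorthand for I + O + E
    i => 8,     # UTF-8 is the default layer for input streams
    o => 16,    # UTF-8 is the default layer for output streams
    D => 24,    # shorthand for i + o
    A => 32,    # @ARGV elements are expected to be UTF-8
);

# Hypothetical helper: combine a letter string such as "SDA" into the
# number that ${^UNICODE} reports.
sub unicode_flags_value {
    my ($letters) = @_;
    my $value = 0;
    for my $letter (split //, $letters) {
        die "unknown flag '$letter'\n" unless exists $FLAG{$letter};
        $value |= $FLAG{$letter};
    }
    return $value;
}

print unicode_flags_value("SDA"), "\n";    # 7 | 24 | 32 = 63
```

So a value of 63 only confirms that the setting was parsed; it says nothing about whether a given open() call actually picked up the default layer.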

Unfortunately, if I put the exact same open call in my mod_perl
TransHandler, $var is not flagged as UTF-8. The resulting output when
run in the TransHandler is:

PERL_UNICODE Value: 63
Flagged as UTF8?

The input stream is not processed as UTF-8 and not flagged internally
as UTF-8. If I explicitly add an Encode::decode_utf8($var) in mod_perl
then everything works as expected. It appears as if mod_perl is
ignoring the PERL_UNICODE env variable and not processing my input
streams as UTF-8.
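
The explicit-decode workaround mentioned above might look roughly like the sketch below. It is not the actual handler code: the helper name slurp_and_decode is hypothetical, and a temporary file stands in for /tmp/utf8.txt so the example is self-contained.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in for /tmp/utf8.txt: U+00E6 encoded as UTF-8 (bytes C3 A6).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA6";
close($out) or die "close failed: $!";

# Hypothetical helper: read raw bytes, then decode them explicitly, so the
# result does not depend on PERL_UNICODE or any default layer.
sub slurp_and_decode {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Cannot open $file: $!";
    local $/;                      # slurp mode, restored on scope exit
    my $bytes = <$fh>;
    close($fh);
    return Encode::decode_utf8($bytes);
}

my $var = slurp_and_decode($path);
print "Flagged as UTF8? ", (Encode::is_utf8($var) ? 1 : 0), "\n";
print "Length in characters: ", length($var), "\n";
```

Using local $/ instead of undef $/ keeps the slurp setting from leaking into other code running in the same long-lived interpreter, which matters more under mod_perl than in a one-off script.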

Thanks in advance.

Cheers




Environment details below:

Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.6.9-22.18.bz155725.elsmp,
archname=i386-linux-thread-multi
uname='linux hs20-bc1-4.build.redhat.com
2.6.9-22.18.bz155725.elsmp #1 smp thu nov 17 15:34:08 est 2005 i686
i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -m32 -march=i386
-mtune=pentium4 -Dversion=5.8.5 -Dmyhostname=localhost
-Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc.
-Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux
-Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads
-Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db
-Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio
-Dinstallusrbinperl -Ubincompat5005 -Uversiononly
-Dpager=/usr/bin/less -isr -Dinc_version_list=5.8.4 5.8.3 5.8.2 5.8.1
5.8.0'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -g -pipe -m32 -march=i386 -mtune=pentium4',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='3.4.6 20060404 (Red Hat 3.4.6-2)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.3.4.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.3.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.8.5/i386-linux-thread-multi/CORE '
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Built under linux
Compiled at Jul 24 2006 18:28:10
@INC:
/usr/lib/perl5/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/5.8.5
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5
/usr/lib/perl5/site_perl/5.8.4
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.5
/usr/lib/perl5/vendor_perl/5.8.4
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl/5.8.2
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl
.
mod_perl version: 1.30

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

on 19.03.2008 11:01:48 by aw

Hi.

Perl's handling of Unicode (and of character sets in general) is
extremely clever and powerful.
But it can sometimes be a bit counter-intuitive.

In any case, it seems to me that the evaluation of the PERL_UNICODE
environment variable is a "Perl thing" rather than a "mod_perl thing",
and that mod_perl per se should not interfere with it. But maybe
mod_perl does some magic on filehandles in general which interferes, who
knows?

Maybe the first thing to do is to ascertain whether the problem is really
due to a mishandling of the PERL_UNICODE environment variable, or to
something else. I propose a simple test:
instead of relying on the PERL_UNICODE variable, what happens when you
change the open() statement as follows:

> open(FH, '<:utf8',"/tmp/utf8.txt");

thus explicitly setting a UTF-8 decoding layer for the stream FH,
instead of relying on PERL_UNICODE.
Does your follow-up test then indicate that the utf8 flag for $var is set?

Note: even with the decoding layer set, that does not necessarily mean
that all data you read will end up with the utf8 flag set. It depends
on the data. But in your case, if you are really using the same file
data in both tests you show below, then it seems a valid test.
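
A self-contained way to run that comparison is sketched below. The helper name slurp_with_layer is hypothetical, and a temporary file is used in place of /tmp/utf8.txt; the same bytes are read once with no decoding and once through the :utf8 layer.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

# Stand-in for /tmp/utf8.txt: U+00E6 encoded as UTF-8 (bytes C3 A6).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA6";
close($out) or die "close failed: $!";

# Hypothetical helper: slurp a file through the given PerlIO layer string.
sub slurp_with_layer {
    my ($file, $layer) = @_;
    open(my $fh, "<$layer", $file) or die "Cannot open $file: $!";
    local $/;                      # slurp mode
    my $data = <$fh>;
    close($fh);
    return $data;
}

my $raw     = slurp_with_layer($path, ':raw');     # bytes, no decoding
my $decoded = slurp_with_layer($path, ':utf8');    # explicit layer

printf "raw:   flagged=%d length=%d\n",
    (Encode::is_utf8($raw) ? 1 : 0), length($raw);
printf ":utf8: flagged=%d length=%d\n",
    (Encode::is_utf8($decoded) ? 1 : 0), length($decoded);
```

Checking length() alongside the flag is a useful cross-check: two for the raw bytes, one for the decoded character.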

André



Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

on 19.03.2008 17:54:05 by Rob French

Hi André,

Yes, I tried that as well and it worked as expected (UTF-8 flag is
set). Explicit PerlIO layer decoding works in both the non-mod_perl
and mod_perl tests. It seems only the default PERL_UNICODE setting is
ignored in mod_perl even though it is set.

Rgrds,
Rob


Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

on 19.03.2008 19:35:51 by aw

Hi.

I cannot really think of a reason why Perl itself would do something
different in either case. And in your tests, it was verified that
PERL_UNICODE itself is still set correctly under mod_perl. So it must be
that mod_perl somehow overrides the basic Perl setting. Maybe mod_perl
needs to do something with the filehandles, because some of them might be
connected to Apache?

Anyhow, I am out of my depth now, so let's call on a real mod_perl guru,
if any of them is around.

By the way:
I have tried the same thing in the meantime under Apache 2.x/mod_perl
2.x, and I seem to have the same problem.

I have one more question: where exactly do you set PERL_UNICODE?




Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

on 19.03.2008 19:41:37 by Rob French

I have tried setting it via the Apache SetEnv directive as well as in my
environment as root when starting Apache. In both cases the variable
is correctly set in mod_perl; it is just ignored.

As another test I tried the same code as a plain ol' CGI script, and it
works in that case. So the issue is definitely with mod_perl and its
interaction with the PERL_UNICODE env variable.

Thanks for your help investigating. I was worried that it might be a
mod_perl 1.x thing or a Perl version thing. Good to know it isn't just
my setup :)
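
One possible explanation, not confirmed anywhere in this thread: perlrun describes the i/o defaults of -C / PERL_UNICODE as applying to open() calls in the main program's scope, and a mod_perl handler is compiled as a module rather than as the main program. If that is what happens here, declaring the default inside the module with the lexically scoped open pragma may be an alternative to adding a layer to every open(). The sketch below is an assumption-laden illustration that has not been tested under mod_perl; the package name My::Utf8Reader is made up.

```perl
use strict;
use warnings;
use Encode ();
use File::Temp qw(tempfile);

{
    # Hypothetical package standing in for a handler module.
    package My::Utf8Reader;

    # Lexically scoped default: open() calls inside this block that name
    # no layer themselves get UTF-8 decoding/encoding applied.
    use open IO => ':encoding(UTF-8)';

    sub slurp {
        my ($file) = @_;
        open(my $fh, '<', $file) or die "Cannot open $file: $!";
        local $/;                  # slurp mode
        my $text = <$fh>;
        close($fh);
        return $text;
    }
}

# Stand-in for /tmp/utf8.txt: U+00E6 encoded as UTF-8 (bytes C3 A6).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA6";
close($out) or die "close failed: $!";

my $var = My::Utf8Reader::slurp($path);
print "Flagged as UTF8? ", (Encode::is_utf8($var) ? 1 : 0), "\n";
```

Because the pragma is lexical, it only affects code compiled inside that block or file, so other modules loaded into the same interpreter keep their own behaviour.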

Rgrds,
Rob

On Wed, Mar 19, 2008 at 11:35 AM, Andr=E9 Warnier wrote:
> Hi.
>
> I cannot really think of a reason why Perl itself would do something
> different in either case. And in your tests, it was verified that
> PERL_UNICODE itself is still set right under mod_perl. So it must be
> that mod_perl somehow overrides the basic Perl setting. Maybe mod_perl
> needs to do something re the filehandles, because some of them might be
> connected to Apache ?
>
> Anyhow, out of my depth now, so let's call on a real mod_perl guru if
> any of them is around ?
>
> By the way :
> I have tried the same thing in the meantime under Apache 2.x/mod_perl
> 2.x, and I seem to have the same problem.
>
> I have one more question : where exactly do you set PERL_UNICODE ?
>
>
>
>
>
> Rob French wrote:
> > Hi Andr=E9,
> >
> > Yes, I tried that as well and it worked as expected (UTF-8 flag is
> > set). Explicit PerlIO layer decoding works in both the non-mod_perl
> > and mod_perl tests. It seems only the default PERL_UNICODE setting is
> > ignored in mod_perl even though it is set.
> >
> > Rgrds,
> > Rob
> >
> > On Wed, Mar 19, 2008 at 3:01 AM, Andr=E9 Warnier wrote=
:
> >> Hi.
> >>
> >> Perl's handling of Unicode (and of character sets in general) is
> >> extremely clever and powerful.
> >> But it can sometimes be a bit counter-intuitive.
> >>
> >> In any case, it seems to me that the evaluation of the PERL_UNICODE
> >> environment variable is a "Perl thing" rather than a "mod_perl thing=
",
> >> and that mod_perl per se should not interfere with it. But maybe
> >> mod_perl does some magic on filehandles in general which interferes,=
who
> >> knows ?
> >>
> >> Maybe the first thing to do is to ascertain that the problem is real=
ly
> >> due to a mishandling of the PERL_UNICODE environment variable, or
> >> something else. I propose a simple test :
> >> Instead of relying on the PERL_UNICODE variable, what happens when y=
ou
> >> change the open() statement as follows :
> >>
> >> > open(FH, '<:utf8',"/tmp/utf8.txt");
> >>
> >> thus explicitly setting a UTF-8 decoding layer for the stream FH,
> >> instead of relying on PERL_UNICODE.
> >> Does your follow-up test then indicate that the utf8 flag for $var i=
s set ?
> >>
> >> Note : even with the decoding layer set, that does not necessarily m=
ean
> >> that all data you read will end up with the utf8 flag set. It depen=
ds
> >> on the data. But in your case, if you are really using the same fil=
e
> >> data in both tests you show below, then it seems a valid test.
> >>
> >> Andr=E9

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

am 19.03.2008 20:01:51 von dstroma

Maybe you need to use PerlSetEnv ?

----- Original Message -----
From: "Rob French"
To: "André Warnier"
Cc:
Sent: Wednesday, March 19, 2008 2:41 PM
Subject: Re: [mp1] Can't get UTF8 input streams to automatically be decoded
using PERL_UNICODE under mod_perl


I have tried setting it via the Apache SetEnv directive as well as in my
environment as root when starting Apache. In both cases the variable
is correctly set in mod_perl; it is just ignored.

As another test I tried the same code as a plain ol' CGI script and it
works in that case. So the issue is definitely with mod_perl and its
interaction with the PERL_UNICODE env variable.

Thanks for your help investigating. I was worried that it might be a
mod_perl 1.x thing or a Perl version thing. Good to know it isn't just
my setup :)

Rgrds,
Rob

Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

am 19.03.2008 20:14:50 von Rob French

Setting the environment variable has always worked. mod_perl can "see"
the PERL_UNICODE variable is set based on the fact that the
${^UNICODE} variable is returning 63 (SDA). The problem is that it
seems to ignore it.
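
For readers wondering where the value 63 comes from: the sketch below adds up the letter values listed in perlrun for the -C switch and PERL_UNICODE. The helper name unicode_flags_value is made up for this illustration; only the letter-to-bit table is taken from the Perl documentation.

```perl
use strict;
use warnings;

# Bit values of the letters accepted by -C / PERL_UNICODE (see perlrun).
my %FLAG = (
    I => 1,     # STDIN is assumed to be UTF-8
    O => 2,     # STDOUT will be UTF-8
    E => 4,     # STDERR will be UTF-8
    S => 7,     # I + O + E
    i => 8,     # UTF-8 is the default layer for input streams
    o => 16,    # UTF-8 is the default layer for output streams
    D => 24,    # i + o
    A => 32,    # @ARGV elements are expected to be UTF-8
    L => 64,    # make the settings above conditional on the locale
);

# Hypothetical helper: combine a letter string such as "SDA" into one number.
sub unicode_flags_value {
    my ($letters) = @_;
    my $value = 0;
    for my $letter (split //, $letters) {
        die "unknown PERL_UNICODE letter '$letter'\n"
            unless exists $FLAG{$letter};
        $value |= $FLAG{$letter};
    }
    return $value;
}

print "SDA = ", unicode_flags_value('SDA'), "\n";    # 7 | 24 | 32 = 63
print "current value: ", ${^UNICODE}, "\n";          # what this interpreter was started with
```

So a ${^UNICODE} of 63 only confirms that the interpreter recorded S, D and A at startup; it says nothing about whether a given filehandle actually received a UTF-8 layer.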


Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

am 19.03.2008 20:37:52 von aw

And I think PerlSetEnv would not work anyway.
It will set PERL_UNICODE in time for the handler/script to print it, but
probably too late for Perl to take it into account: by that time the
Perl interpreter is already up and running, so the internal ${^UNICODE}
variable was set long before.
That's why I was asking when it was being set.

By the way, in my case (apache2/mp2 and virtual servers), the Apache
SetEnv sets $ENV{PERL_UNICODE} for the handler, but ${^UNICODE} remains 0.
One more thing to try : doing a
use open ':utf8';
in the global mod_perl startup script.
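
As a point of comparison, here is a minimal sketch of the explicit-layer approach, which depends neither on PERL_UNICODE nor on where an open pragma is in effect. The helper name slurp_utf8 and the temporary file are assumptions made for the example, and ':encoding(UTF-8)' is used as a stricter stand-in for the ':utf8' layer mentioned in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file through an explicit UTF-8 layer.
sub slurp_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    local $/;              # slurp mode, restored when the sub returns
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Write the two bytes that encode U+00E6 to a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);             # raw bytes, whatever the default layers are
print {$out} "\xC3\xA6";
close($out);

my $var = slurp_utf8($path);
print "Length in characters: ", length($var), "\n";
print "Flagged as UTF8? ", (utf8::is_utf8($var) ? 1 : 0), "\n";
```

The cost is that every open() has to name the layer itself, which is exactly the repetition PERL_UNICODE was meant to avoid.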



Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

am 19.03.2008 20:45:32 von aw

André Warnier wrote:
> One more thing to try : doing a
> use open ':utf8';
> in the global mod_perl startup script.
>

well, that works.
Rob, that should probably help you.
The difference with PERL_UNICODE "SAD" seems to be that it will not
automatically consider @ARGV as utf-8.




Re: [mp1] Can't get UTF8 input streams to automatically be decoded using PERL_UNICODE under mod_perl

am 19.03.2008 21:04:39 von Rob French

Good suggestion. It looks like that works for my simple open() example
but unfortunately it doesn't work when reading from sockets. What I am
trying to do is tell Perl that all incoming POST data is UTF-8 encoded
and flag it as such. The $r->read() call unfortunately doesn't abide
by the open pragma.

Looks like I might have to go dig through source :-)

Thanks again for the help.
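
For the POST-data case, the fallback already mentioned at the start of the thread is to decode explicitly. A minimal sketch follows, assuming the request body has already been read into a byte string; the helper name decode_post_body is made up here, and the $r->read() call appears only in a comment as an assumed source of the bytes.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw request-body bytes into a character string.
# Under mod_perl 1 the bytes would presumably come from something like
#   $r->read(my $buf, $r->header_in('Content-Length'));
# which is an assumption about the handler, not code from this thread.
sub decode_post_body {
    my ($bytes) = @_;
    # FB_CROAK makes malformed UTF-8 die instead of being silently replaced.
    return Encode::decode('UTF-8', $bytes, Encode::FB_CROAK);
}

my $raw  = "name=\xC3\xA6";    # bytes as they might arrive after URL-unescaping
my $text = decode_post_body($raw);

print "bytes: ", length($raw), ", characters: ", length($text), "\n";
print "Flagged as UTF8? ", (utf8::is_utf8($text) ? 1 : 0), "\n";
```

Wrapping the call in eval lets a handler reject a request whose body is not valid UTF-8 rather than passing damaged text along.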
