utf8 urls
on 19.03.2008 13:06:59 by Eli Shemer
Hey there

For some reason the following test doesn't print anything out to the screen.
Do I need to change something in the Apache configuration, or mod_perl's?

/articles_read.pl?id=חוזרת

## get http parameters
$r = shift;
$apr = Apache2::Request->new($r);
print $apr->param('id');

thanks in advance.
Re: utf8 urls
on 19.03.2008 13:18:47 by John ORourke
Eli Shemer wrote:
> For some reason the following test doesn't print anything out to the screen
>
> Do I need to change something in the apache configuration, or mod_perl's ?
>
> /articles_read.pl?id=חוזרת
>
> ## get http parameters
> $r = shift;
> $apr = Apache2::Request->new($r);
> print $apr->param('id');
>
I'm not sure why you get nothing, but I can tell you strings read from
Apache objects come through as octets and need to be decoded before
use. We're using UTF-8 chars in URLs but I've never used one in a GET
request parameter.
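To illustrate the difference between octets and decoded characters, here is a minimal sketch using the core Encode module. The byte string is hard-coded as a stand-in for a value read from the request, and it assumes the client sent UTF-8; neither detail comes from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a value read from the request: the UTF-8 octets of the
# Hebrew word used in this thread (assumption: the client sent UTF-8).
my $octets      = "\xD7\x97\xD7\x95\xD7\x96\xD7\xA8\xD7\xAA";
my $octet_count = length $octets;

# Turn the octets into a Perl character string.
my $chars      = decode('UTF-8', $octets);
my $char_count = length $chars;

print "$octet_count octets, $char_count characters\n";   # 10 octets, 5 characters
```

Until the value is decoded, length(), regular expressions and substr() all work on bytes, which is rarely what an application wants for Hebrew text.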
hope that helps,
John
Re: utf8 urls
on 19.03.2008 13:42:14 by aw
From a previous message by Adam Prime in this same list :
[...]
SetHandler modperl doesn't bind 'print' to '$r->print'. Try SetHandler
perl-script, or change your code to pass in the request object and use
$r->print instead of print.
[...]
or, more verbosely and explicitly:
if, in your Apache configuration for this "location", you used
SetHandler modperl
then you should not assume that print() sends its output to the
browser. But if you did (as you did)
$r = shift; # get the Apache2::RequestRec object
then $r->print() does go back as a response to the browser.
You should probably at least set a content-type header, though,
like
$r->content_type('text/plain');
$r->print($apr->param('id'));
and, in your case, it might also be a good idea to send back a header
indicating which character set is used (presumably UTF-8), since the
default HTTP character set is iso-8859-1, and the string you send back
does not look printable in that charset.
But I don't know exactly how best to do that in mod_perl.
Would the following work?
$r->content_type('text/plain; charset="UTF-8"');
Also, the previous message talking about how to handle your (apparently)
UTF-8 request should be taken into account.
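A rough sketch of a handler that sends everything through the request object rather than the global print() could look like this. FakeRequest is a hypothetical stand-in so the code runs outside Apache; under mod_perl, $r would be the real Apache2::RequestRec, and a real handler would receive only $r (the second argument here is for illustration only).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the request object, so the sketch runs outside
# Apache. It only mimics the two methods used below.
package FakeRequest;
sub new { return bless { type => '', body => '' }, shift }
sub content_type {
    my ($self, $type) = @_;
    $self->{type} = $type if defined $type;
    return $self->{type};
}
sub print {
    my ($self, @data) = @_;
    $self->{body} .= join '', @data;
    return 1;
}

package main;

# Never relies on the global print(): all output goes via $r.
sub handler {
    my ($r, $id) = @_;
    $r->content_type('text/plain; charset=UTF-8');
    $r->print("id=$id\n");
    return 0;    # the numeric value of Apache2::Const::OK
}

my $r = FakeRequest->new;
handler($r, 'abc');
print $r->{type}, "\n", $r->{body};
```

Written this way, the handler behaves the same under "SetHandler modperl" and "SetHandler perl-script", because it does not depend on STDOUT being tied to the request.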
André
Re: utf8 urls
on 19.03.2008 13:54:08 by Geoffrey Young
John ORourke wrote:
> Eli Shemer wrote:
>>
>> For some reason the following test doesn't print anything out to the
>> screen
>>
>> Do I need to change something in the apache configuration, or
>> mod_perl's ?
>>
>> /articles_read.pl?id=חוזרת
>>
>> ## get http parameters
>> $r = shift;
>> $apr = Apache2::Request->new($r);
>> print $apr->param('id');
>>
>
> I'm not sure why you get nothing, but I can tell you strings read from
> Apache objects come through as octets and need to be decoded before
> use. We're using UTF-8 chars in URLs but I've never used one in a GET
> request parameter.
I can't say why it doesn't work, but I'm surprised it would in either
case - the only characters explicitly allowed in a uri are us-ascii.
from rfc2396:
2.4. Escape Sequences
Data must be escaped if it does not have a representation using an
unreserved character; this includes data that does not correspond to
a printable character of the US-ASCII coded character set, or that
corresponds to any US-ASCII character that is disallowed, as
explained below.
A bit of googling turned up this CPAN module:
http://search.cpan.org/dist/URI-Find-UTF8/lib/URI/Find/UTF8.pm
where the docs point to a ja.wikipedia.org page. for me (firefox 2.0)
clicking on the "original" uri (the one with the japanese characters)
opens up a uri with the uri-escaped character sequence. it's like magic ;)
anyway, my point wasn't to get into some huge debate on whether people
are (successfully) using utf-8 characters in uris, etc. rather, it is
that mod_perl is (mostly) merely a wrapper around apache, and if
something is improper wrt an official rfc apache generally dismisses it
rather than bending to a behavior which people may be using anyway.
so, if it works, great. if not, try making your urls conform to 2396
and see if you have better results.
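For reference, making a URL conform to RFC 2396 can be sketched as follows: encode the value as UTF-8, then percent-escape every octet that is not an "unreserved" character. The helper name escape_component is made up for this example, and UTF-8 as the underlying encoding is an assumption (the RFC does not mandate one).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode every octet that is not an RFC 2396 "unreserved" character
# (alphanumerics plus the marks - _ . ! ~ * ' ( ) ).
sub escape_component {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);
    $octets =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# The Hebrew word from the original request, as a character string.
my $id = "\x{5D7}\x{5D5}\x{5D6}\x{5E8}\x{5EA}";

print "/articles_read.pl?id=", escape_component($id), "\n";
# prints /articles_read.pl?id=%D7%97%D7%95%D7%96%D7%A8%D7%AA
```

If the URI distribution from CPAN is installed, URI::Escape provides ready-made functions for the same job.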
--Geoff
Re: utf8 urls
on 19.03.2008 13:54:21 by torsten.foertsch
On Wed 19 Mar 2008, Eli Shemer wrote:
> For some reason the following test doesn't print anything out to the screen
>
> Do I need to change something in the apache configuration, or mod_perl's ?
>
> /articles_read.pl?id=חוזרת
This is probably a bug in libapreq2. I have tried this handler:
sub {
    my $r=$_[0];
    $r->content_type('text/html; charset=UTF-8');
    my $x=Apache2::Request->new($r);
    $r->print("\nargs=".$r->args."\nparam(x)=".
              $x->param('x')."\n\n");
    return Apache2::Const::OK;
}
http://localhost/test?x=חוזרת entered in FF changes on the fly into
http://localhost/test?x=%D7%97%D7%95%D7%96%D7%A8%D7%AA and it works.
But on the command line with curl it doesn't:
$ curl 'http://localhost/test?x=חוזרת' -v
* About to connect() to localhost port 80 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /test?x=חוזרת HTTP/1.1
> User-Agent: curl/7.16.4 (i686-suse-linux-gnu) libcurl/7.16.4 OpenSSL/0.9.8e zlib/1.2.3 libidn/1.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Wed, 19 Mar 2008 12:45:29 GMT
< Server: Apache/2.2.6 (Unix) mod_ssl/2.2.6 OpenSSL/0.9.8e DAV/2 SVN/1.4.5 mod_apreq2-20051231/2.6.0 mod_perl/2.0.4-dev Perl/v5.8.8
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
<
args=x=חוזרת
param(x)=
* Connection #0 to host localhost left intact
* Closing connection #0
Torsten
Re: utf8 urls
on 19.03.2008 14:13:43 by John ORourke
Geoffrey Young wrote:
> John ORourke wrote:
>> Eli Shemer wrote:
>>>
>>> For some reason the following test doesn't print anything out to the
>>> screen
>>>
>> I'm not sure why you get nothing, but I can tell you strings read
>> from Apache objects come through as octets and need to be decoded
>> before use. We're using UTF-8 chars in URLs but I've never used one
>> in a GET request parameter.
>
> I can't say why it doesn't work, but I'm surprised it would in either
> case - the only characters explicitly allowed in a uri are us-ascii.
> from rfc2396:
>
My bad memory there - you are quite correct. The way we do it is the
accepted way - to URL-encode the UTF-8 encoded text, and that will work
with URLs and parameters.
eg:
http://www....../categories/name/ty%C3%B6kalut-lamput
is the correct form of:
http://www....../categories/name/työkalut-lamput
encode before printing:
use Encode qw(encode decode);
$octets = encode('UTF-8', $my_utf8_string); # make octets
$octets =~ s/([^\041-\176])/sprintf("%%%02X",ord($1))/ge; # URL-encode control, space and non-ASCII octets
$r->print($octets);
(the above is simplified - you'll also need to encode question marks etc)
decode after reading:
$url = decode('UTF-8', $r->uri());
or
$param = decode('UTF-8', $r->param('info'));
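The two directions can be checked against each other in a small stand-alone round trip. This is only a sketch using the core Encode module and hand-written substitutions; the variable names are illustrative and nothing here touches the request object.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $name = "ty\x{F6}kalut-lamput";    # the Finnish example above, as characters

# Outgoing: characters -> UTF-8 octets -> percent-escapes.
(my $escaped = encode('UTF-8', $name))
    =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;

# Incoming: percent-escapes -> octets -> characters.
(my $octets = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
my $decoded = decode('UTF-8', $octets);

print "$escaped\n";                   # ty%C3%B6kalut-lamput
print $decoded eq $name ? "round trip ok\n" : "round trip failed\n";
```

The order matters on the way in: the percent-escapes have to be turned back into octets before those octets are decoded as UTF-8.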
cheers
John
Re: utf8 urls
on 19.03.2008 14:32:58 by aw
I think that these things can get very confused and confusing very
quickly, unless one steps through them one step at a time.
Let me try a first iteration :
1) URIs, as sent to the HTTP server, should contain only US-ASCII
characters (and no spaces). Any other characters should be encoded
using the URI escaping scheme that the RFC prescribes.
2) Whether Firefox is smart enough to encode a URI automatically
when it notices non-US-ASCII characters is a nice aspect of Firefox
if it does, but it should not confuse the main issue.
In other words, if you send a non-ASCII URI to a server (via curl or
lwp-request, for example), then you should URI-encode the request
yourself.
3) According to a previous response, at the receiving side, when Apache
gets a properly-encoded request URI containing non-ASCII characters, it
leaves it encoded and passes it "as is" (or "as bytes") to the
processing layer, which in this case is mod_perl.
4) mod_perl parses the URI and makes it accessible in several ways to
the modules running under it (in this case a request handler or a script).
Question : does mod_perl decode the URI string prior to passing it in
bits and pieces to the handler/script, or not ?
(From another response, it would seem that it doesn't)
5) the handler/script obtains the URI parts from mod_perl, possibly
through the RequestRec or Request object.
If such URI parts contained non-ASCII characters, do these modules
perform any translation, or does the handler/script still receive them
as URI-encoded ?
(From another response, it would seem that they don't, and it does)
6) Now the handler/script has the value of the (for instance) query
parameter "id" (and assume it contains non-ASCII characters), and it
wants to output it back to the browser.
To do that, it must send the browser an HTTP header that tells the
browser in which character set the response is encoded, since by
default the HTTP protocol says it is iso-8859-1.
And it seems that in order to do that, it should use, as a minimum
$param = $apr->param('id');
$r->content_type('text/plain; charset="UTF-8"');
$r->print($param);
There are a couple of aspects not mentioned above, such as
- how does the handler/script "know" which decoding it should apply to
the URI elements ? Is it certain that it is UTF-8 ?
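One possible, and admittedly heuristic, answer to that last question is to try a strict UTF-8 decode first and fall back to a legacy charset only if the octets are not valid UTF-8. The function name decode_guess and the choice of cp1255 as fallback are assumptions for this sketch (the original post was sent as windows-1255); nothing in the request itself says which encoding the client used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Try strict UTF-8 first; if that fails, decode with the fallback charset.
sub decode_guess {
    my ($octets, $fallback) = @_;
    my $copy  = $octets;    # decode() may modify its argument when CHECK is set
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $chars ? $chars : decode($fallback, $octets);
}

# The same Hebrew word, once as UTF-8 octets and once as windows-1255 octets.
my $from_utf8   = decode_guess("\xD7\x97\xD7\x95\xD7\x96\xD7\xA8\xD7\xAA", 'cp1255');
my $from_cp1255 = decode_guess("\xE7\xE5\xE6\xF8\xFA", 'cp1255');

print $from_utf8 eq $from_cp1255 ? "same word\n" : "different\n";   # same word
```

This works because text in a legacy 8-bit charset is rarely valid UTF-8 by accident, but it remains a guess; agreeing on UTF-8 for all generated links is the more robust approach.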
Another go, anyone ?
André