Bug: Character sets and $r->custom_response
on 23.05.2008 12:09:52 by Clinton Gormley
Hi all
There seems to be a bug in the mod_perl2/apache2 handling of character
sets for $r->custom_response(). I'm not sure which is at fault.
My pages are all in UTF8, but I can't find a way to set this character
set for custom generated error pages.
I've tried:
- $r->content_type('text/html; charset = utf8');
- $r->err_headers_out('Content-type' =>'text/html; charset = utf8');
-
- AddDefaultCharset UTF8
to no avail - apache always overrides this with:
Content-Type: text/html; charset=iso-8859-1
According to the apache docs for AddDefaultCharset:
Note: This will not have any effect on the Content-Type and
character set for default Apache-generated status pages (such as
'404 Not Found' or '301 Moved Permanently') because those have
an actual character set (that in which the hard-coded page
content is written) and don't need to have a default applied.
That implies that the character set is taken from the file itself, but
that shouldn't apply to errors generated with
$r->custom_response($error_msg)
For now, I plan to just entity escape anything that isn't in the ASCII
range, but is there a workaround? Should this be fixed?
thanks
Clint
Re: Bug: Character sets and $r->custom_response
on 23.05.2008 13:12:23 by Clinton Gormley
> For now, I plan to just entity escape anything that isn't in the ASCII
> range, but is there a workaround? Should this be fixed?
For those looking for an easy workaround for this, this is what I've
used:
$output = Encode::encode('iso-8859-1',$output,Encode::FB_HTMLCREF);
To explain:
- $output contains the full HTML page that I'm passing to
$r->custom_response
- This, of course, includes < > characters, so you don't want to do
a straight encode_entities
- So the above command tries to convert the string to ISO-8859-1
- The last argument tells encode that, when it finds a character that
it can't represent in ISO-8859-1, it should replace it with the
corresponding numeric character reference (e.g. &#8364; for the euro sign).
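To make the behaviour above concrete, here is a minimal standalone sketch. The sample string is made up for illustration, and LEAVE_SRC is an addition not present in the one-liner above: without it, encode() may overwrite the source variable when a CHECK argument is given.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical error page with one character inside ISO-8859-1 (U+00E9)
# and one outside it (U+20AC, the euro sign).
my $output = "<p>Caf\x{E9} total: 10 \x{20AC}</p>";

# FB_HTMLCREF replaces unmappable characters with &#NNN; references.
# LEAVE_SRC stops encode() from overwriting $output in the process.
my $bytes = Encode::encode(
    'iso-8859-1', $output,
    Encode::FB_HTMLCREF | Encode::LEAVE_SRC,
);

# U+00E9 becomes the single byte 0xE9, U+20AC becomes "&#8364;",
# and the markup characters < and > pass through untouched.
print $bytes, "\n";
```

The resulting byte string can then be handed to $r->custom_response, and it stays readable whether or not the client trusts the iso-8859-1 header.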
Clint
Re: Bug: Character sets and $r->custom_response
on 23.05.2008 14:29:28 by Geoffrey Young
Clinton Gormley wrote:
> Hi all
>
> There seems to be a bug in the mod_perl2/apache2 handling of character
> sets for $r->custom_response(). I'm not sure which is at fault.
>
> My pages are all in UTF8, but I can't find a way to set this character
> set for custom generated error pages.
>
> I've tried:
>
> - $r->content_type('text/html; charset = utf8');
> - $r->err_headers_out('Content-type' =>'text/html; charset = utf8');
> -
> - AddDefaultCharset UTF8
>
> to no avail - apache always overrides this with:
>
> Content-Type: text/html; charset=iso-8859-1
>
> According to the apache docs for AddDefaultCharset:
>
> Note: This will not have any effect on the Content-Type and
> character set for default Apache-generated status pages (such as
> '404 Not Found' or '301 Moved Permanently') because those have
> an actual character set (that in which the hard-coded page
> content is written) and don't need to have a default applied.
>
> That implies that the character set is taken from the file itself, but
> that shouldn't apply to errors generated with
> $r->custom_response($error_msg)
>
> For now, I plan to just entity escape anything that isn't in the ASCII
> range, but is there a workaround? Should this be fixed?
this isn't a mod_perl thing, it's an httpd thing. this is from 1.3:
http://www.mail-archive.com/modperl@apache.org/msg20549.html
the same seems to hold true in httpd 2.0. see
ap_send_error_response in
modules/http/http_protocol.c
try setting $r->subprocess_env('suppress-error-charset' => 1) and see if that
helps you at all.
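A rough sketch of how that suggestion might look in a handler, not a tested recipe. send_custom_error and Local::MockRequest are hypothetical names invented here. The mock only stands in for the real request object so the snippet runs outside Apache. Under mod_perl 2 you would drop the mock and load the Apache2 modules named in the comments.

```perl
use strict;
use warnings;

# Under mod_perl 2 the real request object needs these modules loaded:
#   use Apache2::RequestRec ();   # provides $r->subprocess_env
#   use Apache2::Response   ();   # provides $r->custom_response
# They are left commented out so the sketch also runs outside Apache.

# Hypothetical helper: register a custom error body and ask httpd not to
# add its default charset to the Content-Type of the error response.
sub send_custom_error {
    my ($r, $status, $html) = @_;

    # The key has to be quoted: with hyphens it is not a valid bareword.
    $r->subprocess_env('suppress-error-charset' => 1);
    $r->custom_response($status, $html);
    return $status;
}

# Minimal stand-in for the request object, for demonstration only.
{
    package Local::MockRequest;

    sub new { return bless { env => {}, response => {} }, shift }

    sub subprocess_env {
        my ($self, $key, $value) = @_;
        $self->{env}{$key} = $value if @_ > 2;
        return $self->{env}{$key};
    }

    sub custom_response {
        my ($self, $status, $body) = @_;
        $self->{response}{$status} = $body;
        return;
    }
}

my $r  = Local::MockRequest->new;
my $rc = send_custom_error($r, 404, '<h1>Not found</h1>');
print "status $rc, suppress-error-charset = ",
    $r->subprocess_env('suppress-error-charset'), "\n";
```

The only part that matters for the charset problem is the subprocess_env call. Everything else is scaffolding.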
--Geoff
Re: Bug: Character sets and $r->custom_response
on 23.05.2008 14:49:42 by Clinton Gormley
> this isn't a mod_perl thing, it's an httpd thing. this is from 1.3:
>
> http://www.mail-archive.com/modperl@apache.org/msg20549.html
>
> the same seems to hold true in httpd 2.0. see
> ap_send_error_response in
>
> modules/http/http_protocol.c
>
> try setting $r->subprocess_env('suppress-error-charset' => 1) and see if that
> helps you at all.
Hah! That does indeed work, after a fashion: it doesn't honour my utf8
setting, it just sends no character set at all.
Rather than that, I'm going to keep the explicit character set with the
entity escaping - at least that way the result "should" be consistent.
thanks for the help Geoff
Clint