Re: Encoding problem

on 23.04.2010 09:49:39 by CROMBEZ Emmanuel

Hello

My solution for UTF-8 problems in web pages has 3 steps:
1 - fix the content-type
2 - fix the XML encoding
3 - fix the HTML header encoding

In mod_perl, use:

$r->content_type('text/html; Charset=UTF-8');

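The first line of your HTML page must be:

<?xml version="1.0" encoding="UTF-8"?>

And in the <head> you must have:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />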

If you don't have all 3 of these, some parts of your page work and others
don't. For example, if one of them is missing, the title of your page
is not always displayed correctly.
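
As a rough illustration of steps 2 and 3 together, here is a minimal Perl sketch. The helper name build_page is hypothetical, and the exact XML declaration and <meta> line are assumptions (the usual XHTML forms), not text taken from the post above. It does no HTML escaping, so the caller is assumed to pass already-escaped text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a page whose XML declaration and <meta> tag
# both announce UTF-8, and return it as UTF-8 encoded bytes.
# $title and $body are Perl character strings, assumed already HTML-escaped.
sub build_page {
    my ($title, $body) = @_;
    my $html = join "\n",
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        '<head>',
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />',
        "<title>$title</title>",
        '</head>',
        "<body>$body</body>",
        '</html>';
    # Encode once, at the output boundary, so the bytes match the declared charset.
    return encode('UTF-8', $html);
}
```

In a mod_perl handler the result would typically be sent with $r->print(build_page(...)) after the $r->content_type(...) call, so that all three declarations agree with the bytes actually sent.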

I wrote an article about this on my blog (in French) here:
http://ecrombez.lantrasite.com/index.pl?PAGE=53&ID_BILLET=11 5

On Friday 23 April 2010 at 09:27 +0200, kolikov wrote:
> Jean-Christophe Boggio wrote:
> > ? The problem comes from the header I *receive*. The headers I send are
> > always good (hard coded in base.epl). I'm quoting myself :
> >
> >>
>
> If it may help :
>
> All my scripts are written in utf-8 encoding
> My default system/database locales are utf-8
> My apache2.conf is the default one
>
> # cat /etc/apache2/sites-available/mysite
>
> AddDefaultCharset utf-8
> ETC ....
>

>
> My HTML headers are the same as yours.
> But I put on every <form> :
>
>
> Which may make the point ...
>
> Bregs,
> Romu.
>

Re: Encoding problem

on 23.04.2010 10:30:31 by aw

Emmanuel CROMBEZ wrote:
> Hello
>
> My solution for UTF-8 problems in web pages has 3 steps:
> 1 - fix the content-type
> 2 - fix the XML encoding
> 3 - fix the HTML header encoding
>
> In mod_perl, use:
>
> $r->content_type('text/html; Charset=UTF-8');
>
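> The first line of your HTML page must be:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> And in the <head> you must have:
>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />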
>
The above is all good, but you should also add what kolikov wrote:


... but this is still not 100% foolproof, unfortunately.
Several aspects can still cause problems:

1) According to the HTTP specification, the request URL (of which the
query string is a part, for a GET) does not have any particular
encoding. That means that the proper decoding of that query string
requires some kind of agreement between the client and the server.
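
When no such agreement exists, one common fallback is to try strict UTF-8 first and treat the bytes as ISO-8859-1 otherwise. The following is only a sketch of that idea: the helper name decode_param is hypothetical, and the choice of ISO-8859-1 as the fallback is an assumption, not something the HTTP specification prescribes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: percent-decode one raw query-string value, then
# guess its charset: strict UTF-8 first, ISO-8859-1 as the assumed fallback.
sub decode_param {
    my ($raw) = @_;
    (my $bytes = $raw) =~ tr/+/ /;                   # '+' means space in query strings
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> one byte
    my $copy = $bytes;   # decode() with a CHECK argument may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $text ? $text : decode('ISO-8859-1', $bytes);
}
```

With this, both 'caf%C3%A9' (UTF-8) and 'caf%E9' (Latin-1) come out as the same four-character string. It remains a guess: a byte sequence that is valid in both encodings is always read as UTF-8.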

2) In the <form> tag above, the method is POST. That means that the
data will arrive in the body of the request. There are 2 ways of
encoding POST data (or rather, of presenting it):
- application/x-www-form-urlencoded (the default)
- multipart/form-data
To specify which one the browser should use, you should have an
additional attribute in the <form> tag, e.g.:
<form ... enctype="multipart/form-data">
Theoretically, in the multipart/form-data format, each form parameter is
submitted in a separate "section" of the data, a bit like an email with
attachments. And each part should have a Content-Type header, with a
charset. Unfortunately, the last time I looked, browsers did not
specify the charset for form parameters. (That is a real pity, because
that would be the right solution.)

3) Finally, no matter what you do on the server side, ultimately you are
sending this to a browser on the client side. And the ultimate master
of the browser is the user who sits in front of it. If the user wants
to change the browser settings (including the charset of your page) he can.
The user can also be using a bad browser (one that decides by itself how your
page should be interpreted), or a program that is not a browser, but
just simulates one (think curl, wget, lwp-request).

An example: on the server side, save an HTML page with MS Notepad,
as UTF-8. Notepad then automatically adds a "BOM" at the beginning of
the file. Now send this page to IE. It does not matter which charset
you set in the HTTP headers, or in the page's <meta> tags, IE will look
at the BOM and decide that this is UTF-8. Always.
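
To check whether a file picked up such a BOM, one could test its first three bytes (EF BB BF for UTF-8). This is only a sketch: the helper names are hypothetical, and the input is assumed to be raw bytes, for example read from a filehandle opened with the :raw layer.

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+FEFF, as raw bytes.
my $UTF8_BOM = "\xEF\xBB\xBF";

# Returns 1 if the byte string starts with a UTF-8 BOM, 0 otherwise.
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq $UTF8_BOM ? 1 : 0;
}

# Returns a copy of the byte string with a leading UTF-8 BOM removed.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}
```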

(IE also has a setting : "send all URLs as UTF-8").

So I add yet another gimmick to my form pages : a hidden field
containing a known "accented" character sequence. Then when the
parameters of the form are posted to the server, the perl code on the
server side checks the length (in bytes) of this parameter. If it
matches the expected byte length and value of the hidden field, then
chances are that everything is OK. If not, something funny happened.
Of course, a user really out to get you can also save the form, edit the
hidden field, and submit the modified form to your server.
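
The server-side check could look roughly like this. It is a sketch under assumptions: the probe text "é€" and the helper name probe_is_utf8 are made up for illustration, and the received value is assumed to be the raw, still undecoded bytes of the hidden field.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed probe text placed in the hidden field: "é€" (U+00E9, U+20AC).
# In UTF-8 this is 2 + 3 = 5 bytes.
my $PROBE_BYTES = encode('UTF-8', "\x{e9}\x{20ac}");

# Returns true if the raw bytes received for the hidden field have the
# expected byte length and are exactly the UTF-8 encoding of the probe.
sub probe_is_utf8 {
    my ($received) = @_;
    return defined $received
        && length($received) == length($PROBE_BYTES)
        && $received eq $PROBE_BYTES;
}
```

If the check fails, the other parameters of the same request were most likely not sent as UTF-8 either, so the handler can reject the request or fall back to another decoding.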

Everything that can happen will happen at some time. It is just a
matter of how much incentive there is to do it.
