<Form> and Arabic input problems
On 11.09.2007 08:44:37, shror wrote:
Hi everybody,
I need some help regarding my
Scripsit shror:
> when I was testing the Arabic side the
On Sep 11, 12:47 pm, "Jukka K. Korpela"
> Scripsit shror:
>
> > when I was testing the Arabic side the
Scripsit shror:
> On Sep 11, 12:47 pm, "Jukka K. Korpela"
...
>> For example, if a page is ASCII (or ISO-8859-1) encoded, then the
>> form data encoding is the same by default, and if you enter an
>> Arabic character in a form there, the effect is _undefined_ by HTML
>> specifications. What browsers might do is to represent the
>> characters that have no representation in the encoding by character
>> references like &#1576; (or by entity references, when applicable).
>> This is really odd, since the form data is just character data, not
>> HTML, but on the other hand, what else could a poor browser do?
>>
>> You could tweak your form handler into dealing with such references,
>> but the real solution is to make the page UTF-8 encoded and to make
>> the form handler deal with UTF-8 data.
...
> Sorry for not sending my URL; I know it was stupid, but here it is:
> http://www.mobidp.com/request2.htm
The situation is basically what I wrote in the quoted text, just with
windows-1252 (Windows Latin 1) as the encoding. This encoding is unable to
represent any Arabic letters.
The encoding is specified in a <meta> tag, and the HTTP headers are silent
about encoding, so it would be almost trivial to change the encoding to utf-8
by modifying the <meta> tag and by replacing all non-ASCII characters (such as
the copyright sign) with entity or character references (such as &copy;).
ASCII data constitutes utf-8 data too.
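A minimal sketch of that conversion, written in Python only as an illustration (the thread's server side is PHP). It assumes the page declares its charset in a single <meta> tag and rewrites the first "charset=..." it finds; the function name to_ascii_utf8_page is hypothetical. Python's "xmlcharrefreplace" error handler emits numeric references such as &#169; rather than named entities like &copy;, and HTML treats both the same way.

```python
import re


def to_ascii_utf8_page(html: str) -> bytes:
    """Relabel a page as UTF-8 and make its bytes pure ASCII.

    Assumption (hypothetical helper): the encoding is declared in one
    <meta ... charset=...> tag, and the first "charset=" is that tag's.
    """
    # Point the meta tag's charset at utf-8 (case-insensitive, first match only).
    html = re.sub(r"charset=[\w-]+", "charset=utf-8", html,
                  count=1, flags=re.IGNORECASE)
    # Replace every non-ASCII character with a numeric character reference,
    # e.g. the copyright sign U+00A9 becomes &#169;.
    return html.encode("ascii", errors="xmlcharrefreplace")


page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1252">'
        '<p>\u00a9 2007</p>')
out = to_ascii_utf8_page(page)
# Pure ASCII bytes are also valid UTF-8, so decoding either way gives the same text.
print(out.decode("utf-8"))
```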
But there's probably much more to be done on the server side, in the form
handler (confirmation.php). It would need to be modified so that it can read
utf-8 data and process it meaningfully.
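As a rough illustration of what reading UTF-8 form data means, here is a Python sketch rather than PHP. It assumes the browser submits the form as application/x-www-form-urlencoded; the field names name and email and the helper read_form are hypothetical, not taken from confirmation.php.

```python
from urllib.parse import parse_qs


def read_form(body: str) -> dict[str, str]:
    """Decode a urlencoded form body that was submitted from a UTF-8 page."""
    # parse_qs percent-decodes each value and interprets the bytes as UTF-8;
    # errors="strict" makes malformed input raise instead of being silently replaced.
    parsed = parse_qs(body, keep_blank_values=True,
                      encoding="utf-8", errors="strict")
    # Keep only the first value of each field (a simplification for this sketch).
    return {name: values[0] for name, values in parsed.items()}


# U+0628 U+0627 U+0628 (beh, alef, beh) percent-encoded as UTF-8, plus an ASCII field.
form = read_form("name=%D8%A8%D8%A7%D8%A8&email=a%40example.com")
print(form["name"], len(form["name"]))
```

Each Arabic letter arrives as two percent-encoded bytes, but after decoding the handler works with three characters.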
The bad news is that PHP does not support utf-8 yet, except in fairly
limited ways.
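One typical pitfall of byte-oriented string handling: each Arabic letter takes two bytes in UTF-8, so cutting a string by byte count can split a letter in half. A small Python sketch (the helper truncate_utf8 is hypothetical) shows the problem and one way to truncate safely:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    # Slicing the encoded bytes may leave a partial multibyte sequence at the
    # end; errors="ignore" drops that incomplete tail when decoding.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


word = "\u0628\u0627\u0628"      # beh, alef, beh: 3 characters
raw = word.encode("utf-8")       # 6 bytes, two per Arabic letter
# A naive byte slice cuts the second letter in half; "replace" shows the damage.
naive = raw[:3].decode("utf-8", errors="replace")
print(len(word), len(raw), repr(naive), repr(truncate_utf8(word, 3)))
```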
Alternative tricks:
1) Let the page be windows-1252 encoded, and just be prepared to get
stuff like &#1576;. If you pass such references into an HTML document,
_without_ encoding the "&" in any way, they will appear as the characters
they denote by HTML rules. (This is actually the way people have built,
probably by accident, a poor man's Unicode support into one of the most
popular web-based discussion forums in Finland, suomi24.fi.) There is no
guarantee that this will work, but it happens to work in most situations.
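If you would rather have the handler turn the references into real characters (the "tweak your form handler" option quoted above) than echo them unescaped, something like this Python sketch could work. It assumes the browser sends decimal or hexadecimal numeric references; the helper decode_char_refs is hypothetical. A user who literally types "&#1576;" cannot be told apart from one who typed the letter itself. Once decoded, the text can still be HTML-escaped normally when it is written into a page.

```python
import re
from urllib.parse import parse_qs

# Matches decimal (&#1576;) and hexadecimal (&#x628;) character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_char_refs(value: str) -> str:
    """Turn numeric character references into the characters they denote."""
    def repl(m: re.Match) -> str:
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Leave references to invalid code points (surrogates, > U+10FFFF) as they are.
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return m.group(0)
        return chr(code)
    return _NCR.sub(repl, value)


# What a browser might send from a windows-1252 page when the user types
# beh + alef: the literal text "&#1576;&#1575;", percent-encoded.
body = "name=%26%231576%3B%26%231575%3B"
raw = parse_qs(body, encoding="cp1252")["name"][0]  # cp1252 = windows-1252
print(raw, "->", decode_char_refs(raw))
```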
2) Make the Arabic page windows-1256 (Windows Arabic) or iso-8859-6 (ISO
Latin/Arabic) encoded. Your form handler will then get Arabic letters in the
specified 8-bit encoding. In principle, this restricts input to characters
representable in the chosen encoding, but in practice you usually get
&#number; references for other characters.
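A short Python sketch of option 2, assuming the Arabic page is served as windows-1256 (Python calls this codec cp1256). It shows that the Arabic letters travel as one byte each and imitates the numeric-reference fallback for a character the encoding lacks; whether a given browser does exactly this is not guaranteed.

```python
from urllib.parse import parse_qs, quote

word = "\u0628\u0627\u0628"  # beh, alef, beh

# In both 8-bit Arabic encodings every one of these letters is a single byte.
one_byte_each = len(word.encode("cp1256")) == len(word.encode("iso8859_6")) == len(word)

# What the browser would submit from a windows-1256 page, and how the handler reads it.
body = "name=" + quote(word, encoding="cp1256")
value = parse_qs(body, encoding="cp1256")["name"][0]

# A character outside windows-1256 (here Cyrillic U+0436) cannot be encoded;
# the "xmlcharrefreplace" handler mimics the common browser fallback.
fallback = "\u0436".encode("cp1256", errors="xmlcharrefreplace")

print(body, one_byte_each, value == word, fallback)
```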
P.S. Your form has a single-line input field for "Address", which is
probably for a postal address, since you also have "E-mail". Normally you
should reserve a textarea of six lines for input of a postal address, but in
this case, _if_ you include the postal address input (why?), then I think
you should have two textareas, one for the address in Latin letters and one
for the possible address in the local writing system. According to the
Universal Postal Union, a letter sent e.g. to an Arabic-speaking country
from abroad should have the recipient address in two ways, in Latin letters
and in Arabic letters.
--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/
On Sep 12, 10:05 am, "Jukka K. Korpela"
Really, I don't know how to thank you, Jukka.
I just changed the encoding to UTF-8 and tried my
Scripsit shror:
> I just changed the encoding to UTF-8 and tried my