<Form> and Arabic input problems

on 11.09.2007 08:44:37 by shror

Hi everybody,

I need some help regarding my form.
I have a website in English, and my customer asked me to make another
copy of the website in Arabic. I have an email form posting to a PHP
script that sends the form entries to his email, but when I was testing
the Arabic side, the form wasn't sent correctly when I entered Arabic
text. What I got was nothing except Unicode letters like this:
&#1576;&#1575;&#1605;&#1576;

All the text in my pages displays fine, so please tell me
what the problem is and how to solve it.

Thanks for your help in advance

shror

Re: <Form> and Arabic input problems

on 11.09.2007 11:47:34 by jkorpela

Scripsit shror:

> when I was testing the Arabic side the form wasn't sent correctly
> when I entered Arabic text.
> What I got was nothing except Unicode letters like this:
> &#1576;&#1575;&#1605;&#1576;

As usual, revealing a URL would have helped to analyze the problem. But it
looks pretty obvious that the problem is in the character encoding of the
form data.

For example, if a page is ASCII (or ISO-8859-1) encoded, then the form data
encoding is the same by default, and if you enter an Arabic character in a
form there, the effect is _undefined_ by HTML specifications. What browsers
might do is to represent the characters that have no representation in the
encoding by character references like &#1576; (or by entity references, when
applicable). This is really odd, since the form data is just character data,
not HTML, but on the other hand, what else could a poor browser do?

You could tweak your form handler into dealing with such references, but the
real solution is to make the page UTF-8 encoded and to make the form handler
deal with UTF-8 data.
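
As a minimal sketch of the first option (tweaking the handler to deal with such references), the snippet below turns character references back into the characters they denote. It is written in Python rather than PHP purely for illustration, and the function name `decode_form_value` is hypothetical, not something from the thread.

```python
import html


def decode_form_value(value: str) -> str:
    """Turn character references such as &#1576; back into characters."""
    # html.unescape handles decimal (&#1576;), hexadecimal (&#x628;)
    # and named (&copy;) references alike.
    return html.unescape(value)


# The kind of value the handler received in the original report.
submitted = "&#1576;&#1575;&#1605;&#1576;"
decoded = decode_form_value(submitted)
print(decoded)
```

One caveat with this approach: if a user literally types `&#1576;` into the field, the handler probably cannot tell it apart from a reference the browser generated, which is one more reason to prefer UTF-8.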

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Re: <Form> and Arabic input problems

on 12.09.2007 08:16:44 by shror

Sorry for not sending my URL. I know it was stupid of me, but here it is:
http://www.mobidp.com/request2.htm


Thanks for your help Jukka

shror

Re: <Form> and Arabic input problems

on 12.09.2007 09:05:18 by jkorpela

Scripsit shror:

> On Sep 11, 12:47 pm, "Jukka K. Korpela" wrote:
...
>> For example, if a page is ASCII (or ISO-8859-1) encoded, then the
>> form data encoding is the same by default, and if you enter an
>> Arabic character in a form there, the effect is _undefined_ by HTML
>> specifications. What browsers might do is to represent the
>> characters that have no representation in the encoding by character
>> references like &#1576; (or by entity references, when applicable).
>> This is really odd, since the form data is just character data, not
>> HTML, but on the other hand, what else could a poor browser do?
>>
>> You could tweak your form handler into dealing with such references,
>> but the real solution is to make the page UTF-8 encoded and to make
>> the form handler deal with UTF-8 data.
...
> Sorry for not sending my URL. I know it was stupid of me, but here it is:
> http://www.mobidp.com/request2.htm

The situation is basically what I wrote in the quoted text, just with
windows-1252 (Windows Latin 1) as the encoding. The encoding is unable to
represent any Arabic letters.
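
To make this concrete, here is a small sketch in Python (chosen only for illustration; the site itself uses PHP). It shows that strict encoding of Arabic text as windows-1252 fails, and that Python's `xmlcharrefreplace` error handler produces the same kind of numeric references as the browser fallback described in the quoted text. Treating that handler as a stand-in for browser behavior is an assumption made for the example.

```python
# The four Arabic letters from the original report, written as escapes.
arabic = "\u0628\u0627\u0645\u0628"

# windows-1252 has no positions for Arabic letters, so strict encoding fails.
try:
    arabic.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# "xmlcharrefreplace" replaces each unencodable character with a numeric
# character reference, imitating what browsers tend to submit.
fallback = arabic.encode("cp1252", errors="xmlcharrefreplace")

print(representable)  # False
print(fallback)       # b'&#1576;&#1575;&#1605;&#1576;'
```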

The encoding is specified in a <meta> tag, and HTTP headers are silent about
encoding, so it would be almost trivial to change the encoding to utf-8, by
modifying the <meta> tag and by replacing all non-ASCII characters (such as
the copyright sign) by entity or character references (such as &copy;).
ASCII data constitutes utf-8 data too.
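
The replacement step can be sketched as follows, again in Python for illustration only. The helper name `to_ascii_with_references` and the sample text are hypothetical. The point is that once every non-ASCII character is a reference, the bytes are plain ASCII and read the same under utf-8 and windows-1252.

```python
def to_ascii_with_references(text: str) -> bytes:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace")


# Sample page text starting with the copyright sign (U+00A9).
page_fragment = "\u00a9 2007 Example"
ascii_bytes = to_ascii_with_references(page_fragment)

# Pure ASCII decodes to the same text whichever of the two encodings is declared.
same_text = ascii_bytes.decode("utf-8") == ascii_bytes.decode("cp1252")

print(ascii_bytes)  # b'&#169; 2007 Example'
print(same_text)    # True
```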

But there's probably much more to be done on the server side, in the form
handler (confirmation.php). It would need to be modified so that it can read
utf-8 data and process it meaningfully.

The bad news is that PHP does not support utf-8 yet, except in fairly
limited ways.

Alternative tricks:

1) Let the page be windows-1252 encoded, and just be prepared to get
stuff like &#1576;. If you pass them into an HTML document, _without_
encoding the "&" in any way, they will appear as the characters they denote
by HTML rules. (This is actually the way people have built, probably by
accident, a poor man's Unicode support into one of the most popular web-based
discussion forums in Finland, suomi24.fi.) There is no guarantee that this
will work, but it happens to work in most situations.

2) Make the Arabic page windows-1256 (Windows Arabic) or iso-8859-6 (ISO
Latin/Arabic) encoded. Your form handler will then get Arabic letters in the
specified 8-bit encoding. This in principle restricts input to characters
representable in the chosen encoding, but in practice you usually get
&#number; stuff for other characters.
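
Both tricks can be illustrated with a short Python sketch (Python is used only for demonstration; the variable names and sample values are made up). For trick 1 it contrasts leaving the "&" alone with escaping it. For trick 2 it decodes windows-1256 octets and shows the fallback for a character outside that encoding, using Python's `xmlcharrefreplace` handler as a stand-in for what browsers tend to do.

```python
import html

# Trick 1: the handler receives references and writes them into HTML output.
received = "&#1576;&#1575;&#1605;&#1576;"
kept = "<p>" + received + "</p>"                 # "&" untouched: rendered as Arabic letters
broken = "<p>" + html.escape(received) + "</p>"  # "&" became "&amp;": raw codes are shown

# Trick 2: a windows-1256 page submits each Arabic letter as a single octet.
octets = "\u0628\u0627\u0645\u0628".encode("cp1256")  # stand-in for submitted data
text = octets.decode("cp1256")                         # decode with the page's encoding

# A character outside windows-1256 (U+4E2D, a CJK ideograph) still needs a reference.
outside = "\u4e2d".encode("cp1256", errors="xmlcharrefreplace")

print(kept)
print(broken)
print(len(octets))  # 4
print(outside)      # b'&#20013;'
```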

P.S. Your form has a single-line input field for "Address", which is
probably for a postal address, since you also have "E-mail". Normally you
should reserve a textarea of six lines for input of a postal address, but in
this case, _if_ you include the postal address input (why?), then I think
you should have two textareas, one for the address in Latin letters and one
for the possible address in the local writing system. According to the
Universal Postal Union, a letter sent e.g. to an Arabic-speaking country
from abroad should have the recipient address in two ways, in Latin letters
and in Arabic letters.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Re: <Form> and Arabic input problems

on 12.09.2007 09:41:47 by shror

Really, I don't know how to thank you, Jukka.

I just changed the encoding to UTF-8 and tried my form, and I
received Arabic in my email.
It's now working fine because of your help. I didn't do anything to the
PHP script and it worked.

Thanks so much

Re: <Form> and Arabic input problems

on 12.09.2007 13:09:21 by jkorpela

Scripsit shror:

> I just changed the encoding to UTF-8 and tried my form, and I
> received Arabic in my email.
> It's now working fine because of your help. I didn't do anything to the
> PHP script and it worked.

Sounds too good to be true... and strange. But I guess the PHP script just
passes the incoming data "as is", as a sequence of octets, and your email
program manages to interpret it as utf-8.
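
That guess can be sketched in Python (for illustration only; `pass_through` is a hypothetical stand-in for the PHP script, whose actual code is not shown in the thread). The octets are forwarded untouched, and the result depends entirely on which encoding the reading program assumes.

```python
typed = "\u0628\u0627\u0645\u0628"  # Arabic text entered in the form
octets = typed.encode("utf-8")      # what a utf-8 encoded page submits


def pass_through(data: bytes) -> bytes:
    """Hypothetical handler that forwards the octets without interpreting them."""
    return data


delivered = pass_through(octets)

as_utf8 = delivered.decode("utf-8")     # reader assumes utf-8: the original text
as_cp1252 = delivered.decode("cp1252")  # reader assumes windows-1252: garbage

print(as_utf8 == typed)    # True
print(as_cp1252 == typed)  # False
```

If the mail program had guessed windows-1252 instead, the same octets would have shown up as two Latin characters per Arabic letter.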

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/