Problem with utf8 encoding

Problem with utf8 encoding

am 03.12.2009 08:00:16 von jmirandac.85

--0015175df19629f2540479cd8b64
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hello everyone!

I'm working in a project using postgres, propel, and php.

My development environment is:
SO: Windows vista Business SP2
Postgres: Postgres v8.4
Propel: Propel generator/runtime v1.4
PHP: PHP v5.3

Currently I'm struggling with a problem caused by the encoding. Everytime I
try to insert a row into the table CURRENCY, which has ID, DESC, and SYMBOL
as its columns, I get the following error:
Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]: Character no=
t
in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".]

I've created the database using this sentence:
*CREATE DATABASE sbs*
* WITH OWNER =3D sbsadmin*
* ENCODING =3D 'UTF8'*
* LC_COLLATE =3D 'Spanish_Peru.1252'*
* LC_CTYPE =3D 'Spanish_Peru.1252'*
* CONNECTION LIMIT =3D -1;*

When I run the following commands in psql I get this:
*sbs=3D#SHOW SERVER_ENCODING;*
*server_encoding*
*---------------*
*UTF8*
*(1 fila)*

*sbs=3D#SHOW CLIENT_ENCODING;*
*client_encoding*
*---------------*
*UTF8*
*(1 fila)*

The generated file by propel that manages the database connection sets the
charset to utf8 too. The insertion fails only everytime I try to insert the
currency symbols such as â‚, Â¥, £, â‚=B1, etc. I have =
no problem with symbols
such S/., $, p., Q, L or any other non-special symbol. The curious thing is
that when I run the sentence INSERT INTO CURRENCY VALUES(3,'EURO','â‚=
');
using the query tool of pgAdminIII, I got no error and the query was
sucessfully executed. However running the same sentence in psql, I got the
row inserted but with ? instead of ₠as the currency symbol.
I think the utf8 is the apropiate charset to represent these symbols but I
have no idea why I'm getting this problem. I've spent the whole day
struggling with this problem and I can't figure it out yet. I have tried
many things, none of them worked. So this is why I'm here asking for your
help, as my last resource. Any guidance that could put me in the right
direction to address this problem will be really appreciated!

Thanks in advance and I'm sorry If I went on a bit long...

--=20
Jorge Miranda Castañeda.
Simbiosys Software S.A.C.
www.simbiosysonline.com

--0015175df19629f2540479cd8b64
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hello everyone!

I'm working in a project =
using postgres, propel, and php.

My development en=
vironment is:
SO: Windows vista Business SP2
Postgres: =
Postgres v8.4

Propel: Propel generator/runtime v1.4
PHP: PHP v5.3
v>
Currently I'm struggling with a problem caused by the =
encoding. Everytime I try to insert a row into the table CURRENCY, which ha=
s ID, DESC, and SYMBOL as its columns, I get the following error:

Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]: Charact=
er not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF=
8": 0x80 HINT: This error can also happen if the byte sequence does no=
t match the encoding expected by the server, which is controlled by "c=
lient_encoding".]


I've created the database using this sentence: >
CREATE DATABASE sbs
  WITH OWNER =3D sb=
sadmin
       ENCODING =3D 'UTF8&#=
39;
       LC_COLLATE =3D 'Spanish=
_Peru.1252'

       LC_CTYPE =3D 'Spanish_Peru.1252'=
       CONNECTION LIMIT =3D -1; div>

When I run the following commands in psql I get thi=
s:
sbs=3D#SHOW SERVER_ENCODING;

server_encoding
---------------
U=
TF8
(1 fila)

sbs=3D#SHOW C=
LIENT_ENCODING;
client_encoding
---------=
------

UTF8
(1 fila)

The gen=
erated file by propel that manages the database connection sets the charset=
to utf8 too. The insertion fails only everytime I try to insert the curren=
cy symbols such as â‚, Â¥, £, â‚=B1, etc. I have no pro=
blem with symbols such S/., $, p., Q, L or any other non-special symbol. Th=
e curious thing is that when I run the sentence INSERT INTO CURRENCY VALUES=
(3,'EURO','â‚'); using the query tool of pgAdminIII,=
I got no error and the query was sucessfully executed. However running the=
same sentence in psql, I got the row inserted but with ? instead of â‚=
as the currency symbol.

I think the utf8 is the apropiate charset to represent these symbols b=
ut I have no idea why I'm getting this problem. I've spent the whol=
e day struggling with this problem and I can't figure it out yet. I hav=
e tried many things, none of them worked. So this is why I'm here askin=
g for your help, as my last resource. Any guidance that could put me in the=
right direction to address this problem will be really appreciated!


Thanks in advance and I'm sorry If I went on a bit =
long...

--
ly: arial, sans-serif; font-size: 13px; border-collapse: collapse; color: r=
gb(136, 136, 136); ">Jorge Miranda Castañeda.


if; font-size: 13px; border-collapse: collapse; color: rgb(136, 136, 136); =
">Simbiosys Software S.A.C.
n" color=3D"#888888" face=3D"arial, sans-serif"> span" style=3D"border-collapse: collapse;"> nline.com">www.simbiosysonline.com





--0015175df19629f2540479cd8b64--

Re: Problem with utf8 encoding

am 03.12.2009 10:35:14 von Andrew McMillan

--=-2hEk/LQgK/pN+/xdpjSV
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, 2009-12-03 at 02:00 -0500, Jorge Miranda Castañeda wrote:
> Hello everyone!
>=20
>=20
> I'm working in a project using postgres, propel, and php.
>=20
>=20
> My development environment is:
> SO: Windows vista Business SP2
> Postgres: Postgres v8.4
> Propel: Propel generator/runtime v1.4
> PHP: PHP v5.3
>=20
>=20
> Currently I'm struggling with a problem caused by the encoding.
> Everytime I try to insert a row into the table CURRENCY, which has ID,
> DESC, and SYMBOL as its columns, I get the following error:
> Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]:
> Character not in repertoire: 7 ERROR: invalid byte sequence for
> encoding "UTF8": 0x80 HINT: This error can also happen if the byte
> sequence does not match the encoding expected by the server, which is
> controlled by "client_encoding".]
>=20
>=20
> I've created the database using this sentence:
> CREATE DATABASE sbs
> WITH OWNER =3D sbsadmin
> ENCODING =3D 'UTF8'
> LC_COLLATE =3D 'Spanish_Peru.1252'
> LC_CTYPE =3D 'Spanish_Peru.1252'
> CONNECTION LIMIT =3D -1;

Hola Jorge,

I suspect it's the LC_COLLATE and LC_CTYPE that you have there. I don't
*know* this, but they *look* like they are some weird sort of
collation/ctype based on the misguided Windows-1252 encoding. Sadly,
Windows provides data in this encoding into web forms where the accept
charset is supposedly only ISO-8859.

In Windows-1252 the Euro currency symbol is somewhere in the 0x80 - 0x9f
range - possibly it is 0x80, in fact.

I think you would be better to use a consistent locale like es_PE.UTF-8
though if your data is 1252 encoded then you might need to iconv it
first.

If you have data which is a mix of ISO-8859-1, Windows-1252 and UTF-8
then I can point you at a wee bit of PHP code I wrote which will look at
each character in a string and only iconv from 8859/1252 -> UTF-8 if it
is a high-bit byte which is not part of a valid UTF-8 character already.

The code is here:

http://repo.or.cz/w/awl.git/blob/HEAD:/inc/AWLUtilities.php

You need both of the last two functions - call the first one during
initialisation, and use the second one to clean the strings.

Cheers,
Andrew McMillan.


------------------------------------------------------------ ------------
andrew (AT) morphoss (DOT) com +64(272)DEBIAN
You will feel hungry again in another hour.
------------------------------------------------------------ ------------


--=-2hEk/LQgK/pN+/xdpjSV
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEABECAAYFAksXhk4ACgkQjJA0f48GgBLfyACeM0PFPFa4QRs74qOkAz6F mWer
G4YAn0CIGkuqFZv/08vf6Q3BWSFeukcS
=gEOs
-----END PGP SIGNATURE-----

--=-2hEk/LQgK/pN+/xdpjSV--

Re: Problem with utf8 encoding

am 08.12.2009 23:47:08 von Sylvain Racine

Hi Jorge,

I already encountered this problem while programming PHP web application
in French. The problem is with CLIENT_ENCODING variable of your
database. It must
fit with the encoding of your web page. Forget for the
moment to ask query inside pgAdminIII or a Windows shell using psql
(it's different ways to access your Postgresql database). You talked
about propel. Then I suppose you use Symphony framework.

Look at the encoding of the web page where you enter your data
in your browser. Usually, browser is set to automaticaly detect web page
encoding. I assume you have Internet Explorer. Check in the menu
View/Encoding... You should see an encoding name with a large dot in
front. It's the one used to show the web page. It's the same encoding
that Symphony (or PHP) uses to send your data to the database.

An other way to better know the encoding of the web page is to show
source code (View/Source on IE). If you can see near the first lines
something who looks like

9-1">

In this example, iso-8859-1 is your web page encoding. This means you
need to set Postgresql client encoding variable to "latin1". You can
find here a charset table converter for Postgresql:
http://www.postgresql.org/docs/8.1/static/multibyte.html

Once you found your client encoding, you need to set this parameter in
your database. Here, it's important to understand difference between
server and client encoding. In your example, your database is set to
UTF-8 (what you gave as encoding parameter while you created your
database). It's your SERVER encoding. It's mean that all your data are
store in your database in UTF-8 encoding, with collate and ctype in
"Spanish_Peru.1252".

> /CREATE DATABASE sbs/
> / WITH OWNER =3D sbsadmin/
> / ENCODING =3D 'UTF8'/
> / LC_COLLATE =3D 'Spanish_Peru.1252'/
> / LC_CTYPE =3D 'Spanish_Peru.1252'/

Postgresql comes with the iconv librairy and can convert your data in
almost any other charset depending of your CLIENT encoding. Now, I talk
about your PHP application (web page). You can set your client encoding
for a while with "SET NAMES 'latin1'" at the begin of each time you send
a set of queries to your database, i.e. each time you refresh or change
your web page.

But this way is incomfortable if you not control your SQL query input.=20
You can set it once in your
postgresql.conf file. Set your database 'client_encoding' variable to
the encoding of your favorite PHP application. Don't forget to restart
your Postgresql Windows service once you changed it.

As mentionned in the documentation, you can also change your client
encoding default setting by changing PGCLIENTENCODING environment
variable. This can be set by right clicking on "Your computer" icon on
your windows desktop, click on Properties, click on Advanced System
Parameters to your left menu, then click on Advanced System Parameters
tab and
click on Environment Variables button. You have choice to set your
variable just inside your windows account (user variables) or for all
the machine (System variables). Then click on New button and add this
variable with the wished value. It's important to note that this
methode must be done on the computer where is put your Postgresql
database server. If your database is not on your computer, use another
method to change your client encoding setting.

This should resolve your problem. About use of psql through a shell, you
must know that Windows shell encoding is 'cp850'. It's a old charset
used with the DOS system who keep Windows some decade behind... Up to
this moment, I didn't find a Postgresql encoding where you can enter and
send your particular symbol... Another Microsoft problem...

Best regards

Sylvain Racine


Jorge Miranda Castañeda wrote:
> Hello everyone!
>
> I'm working in a project using postgres, propel, and php.
>
> My development environment is:
> SO: Windows vista Business SP2
> Postgres: Postgres v8.4
> Propel: Propel generator/runtime v1.4
> PHP: PHP v5.3
>
> Currently I'm struggling with a problem caused by the encoding.=20
Everytime I try to insert a row into the table CURRENCY, which has ID,=20
DESC, and SYMBOL as its columns, I get the following error:
> Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]:=20
Character not in repertoire: 7 ERROR: invalid byte sequence for encoding=20
"UTF8": 0x80 HINT: This error can also happen if the byte sequence does=20
not match the encoding expected by the server, which is controlled by=20
"client_encoding".]
>
> I've created the database using this sentence:
> /CREATE DATABASE sbs/
> / WITH OWNER =3D sbsadmin/
> / ENCODING =3D 'UTF8'/
> / LC_COLLATE =3D 'Spanish_Peru.1252'/
> / LC_CTYPE =3D 'Spanish_Peru.1252'/
> / CONNECTION LIMIT =3D -1;/
>
> When I run the following commands in psql I get this:
> /sbs=3D#SHOW SERVER_ENCODING;/
> /server_encoding/
> /---------------/
> /UTF8/
> /(1 fila)/
>
> /sbs=3D#SHOW CLIENT_ENCODING;/
> /client_encoding/
> /---------------/
> /UTF8/
> /(1 fila)/
>
> The generated file by propel that manages the database connection=20
sets the charset to utf8 too. The insertion fails only everytime I try=20
to insert the currency symbols such as â‚, Â¥, £, â‚=B1=
, etc. I have no=20
problem with symbols such S/., $, p., Q, L or any other non-special=20
symbol. The curious thing is that when I run the sentence INSERT INTO=20
CURRENCY VALUES(3,'EURO','â‚'); using the query tool of pgAdminIII=
, I got=20
no error and the query was sucessfully executed. However running the=20
same sentence in psql, I got the row inserted but with ? instead of â‚=
as=20
the currency symbol.
> I think the utf8 is the apropiate charset to represent these symbols=20
but I have no idea why I'm getting this problem. I've spent the whole=20
day struggling with this problem and I can't figure it out yet. I have=20
tried many things, none of them worked. So this is why I'm here asking=20
for your help, as my last resource. Any guidance that could put me in=20
the right direction to address this problem will be really appreciated!
>
> Thanks in advance and I'm sorry If I went on a bit long...
>
> --
> Jorge Miranda Castañeda.
> Simbiosys Software S.A.C.
> www.simbiosysonline.com
>



--=20
Sent via pgsql-php mailing list (pgsql-php@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-php