PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 13:53:44 by aldnin

When I try to send this query (select 'lacarrière' as test;) to a UTF8 initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:

ERROR: invalid byte sequence for encoding "UTF8": 0xe87265

I use pg_query for the query delivery.

Client Encoding is set to:
client_encoding
-----------------
UTF8
(1 row)

pg_client_encoding() also returns "UTF8".

Well, one thing is strange: I think the database itself can manage the above string without any problems. When I use the PostgreSQL Manager from EMS to insert that string into the database, I get the correct result and no errors.

Using PHP, or even psql on the command line, I get the errors. I think UTF8 data has to be sent to the database in a different way than just pasting some text into psql or sending it through pg_query as I did.

Does anybody have an idea how to fix this?
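
A minimal sketch of what the error message is reporting, assuming the query text reached the server as ISO-8859-1 bytes. The thread does not confirm this, but 0xe87265 is exactly "ère" in ISO-8859-1. The variable names are illustrative only.

```php
<?php
// The same word in two encodings, written as explicit bytes so that the
// encoding of this source file does not matter.
$latin1 = "lacarri\xE8re";      // 'è' as the single ISO-8859-1 byte 0xE8
$utf8   = "lacarri\xC3\xA8re";  // 'è' as the two-byte UTF-8 sequence 0xC3 0xA8

// The three bytes named in the error: 'è', 'r', 'e' in ISO-8859-1.
$reported = bin2hex(substr($latin1, 7, 3));

// 0xE8 is 11101000 in binary. Read as UTF-8, that is a lead byte announcing
// a three-byte sequence, so the next two bytes would have to be 10xxxxxx.
$leadIsThreeByte    = (0xE8 & 0xF0) === 0xE0;
$nextIsContinuation = (0x72 & 0xC0) === 0x80;  // 0x72 is 'r', plain ASCII

echo $reported, "\n";                       // e87265
var_dump($leadIsThreeByte);                 // bool(true)
var_dump($nextIsContinuation);              // bool(false)
```

Under that assumption, the server is rejecting a Latin-1 'è' that it was told to read as UTF-8.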

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 15:02:46 by John DeSoi

On Jul 21, 2007, at 7:53 AM, aldnin wrote:

> When I try to send this query (select 'lacarrière' as test;) to a
> UTF8 initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this
> error:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe87265
>
> I use pg_query for the query delivery.
>
> Client Encoding is set to:
> client_encoding
> -----------------
> UTF8
> (1 row)
>
> pg_client_encoding() also returns "UTF8".


My guess is that your PHP is not set up to handle UTF8 and is really
sending something else. UTF8 is the default client encoding because
that is the encoding of the database. It does not mean that PHP has
set the right one. Before running your test, try executing this: "SET
client_encoding TO LATIN1;" and see if that fixes it.
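
The suggestion above amounts to declaring the encoding that the script really sends. A rough sketch, assuming the only two candidates are UTF-8 and ISO-8859-1: clientEncodingFor() is a hypothetical helper, the connection string is a placeholder, and pg_set_client_encoding() is the pgsql extension's function for setting client_encoding on a connection.

```php
<?php
// Hypothetical helper: choose the client_encoding name that matches the
// bytes the script actually sends. It assumes that anything which is not
// well-formed UTF-8 is ISO-8859-1 (LATIN1); that is a guess, not detection.
function clientEncodingFor(string $queryBytes): string
{
    // With the /u modifier, preg_match() returns 1 only for valid UTF-8.
    return preg_match('//u', $queryBytes) === 1 ? 'UTF8' : 'LATIN1';
}

$sql = "select 'lacarri\xE8re' as test;";   // 'è' sent as the single byte 0xE8
echo clientEncodingFor($sql), "\n";         // LATIN1

// With a live connection (needs the pgsql extension):
// $conn = pg_connect('dbname=test');
// pg_set_client_encoding($conn, clientEncodingFor($sql));
// $result = pg_query($conn, $sql);
```

If the query succeeds once LATIN1 is declared, the bytes leaving PHP were never UTF-8 in the first place.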




John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 16:04:43 by aldnin

> My guess is that your PHP is not setup to handle UTF8, and is really
> sending something else. UTF8 is the default client encoding because that
> is the encoding of the database. It does not mean that PHP has set the
> right one. Before running your test, try executing this: "SET
> client_encoding TO LATIN1;" and see if that fixes it.

I already did this, and all encoding settings are right, but I found out something more.

1) Using pg_query to fetch UTF8 data from the database works properly. Of course, when I output it directly I get something like "lacarrière", but when I use utf8_decode() on the UTF8 bytes I get it the right way: "lacarrière".

2) I found another PHP application that is able to insert UTF8 data properly, phpPgAdmin, but it seems to use the ADODB layer for executing SQL statements.
Well, the fact that phpPgAdmin runs on the same machine and handles UTF8 data properly means that my PHP is configured correctly for UTF8.

3) When I add utf8_encode() to my DB class for the query string I send to the database, it works properly and the insert is fine, so that's a temporary solution for my first problem.

4) When I get data from the database I would usually have to call utf8_decode() on EVERY string that is fetched. So my solution for now is to keep all strings coming from the database as UTF8 bytes, and to decode them only when I really need to for further use.

Problem:
--------
Just declaring the string 'lacarrière' 10 million times takes 5 seconds; calling utf8_encode() on it each time takes 13 seconds. So always calling utf8_encode() on a string needs 2-3 times more resources, even when the string does not include special characters. Those resources are wasted whenever the strings don't need to be UTF8-encoded.

Workaround:
-----------
To avoid wasting resources, you have to call utf8_encode() only when you "guess" that there might be special characters. Have fun with that, but it's the only way I see to work properly with those special characters in combination with Postgres.
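
The "guess" in this workaround can be replaced by an explicit check. A minimal sketch, assuming that any string which is not well-formed UTF-8 is ISO-8859-1 (the same mapping utf8_encode() applies). All three function names are hypothetical. A Latin-1 string can by coincidence also be valid UTF-8, so this is a heuristic and no substitute for knowing the encoding of each source.

```php
<?php
// True when $s is a well-formed UTF-8 byte sequence (plain ASCII included).
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Map each ISO-8859-1 byte to its UTF-8 form. Bytes below 0x80 are
// unchanged; bytes 0x80-0xFF become a two-byte sequence.
function latin1ToUtf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? $s[$i]
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Convert only when needed, so strings that are already UTF-8 are returned
// untouched and never double-encoded.
function ensureUtf8(string $s): string
{
    return isValidUtf8($s) ? $s : latin1ToUtf8($s);
}

echo bin2hex(ensureUtf8("lacarri\xE8re")), "\n";      // 6c616361727269c3a87265
echo bin2hex(ensureUtf8("lacarri\xC3\xA8re")), "\n";  // 6c616361727269c3a87265
```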

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 17:04:25 by niel

Hi

Please configure your email client so we don't receive 5 copies of your
mail.

> I already did this and all encoding settings are right, but I figured out
> something more.
>
> 1) Using pg_query for fetching UTF8 data from database is working
> properly. Of course when I try to output it directly then I get something
> like that as output "lacarrière" - but when I use utf8_decode() on the
> UTF8-bytes I get it the right way "lacarrière".

This indicates that PHP is not using UTF-8. That output is typical of
UTF-8 displayed as Latin characters.

> 2) I found another PHP application which is able to insert UTF8 data
> properly, phpPgAdmin, but it seems that it uses the ADODB-Layers for
> executing SQL-statements.
> Well, the fact that phpPgAdmin runs on the same machine handling properly
> UTF8 data means that my PHP is well configured for handling UTF8.

Not true, it only indicates that phpPgAdmin is configured to handle
UTF-8 correctly.

> 3) When I add to my DB-Class utf8_encode() on the querystring I send to
> the database, it works properly, the insert is fine, so that's a temporary
> solution for my first problem.

> 4) When I get data from database I usually would have to do a utf8_decode
> on EVERY string which is fetched from database. So my solution is now, to
> handle all strings coming UTF8 from database as they are coming with
> UTF8-bytes, and really only then when I need to decode them I decode them
> for further use.

Once again indicating your data needs to be converted from some other
character set.


I had similar problems getting PHP to work with UTF-8 and MySQL. Many
of PHP's functions are not multibyte aware and assume a Latin character set.

What, if any, output buffering are you using? What is your
default_charset set to?

--
Niel Archer

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 18:24:46 by aldnin

> Please configure your email client so we don't receive 5 copies of your
> mail.

I just fixed that issue; it won't happen again.

> This indicates that PHP not using UTF-8. That output is typical of
> UTF-8 output as Latin characters.

Well, maybe the output is not correct. When I run the PHP script on the console (CLI) it prints the content in the wrong charset, but that's not the problem: calling utf8_decode() lets me output it in the right charset.

> Not true, it only indicates that phpPgAdmin is is configured to handle
> UTF-8 correctly.

Well, I searched all of the phpPgAdmin source code for charsets and found:

"echo "\t<meta ... charset={$data->codemap[$dbEncoding]}\" />\r\n";"

So this means phpPgAdmin sets the output charset to the charset used by the database it is connected to. But that's still not the problem, because I also know how to fix the charset of the output in browsers.

> Once again indicating your data needs to be converted from some other
> character set.

It's already converted to be compatible with UTF8 when it is fetched from the other resources.

> I had similar problems getting PHP to work with UTF-8 and MySQL. Many
> of PHP's function are not multibyte aware and assume a Latin character set.
> What, if any, output buffering are you using? What is your
> default_charset set to?

Well, I've set default_charset to UTF8 (it was set to "" (empty) before), but the console (CLI) output and the problem are still the same after the change. So this is not the problem, and I don't need proper console output without utf8_decode(). If I want proper output there I just decode, as I do when I want it displayed properly in the browser.

Maybe a cleaner explanation of the problem:

I fetch something from the database which looks like "lacarrière" when I output it in PHP (let's not get confused by PHP's output). Then I fetch something from another resource which looks like "lacarrière". When I compare both strings in PHP, it tells me that they are "not equal".

So I HAVE TO call either utf8_encode() on the string from the other resource OR utf8_decode() on the string from the database for them to compare as "equal".

...and THIS means a lot more code in my classes.

Hint: The other resource is a socket connection (API) to another server.

The problem is quite simple, I think: everything coming from the database is encoded as UTF8 bytes and needs to be UTF8-decoded before you can work with it properly.

default_charset seems to affect only the output buffer, so the solution could only be a mechanism that tells PHP to handle all strings as UTF8 bytes, which would take a lot more resources. I understand that this is not a solution.

So the only solutions could be:

a) Encode and decode UTF8 data properly, and check whether content is encoded as UTF8 bytes, in which case it must be decoded before it is used with other strings.

b) A mechanism to tell the pg functions in PHP to decode all data that is UTF8-encoded. The ADODB layer seems to do that properly, but as far as I can see the pg functions don't.

You can use this to reproduce it:

1. Create a table in Postgres, in a UTF8 initialized database, and insert something like "lacarrière" into it. Check that it's inserted correctly.

2. Check the normal output with psql; you should get either "lacarrière" or "lacarrière", so you can be sure it's inserted correctly.

3. Write a script which fetches the string from the database into $dbString.

4. Set a string $phpString = "lacarrière";

5. Compare both strings with "==" - you'll get "false"
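
The comparison in these steps can be tried without a database. A small sketch, assuming the script file (and therefore $phpString) is saved as ISO-8859-1 while the database returns UTF-8. The byte escapes stand in for both values, and latin1ToUtf8() is a hypothetical helper.

```php
<?php
// Stand-ins for the two values from the steps above.
$dbString  = "lacarri\xC3\xA8re"; // as returned with client_encoding UTF8
$phpString = "lacarri\xE8re";     // same word from an ISO-8859-1 source file

var_dump($dbString == $phpString);                                // bool(false)
echo strlen($dbString), ' vs ', strlen($phpString), " bytes\n";   // 11 vs 10 bytes

// Hypothetical helper: turn every ISO-8859-1 high byte into its two-byte
// UTF-8 form. The pattern has no /u modifier, so it matches single bytes.
function latin1ToUtf8(string $s): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        static function (array $m): string {
            $b = ord($m[0]);
            return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        },
        $s
    );
}

// Once both sides use the same encoding, the comparison succeeds.
var_dump($dbString == latin1ToUtf8($phpString));                  // bool(true)
```

The strings differ because "==" compares bytes, and the two encodings spell "è" with different bytes.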

Another hint:

Try to send "select 'lacarrière' as test;" with pg_query to any Postgres database and you'll get an error. If not... well, then I'm wrong and I've set up PHP incorrectly for UTF8.

If you send "select '".utf8_encode('lacarrière')."' as test;" to your database, this should work.

Also, the $phpString mentioned above is NOT EQUAL to the result you would get from "select '".utf8_encode('lacarrière')."' as test;"; you would need to compare it to utf8_decode($dbString) for them to be EQUAL.

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 19:41:38 by niel

Hi

> Well, I searched all the source code of phpPgAdmin for charsets and I
> found:
>
> "echo "\t<meta ... charset={$data->codemap[$dbEncoding]}\" />\r\n";"
>
> So this means, phpPgAdmin sets the output charset to the charset which
> is used by the database connected to - but that's still not the
> problem, because I also know how to fix charset output in browsers.

Not exactly. As far as I can see, it only changes the value of the
Content-type: header in the HTML; it doesn't change the actual encoding
of the output.

> > Once again indicating your data needs to be converted from some other
> > character set.
>
> It's already converted to be compatible to utf8 when fetching it from
> some other resources.

I didn't mean the content of the database. I was referring to the data
that PHP is actually processing, which appears to have been converted
within PHP.

> Well, I've set the default_charset to UTF8, it was set before to ""
> (empty) - but the output on console (cli) and the problem is still the
> same also after changing this to UTF8, so: this is not the problem,

It should be "UTF-8"; this is the official designation from Unicode,
although case will likely be ignored. As far as I know "UTF8" is not a
recognised encoding.
This, however, is only the value that will be output as the
Content-Type charset, as noted above.


> I fetch something from database, which looks like "lacarrière" when
> I output it in PHP - well don't let us get confused from PHPs output.
> Then I fetch something from another resource looking like "lacarrière"
> - when I compare both strings in PHP it tells me that they are "not
> equal".

As I said before, many of PHP's functions (the string ones for
comparing, for example) are NOT multi-byte aware, so they are NOT
guaranteed to work correctly.

You did not answer the most important question. What, if any, output
buffering are you using? Are you using the mbstring module? If so, is
it set to overload the old string functions?

--
Niel Archer

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 21:42:54 by aldnin

> You did not answer the most important question. What, if any, output
> buffering are you using? Are you using the mbstring module? If so, is
> it set to overload the old string functions?

Well, I checked for the Multibyte String functions: mbstring was enabled, and it was configured with "=all" before compiling.

After performing the query with pg_query, fetching the result with pg_fetch_all and putting the utf8 string into $dbString I tried to detect the encoding with:

mb_detect_encoding($dbSring)

It tells me:
ASCII

The content of $dbString is:
lacarrière

I overloaded the mbstring variables with:
mbstring.func_overload = 6
Setting it to "7" won't even let me echo anything.

mbstring.encoding_translation = On
mbstring.internal_encoding = UTF8

That's it, rest is default.

Is it possible for mbstring to overload the pg-functions I need?

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 21.07.2007 22:23:59 by niel

Hi

You still haven't answered whether you're using any output handler, and
if so which one. I use

output_handler=mb_output_handler

> I overloaded the mbstring variables with:
> mbstring.func_overload = 6
> Setting it to "7" won't let me even echo something else.

Very strange, the only additional function overloaded is mail() and that
shouldn't stop you using echo.

As well as setting the internal encoding and enabling it with
mbstring.encoding_translation = On
mbstring.internal_encoding = UTF-8

I would also use:
mbstring.language = English
; or German in your case
mbstring.detect_order = UTF-8,eucjp-win,sjis-win
mbstring.http_input = UTF-8,SJIS,EUC-JP
mbstring.http_output = UTF-8

> Is it possible for mbstring to overload the pg-functions I need?
No, and it shouldn't be needed. Those functions should be UTF-8 enabled
in order to communicate with the database and supply the correct data

You're still referring to 'UTF8' which as I pointed out isn't the
official name of the encoding system. I have no idea if PHP will
recognise it, but to be safe I suggest you use the official 'UTF-8'
(hyphen between letters and number) in case it's causing problems.
The other thing to be wary of, is output to the console. Some OSes do
not support unicode in the console. So unless you're certain yours does,
I wouldn't use it as a test.

--
Niel Archer

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 22.07.2007 00:26:07 by Bruno Lustosa

On 7/21/07, aldnin wrote:
> When I try to send this query (select 'lacarrière' as test;) to a UTF8 initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe87265

Short answer: start using utf-8 for just everything, and your problems
will be gone.

Long explanation:
This is usually the case when you get data from a form and put it in
the database, and the two aren't using the same encoding.
I guess your pg connection is using unicode (so the db expects unicode
input), and your html is set to something else. To fix this, you have
two choices:

1-Run utf8_encode() on the input from your forms; or
2-Set all your html pages to use utf-8 encoding.

IMHO, option 2 is the way to go. I've been using utf-8 for everything
for quite some time, and has solved all my problems dealing with
accents, and so on.
You will need:
- All your HTML files encoded to utf-8 (quite easy with iconv, if you
are using Linux);
- Add a "Content-type: text/html; charset=utf-8" to all your pages.
This is easily done using PHP's header() function in a file included
by all your scripts.

This way, the pages will be unicode, any data entered will be posted
as unicode, and you will have no problems sending them to a database
that uses unicode.
Forget the <meta> tag that sets the encoding. It's only used in case
the server doesn't send a Content-type header, which isn't the case
normally. By default, I think at least apache sends the content-type
as iso8859-1.

--
Bruno Lustosa
ZCE - Zend Certified Engineer - PHP!
http://www.lustosa.net/

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 22.07.2007 02:28:34 by aldnin

Thanks a lot. What you wrote is really necessary for handling these problems in the future.

The reason I was looking for a faster solution is that I have to handle huge amounts of data which are sometimes UTF8 and sometimes not... you understand what I mean? ;-)


Bruno Lustosa wrote:
> On 7/21/07, aldnin wrote:
>> When I try to send this query (select 'lacarrière' as test;) to a UTF8
>> initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xe87265
>
> Short answer: start using utf-8 for just everything, and your problems
> will be gone.
>
> Long explanation:
> This is usually the case when you get data from a form and put it in
> the database, and the two aren't using the same encoding.
> I guess your pg connection is using unicode (so the db expects unicode
> input), and your html is set to something else. To fix this, you have
> two choices:
>
> 1-Run utf8_encode() on the input from your forms; or
> 2-Set all your html pages to use utf-8 encoding.
>
> IMHO, option 2 is the way to go. I've been using utf-8 for everything
> for quite some time, and has solved all my problems dealing with
> accents, and so on.
> You will need:
> - All your HTML files encoded to utf-8 (quite easy with iconv, if you
> are using Linux);
> - Add a "Content-type: text/html; charset=utf-8" to all your pages.
> This is easily done using PHP's header() function in a file included
> by all your scripts.
>
> This way, the pages will be unicode, any data entered will be posted
> as unicode, and you will have no problems sending them to a database
> that uses unicode.
> Forget the <meta> tag that sets the encoding. It's only used in case
> the server doesn't send a Content-type header, which isn't the case
> normally. By default, I think at least apache sends the content-type
> as iso8859-1.
>
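
The Content-type advice quoted above could be sketched as follows. This is only an illustration: contentTypeHeader() is a hypothetical helper, and the snippet assumes it lives in a file that every script includes before producing any output.

```php
<?php
// Hypothetical helper: build the header line that declares the page charset.
function contentTypeHeader(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only works before any output has been sent, hence the guard.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

The header only labels the response. The page content and the data sent to the database still have to be real UTF-8 bytes.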

Re: PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

on 22.07.2007 02:51:34 by aldnin

> output_handler=mb_output_handler

This helped me fix all output to the browser, so I don't need utf8_decode() there any more, thanks.

> Setting it to "7" won't let me even echo something else.

Right, it's strange, but true... :-(

> mbstring.detect_order = UTF-8,eucjp-win,sjis-win

That solved the problem of mb_detect_encoding() returning ASCII; now it says "UTF-8", BUT only when I run the script on the console. In the browser it still tells me ASCII. Well, not important.

But the comparison test still says "not equal", so utf8_decode() is still needed when data comes from the database. The result is the same in the browser and on the console (even though the console shows UTF-8 as detected).

> The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console. So unless you're certain yours does,
> I wouldn't use it as a test.

I know, that's why I use the comparison test ;-)

Niel wrote:
> Hi
>
> You still haven't answered whether you're using any output handler, and
> if so which one. I use
>
> output_handler=mb_output_handler
>
>> I overloaded the mbstring variables with:
>> mbstring.func_overload = 6
>> Setting it to "7" won't let me even echo something else.
>
> Very strange, the only additional function overloaded is mail() and that
> shouldn't stop you using echo.
>
> As well as setting the internal encoding and enabling it with
> mbstring.encoding_translation = On
> mbstring.internal_encoding = UTF-8
>
> I would also use:
> mbstring.language = English
> ; or German in your case
> mbstring.detect_order = UTF-8,eucjp-win,sjis-win
> mbstring.http_input = UTF-8,SJIS,EUC-JP
> mbstring.http_output = UTF-8
>
>> Is it possible for mbstring to overload the pg-functions I need?
> No, and it shouldn't be needed. Those functions should be UTF-8 enabled
> in order to communicate with the database and supply the correct data
>
> You're still referring to 'UTF8' which as I pointed out isn't the
> official name of the encoding system. I have no idea if PHP will
> recognise it, but to be safe I suggest you use the official 'UTF-8'
> (hyphen between letters and number) in case it's causing problems.
> The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console. So unless you're certain yours does,
> I wouldn't use it as a test.
>
> --
> Niel Archer

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: PHP + PostgreSQL: invalid byte sequence for encoding

on 23.07.2007 23:00:56 by Neil Smth

At 01:18 23/07/2007, you wrote:
>Message-ID:
>From: aldnin
>Subject: Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding
>
> > This indicates that PHP not using UTF-8. That output is typical of
> > UTF-8 output as Latin characters.
>
> > I had similar problems getting PHP to work with UTF-8 and MySQL. Many
> > of PHP's function are not multibyte aware and assume a Latin character
> > set.
> > What, if any, output buffering are you using? What is your
> > default_charset set to?
>
>Well, I've set the default_charset to UTF8, it
>was set before to "" (empty) - but the output on
>console (cli) and the problem is still the same
>also after changing this to UTF8, so: this is
>not the problem, and I don't need proper output
>on console without utf8_decode() - if I want
>proper output there I just do a decode, like I
>do when I want it to get outputed in the browser properly.
>
>Maybe a cleaner explanation of the problem:
>
>I fetch something from database, which looks
>like "lacarrière" when I output it in PHP -
>well don't let us get confused from PHPs output.
>Then I fetch something from another ressource
>looking like "lacarrière" - when I compare both
>strings in PHP it tells me that they are "not equal".
>
>The default_charset seems to work only on output
>buffer, so the solution for that problem could
>only be a mechanism to tell PHP handling all
>strings UTF8 byte encoded, which should mean a
>lot of more ressources to be taken for this
>process - I understand that this is not a solution.
>
>So the only solutions could be:
>
>a) Decode and encode properly utf8 stuff and to
>take care if the content is utf8-byte encoded so
>it needs to be decoded before using it properly with other strings
>
>b) A mechanism to tell the pg-functions in PHP
>to decode all data which is UTF8-Encoded. The
>ADODB-Layers seems to do that properly, but the
>pg-functions don't do that as I can see.
>
>Try to send "select 'lacarrière' as test;' with
>pg_query to any postgres database, you'll get an
>error, if not... well, then I'm wrong and I've
>set up PHP wrong to handle UTF8-stuff.



There are several areas where encoding issues can
arise between PHP (client) and DB server. One
which you've not considered is the client
connection, that is, the encoding used when transferring resultsets to PHP.

I met this a few weeks ago in MySQL while
stashing XML recordsets with non ISO-8859-1 content.

The solution is pretty simple once you hit it,
and works in both MySQL and PGSQL because it's standard SQL-92 :

$query="SET NAMES 'UTF-8'";

Issue that at the time you first make your
connection in your DB abstraction library - you
can send the query immediately after establishing
the connection, and all subsequent queries using
that connection will have the charset for transfer correctly stated.

@see :
'21.2.3. Automatic Character Set Conversion Between Server and Client'
http://www.postgresql.org/docs/8.1/static/multibyte.html
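
A rough sketch of issuing that statement right after connecting, under a few assumptions: both function names are hypothetical, the connection string is a placeholder, and the name is spelled UTF8 as in the PostgreSQL output earlier in the thread (MySQL uses its own names, utf8 or utf8mb4). Declaring an encoding does not convert anything on the PHP side, so the bytes sent must really be in the declared encoding.

```php
<?php
// Hypothetical helper: build the SET NAMES statement for an encoding name.
// Only letters, digits, '-' and '_' are accepted, so the name can be placed
// inside the quotes safely.
function setNamesStatement(string $encoding): string
{
    if (preg_match('/^[A-Za-z0-9_-]+\z/', $encoding) !== 1) {
        throw new InvalidArgumentException("Unexpected encoding name: $encoding");
    }
    return "SET NAMES '" . $encoding . "'";
}

// Hypothetical wrapper: open a connection and declare the client encoding
// before any other query runs. Requires the pgsql extension.
function connectWithEncoding(string $connectionString, string $encoding = 'UTF8')
{
    $conn = pg_connect($connectionString);
    if ($conn === false) {
        throw new RuntimeException('Could not connect to PostgreSQL');
    }
    if (pg_query($conn, setNamesStatement($encoding)) === false) {
        throw new RuntimeException('Could not set the client encoding');
    }
    return $conn;
}

echo setNamesStatement('UTF8'), "\n";   // SET NAMES 'UTF8'

// Usage with a live server:
// $conn = connectWithEncoding('dbname=test');
```

With the pgsql extension, pg_set_client_encoding($conn, 'UTF8') does the same job without building SQL by hand.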


HTH
Cheers - Neil
