Bad encoded chars in being inserted into database

Bad encoded chars in being inserted into database

on 22.03.2010 09:48:31 by imartinez

Hi everybody.



I have a question about how Postgres deals with badly encoded characters
in the database.

We have several GForge applications. They use Postgres as their database.

If we export a database and import it again, we have to deal with several
badly encoded chars. These bad chars always come from emails copied and
pasted from the Lotus Notes mail client. OK, I understand that the Notes
client people are using is an ancient application and does not deal very
well with some Unicode chars…

What I cannot understand is why Postgres accepts these badly encoded
characters into the database and exports them without problems, but does
not allow them when importing again.

This has been happening since Postgres 7.3. However, until 7.4.XX (I don’t
remember which minor version) you could import a database without ERRORs.
Since 7.4.XX, however, that is impossible, and it’s imperative to clean
bad characters (using iconv, for example) prior to importing tables.
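As a rough illustration of the cleaning step mentioned above, here is a minimal Python sketch. It assumes the dump is meant to be UTF-8 and that the stray bytes are Windows-1252 (which would match =85 and =92 showing up as "…" and "’"). The function names are made up for this example, and the first function only approximates what something like `iconv -f UTF-8 -t UTF-8 -c` does.

```python
def drop_invalid_utf8(raw: bytes) -> bytes:
    """Discard every byte that is not part of a valid UTF-8 sequence."""
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


def repair_as_cp1252(raw: bytes) -> bytes:
    """Keep valid UTF-8 untouched and reinterpret invalid bytes as Windows-1252.

    The Windows-1252 guess is an assumption about where the bad bytes came
    from; bytes that are undefined in that code page become U+FFFD.
    """
    pieces = []
    pos = 0
    while pos < len(raw):
        try:
            pieces.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are offsets into the slice raw[pos:].
            pieces.append(raw[pos:pos + err.start].decode("utf-8"))
            bad = raw[pos + err.start:pos + err.end]
            pieces.append(bad.decode("cp1252", errors="replace"))
            pos += err.end
    return "".join(pieces).encode("utf-8")


sample = b"don\x92t remember\x85"
print(drop_invalid_utf8(sample))
print(repair_as_cp1252(sample).decode("utf-8"))
```

Dropping bytes loses the apostrophe and the ellipsis entirely, while the repair variant keeps them, so which one is appropriate depends on how sure you are about the original encoding.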

I agree with this Postgres policy, but what I don’t agree with is that you
can still INSERT them via an application. That is, no bad characters
should get inserted into the database. The check should be made for both
import and insert procedures so that no bad chars end up in the database.
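Until the server rejects such input on INSERT, the same check can be done on the application side. The following is only a sketch in Python (the applications in question are PHP); `find_bad_fields` and the field names are hypothetical, and it assumes the values are available as raw bytes before they are sent to the database.

```python
from typing import Dict, List


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly under Python's strict UTF-8 decoder."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_bad_fields(fields: Dict[str, bytes]) -> List[str]:
    """Return the names of the fields that should be rejected or cleaned."""
    return [name for name, raw in fields.items() if not is_valid_utf8(raw)]


# Hypothetical form submission: the body was pasted from a Windows-1252 source.
submission = {"subject": b"Re: hello", "body": b"don\x92t"}
print(find_bad_fields(submission))
```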



Any suggestions or thoughts about this?



We are using the php4/5-pgsql module from several distros (CentOS 4/5,
Debian 4/5 and Ubuntu 8.04 LTS), so I rule out a pgsql problem, and anyway
the database should deal with this…





Re: Bad encoded chars in being inserted into database

on 22.03.2010 22:50:42 by Gabriele Bartolini

Hi,
>
> I agree with this postgres policy, but what I don’t is that you can
> INSERT them via application. That is, no bad characters should be
> inserted into database. The check should be made for both import and
> insert procedures so no bad chars would appear into database.
>
Could you please tell us which PostgreSQL version you are currently
using? Also it would be useful to know:

* what is your database encoding?
* what client_encoding setting are you using?

In general, what you are saying suggests to me that you are using the
SQL_ASCII encoding at some stage in your session (either on the server
side or the client side).
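To illustrate why the encoding at each stage matters, here is a small Python sketch (not PostgreSQL code) showing how the same apostrophe looks depending on which encoding the sender used and which one the reader assumes. The Windows-1252 source is an assumption based on the bytes seen in this thread.

```python
apostrophe = "\u2019"  # the curly apostrophe in "don’t"

cp1252_bytes = apostrophe.encode("cp1252")  # what a Windows-1252 client sends
utf8_bytes = apostrophe.encode("utf-8")     # what a UTF-8 client sends


def decodes_as_utf8(raw: bytes) -> bool:
    """True if a strict UTF-8 reader would accept these bytes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A single Windows-1252 byte is not valid UTF-8, so a validating UTF-8
# reader rejects it, while a reader that does no validation stores it as is.
print(cp1252_bytes, decodes_as_utf8(cp1252_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))

# The reverse mistake, reading UTF-8 bytes as Windows-1252, produces mojibake.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)
```

As far as I know, SQL_ASCII means the server treats bytes above 127 as uninterpreted, so nothing would be validated or converted at that stage. That is why the answers to the two questions above matter.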

However, before I go on, please answer the above questions.

Thanks,
Gabriele

--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Bad encoded chars in being inserted into database

on 22.03.2010 23:05:44 by Scott Marlowe

On Mon, Mar 22, 2010 at 2:48 AM, Iñigo Martinez Lasala wrote:
> Hi everybody.
>
>
>
> I have a doubt about how postgres deal with bad encoded characters into
> database.
>
> We have several gforge application. They are using postgres as database.
>
> If we export a database and import again, we have to deal with several bad
> encoded chars. These bad chars always come from copy & paste emails from
> Lotus Notes mail client. OK, I understand the Notes client people is using
> is an ancient application and does not deal very well with some Unicode
> chars…
>
> What I cannot understand is why postgres accept these bad enconded
> characters into database, exports them without problema but does not allow
> them when importing again.
>
> This has been happening since postgers 7.3. However, until 7.4.XX (y don’t
> remember what minor version) you could import database without ERRORs.
> However, since 7.4.XX it’s impossible and it’s imperative to clean bad
> characters (using iconv, for example) prior importing tables.

This is because PostgreSQL's support for UTF-8 encoding (and all
encodings, really) has gotten tighter over time, so the filter that
catches improperly encoded UTF-8 has gotten better with each major
release.
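To make "tighter" concrete, here is a Python sketch comparing a deliberately loose check with a strict one. The loose function is purely hypothetical and is not taken from any PostgreSQL release. It only shows how a filter that looks at lead and continuation bytes can accept sequences that a strict validator rejects.

```python
def lenient_check(raw: bytes) -> bool:
    """Hypothetical loose filter: checks only lead bytes and continuation counts."""
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead < 0x80:
            need = 0
        elif 0xC0 <= lead <= 0xDF:
            need = 1
        elif 0xE0 <= lead <= 0xEF:
            need = 2
        elif 0xF0 <= lead <= 0xF7:
            need = 3
        else:
            return False  # stray continuation byte or invalid lead byte
        tail = raw[i + 1:i + 1 + need]
        if len(tail) != need or any((c & 0xC0) != 0x80 for c in tail):
            return False
        i += 1 + need
    return True


def strict_check(raw: bytes) -> bool:
    """Strict check: also rejects overlong forms and surrogate code points."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "valid U+2019": b"\xe2\x80\x99",
    "lone Windows-1252 byte": b"\x92",
    "overlong encoding of NUL": b"\xc0\x80",
    "UTF-16 surrogate half": b"\xed\xa0\x80",
}
for label, raw in samples.items():
    print(label, lenient_check(raw), strict_check(raw))
```

Data that passed a looser check when it was inserted can therefore fail a stricter one when the dump is loaded into a newer server.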


Re: Bad encoded chars in being inserted into database

on 23.03.2010 09:10:16 by imartinez


PostgreSQL version: 8.1.13 (8.1.13-0etch1)
Database encoding: UTF-8.
client_encoding: default, that is, it is not set at the PHP level.

However, pg_client_encoding returns "UTF8" as the client encoding.

Thank you, Gabriele.

-----Original Message-----
From: Gabriele Bartolini
To: Iñigo Martinez Lasala
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Bad encoded chars in being inserted into database
Date: Mon, 22 Mar 2010 22:50:42 +0100


Hi,
>
> I agree with this postgres policy, but what I don’t is that you can
> INSERT them via application. That is, no bad characters should be
> inserted into database. The check should be made for both import and
> insert procedures so no bad chars would appear into database.
>
Could you please tell us which PostgreSQL version you are currently
using? Also it would be useful to know:

* which is your database encoding?
* which is the client_encoding setting you are using?

In general, what you are saying suggests me that you are using the
SQL_ASCII encoding at some stage in your session (either on the server
side or the client side).

However, before I go on, please answer the above questions.

Thanks,
Gabriele





Re: Bad encoded chars in being inserted into database

on 23.03.2010 09:11:58 by imartinez


We are working with 8.1 and migrating to 8.4.
We will see whether this behavior has disappeared after the migration. ;-)

Thank you, Scott.

-----Original Message-----
From: Scott Marlowe
To: Iñigo Martinez Lasala
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Bad encoded chars in being inserted into database
Date: Mon, 22 Mar 2010 16:05:44 -0600


On Mon, Mar 22, 2010 at 2:48 AM, Iñigo Martinez Lasala wrote:
> Hi everybody.
>
>
>
> I have a doubt about how postgres deal with bad encoded characters into
> database.
>
> We have several gforge application. They are using postgres as database.
>
> If we export a database and import again, we have to deal with several bad
> encoded chars. These bad chars always come from copy & paste emails from
> Lotus Notes mail client. OK, I understand the Notes client people is using
> is an ancient application and does not deal very well with some Unicode
> chars…
>
> What I cannot understand is why postgres accept these bad enconded
> characters into database, exports them without problema but does not allow
> them when importing again.
>
> This has been happening since postgers 7.3. However, until 7.4.XX (y don’t
> remember what minor version) you could import database without ERRORs.
> However, since 7.4.XX it’s impossible and it’s imperative to clean bad
> characters (using iconv, for example) prior importing tables.

This is because postgresql's support for UTF-8 encoding (and all
encoding really) has gotten tighter over time, so that the filter to
catch improperly encoded UTF has gotten better with each major
release.


