Space requirements (with respect to foreign languages)
on 26.08.2004 22:36:25 by Gerard Samuel
My site/code/database is developed primarily for the English language.
I've had people from "The Far East" add content to my site using their
native language, and it is displaying properly in the site.
But I'm a bit concerned about the number of characters these languages use.
For example, I've had someone enter ->
chinese testing　中文
It is saved in the database as ->
chinese testing&#12288;&#20013;&#25991;
Now, forgive my ignorance, but I have no idea what the additional
Chinese characters mean, but from the values in the database, I'm
assuming that it amounts to 3 characters.
But if I'm correct that those are 3 characters, it is
using up 24 characters in a column.
My concern is this: if I were to limit a column to, say, 25 "English"
characters, and a Chinese fellow comes by and hypothetically says
"Hello World" in Chinese and goes over the limit of the column, the data
will be truncated.
Is there anything that can be done to overcome this shortcoming?
I'm currently using PostgreSQL 7.4.2, using SQL_ASCII as the database
character set, FreeBSD 4.10, PHP 4.3.6.
Thanks for any advice you can provide...
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: Space requirements (with respect to foreign languages)
on 28.08.2004 16:39:37 by Markus Bertheau
On Thu, 26.08.2004, at 22:36, Gerard Samuel wrote:
> My site/code/database is developed primarily for the English language.
> I've had people from "The Far East" add content to my site using their
> native language, and it is displaying properly in the site.
> But I'm a bit concerned about the number of characters these languages use.
> For example, I've had someone enter ->
> chinese testing　中文
>
> It is saved in the database as ->
> chinese testing&#12288;&#20013;&#25991;
Your web page uses a character set that does not contain Chinese
characters, so the browser sent their HTML entities (numeric character
references) instead. These entities, as you correctly observed, each
take up more than one (Latin, ASCII) character.
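To make the count concrete, here is a minimal Python sketch (Python is not part of the setup discussed in this thread and is used only for illustration). It assumes the submitted text was an ideographic space (U+3000) followed by the two characters 中文, which is an assumption, though one consistent with the 24 characters reported. The helper name to_numeric_entities is hypothetical; it relies on Python's built-in "xmlcharrefreplace" error handler to mimic what the browser did.

```python
def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Assumed input: "chinese testing" + ideographic space + two Chinese characters.
original = "chinese testing\u3000\u4e2d\u6587"
stored = to_numeric_entities(original)

# Characters taken up in the column by the three non-ASCII characters.
extra = len(stored) - len("chinese testing")

print(len(original))  # characters the user typed
print(stored)         # what ends up in the database
print(extra)          # characters used by the three entities
```

Each reference such as &#20013; is 8 ASCII characters long, so the three characters occupy 24 characters in the column.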
> Now, forgive my ignorance, but I have no idea what the additional
> Chinese characters mean, but from the values in the database, I'm
> assuming that it amounts to 3 characters.
> But if I'm correct that those are 3 characters, it is
> using up 24 characters in a column.
>
> My concern is this: if I were to limit a column to, say, 25 "English"
> characters, and a Chinese fellow comes by and hypothetically says
> "Hello World" in Chinese and goes over the limit of the column, the data
> will be truncated.
PostgreSQL will not truncate the data; it will reject it with an error.
The general point is correct, though.
> Is there anything that can be done to overcome this shortcoming?
>
> I'm currently using PostgreSQL 7.4.2, using SQL_ASCII as the database
> character set, FreeBSD 4.10, PHP 4.3.6.
Change your site to use a character set that includes Chinese
characters, for example Unicode. The most common encoding of Unicode on
the web is UTF-8. It's also the encoding PostgreSQL uses when you use
UNICODE as the database encoding.
If you decide to switch your site to UTF-8 and want varchar(25) to mean
25 characters, and not 25 bytes, you have to change the database
encoding to UNICODE accordingly.
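The difference between counting characters and counting bytes can be sketched in a few lines of Python (again only an illustration; the helper names sizes and fits are hypothetical, and the real length check is done by PostgreSQL, not by application code). The limit of 25 is the one from the varchar(25) example above.

```python
def sizes(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))

def fits(text, limit, count_bytes):
    """Check text against a length limit, counted in bytes or in characters."""
    chars, nbytes = sizes(text)
    return (nbytes if count_bytes else chars) <= limit

english = "Hello World"
chinese = "\u4e2d\u6587" * 5  # ten Chinese characters, three UTF-8 bytes each

for label, text in (("english", english), ("chinese", chinese)):
    chars, nbytes = sizes(text)
    print(label, chars, nbytes,
          fits(text, 25, count_bytes=False),
          fits(text, 25, count_bytes=True))
```

The ten Chinese characters fit a limit of 25 when characters are counted, but not when their 30 UTF-8 bytes are counted, while the ASCII string is 11 either way.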
--
Markus Bertheau
Re: Space requirements (with respect to foreign languages)
on 28.08.2004 22:23:58 by Gerard Samuel
Markus Bertheau wrote:
> On Thu, 26.08.2004, at 22:36, Gerard Samuel wrote:
>> I'm currently using PostgreSQL 7.4.2, using SQL_ASCII as the database
>> character set, FreeBSD 4.10, PHP 4.3.6.
>
> Change your site to use a character set that includes Chinese
> characters, for example Unicode. The most common encoding of Unicode on
> the web is UTF-8. It's also the encoding PostgreSQL uses when you use
> UNICODE as the database encoding.
>
> If you decide to switch your site to UTF-8 and want varchar(25) to mean
> 25 characters, and not 25 bytes, you have to change the database
> encoding to UNICODE accordingly.
>
I'll try some mock scripts to see if it will pan out...
Thanks