Special Characters into database

on 28.01.2008 11:58:45 by jcsmorais

Hi there guys,

My question is: what's the best way of storing text in a MySQL
database if the text has special characters, say iso-8859-1?

Should I convert the text using htmlentities() and then put it into the
database, and later, when I retrieve it, apply the reverse
process?

What's the best way of doing this?

Thanks in advance.

Re: Special Characters into database

on 28.01.2008 12:18:24 by Erwin Moller

João Morais wrote:
> Hi there guys,
>
> My question is: what's the best way of storing text in a MySQL
> database if the text has special characters, say iso-8859-1?
>
> Should I convert the text using htmlentities() and then put it into the
> database, and later, when I retrieve it, apply the reverse
> process?
>
> What's the best way of doing this?
>
> Thanks in advance.

Hi,

I think it is easier to store the text raw in the db, and when
displaying it in a browser, use htmlentities.
(Of course, make sure you escape the text before inserting to avoid SQL
injection, eg via mysql_real_escape_string().)

In my opinion that is the cleanest way to store data, and you can also
use the data for other purposes, eg searching without having to worry
about changed characters because of the htmlentities.
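
A small illustration of the searching point, as a sketch only. It is written in Python rather than PHP, and to_entities() is a hypothetical stand-in that roughly mimics what htmlentities() does; it is not the PHP function itself.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Hypothetical stand-in for htmlentities(): use a named entity where one exists."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


raw = "João & Co"
encoded = to_entities(raw)  # what the column would hold if encoded before insert

print(encoded)            # Jo&atilde;o &amp; Co
print("João" in raw)      # True: searching the raw text works
print("João" in encoded)  # False: the search term no longer matches
```

With the raw text stored, a search for the name matches directly; with entities stored, every search term would have to be converted the same way first.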

Regards,
Erwin Moller

Re: Special Characters into database

on 28.01.2008 12:26:42 by jcsmorais

> I think it is easier to store the text raw in the db, and when
> displaying it in a browser, use htmlentities.

But in this case, should I use iso-8859-1 collation?

Hmm, what if I have a site that's visited by users with different
nationalities and they insert text into database with different
language charsets?


> (Of course, make sure you escape the text before inserting to avoid SQL
> injection, eg via mysql_real_escape_string().)
Thanks for the tip, but my question here has nothing to do with
security. :)


> In my opinion that is the cleanest way to store data, and you can also
> use the data for other purposes, eg searching without having to worry
> about changed characters because of the htmlentities.
Thanks for your opinion mate.

Re: Special Characters into database

on 28.01.2008 12:42:21 by Willem Bogaerts

> Hmm, what if I have a site that's visited by users with different
> nationalities and they insert text into database with different
> language charsets?

Your web server says which encodings it accepts. If you want different
character sets, you will have to use an encoding that supports them and
instruct your server to communicate it to the browsers.

iso-8859-1 has no room for, for instance, Russian characters. So if you
want to support those, use some broader encoding like utf-8 that
supports the whole unicode character set.
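
To make that concrete, here is a minimal Python sketch (Python is used only for illustration; the sample word is an assumption, not something from this thread). It tries to encode a Russian word in both encodings.

```python
text = "Привет"  # Russian for "hello", six Cyrillic letters

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 only covers code points 0-255, so Cyrillic cannot be encoded.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(latin1_ok)        # False
print(len(utf8_bytes))  # 12 (two bytes per Cyrillic letter)
```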

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Re: Special Characters into database

on 28.01.2008 12:44:23 by Courtney

João Morais wrote:
>> I think it is easier to store the text raw in the db, and when
>> displaying it in a browser, use htmlentities.
>
> But in this case, should I use iso-8859-1 collation?
>
> Hmm, what if I have a site that's visited by users with different
> nationalities and they insert text into database with different
> language charsets?
>
>
>> (Of course, make sure you escape the text before inserting to avoid SQL
>> injection, eg via mysql_real_escape_string().)
> Thanks for the tip, but my question here has nothing to do with
> security. :)
>
>
>> In my opinion that is the cleanest way to store data, and you can also
>> use the data for other purposes, eg searching without having to worry
>> about changed characters because of the htmlentities.
> Thanks for your opinion mate.

I'm with Erwin here as well. Store raw and adapt the display to the user
environment.

Those posting in Chinese will have Chinese charsets set up.

Those without Chinese charsets set up will probably not understand the
Chinese anyway..
;-)

Re: Special Characters into database

on 28.01.2008 12:47:55 by jcsmorais

> iso-8859-1 has no room for, for instance, Russian characters. So if you
> want to support those, use some broader encoding like utf-8 that
> supports the whole unicode character set.

Right, so if I want to support these kinds of characters, my:

* mysql charset
* mysql connection collation
* table collations

should all be utf8, and with that I won't have any trouble with
different charsets?

Including my meta tag:
?

Re: Special Characters into database

on 28.01.2008 13:26:45 by Willem Bogaerts

> ?

If you have any chance of avoiding such meta tags, do so. Send the
header it "replaces".

What would you think of

?

Sending a meta tag describing the encoding within the encoded page is
like locking the key to a safe inside it.
What's even worse, your server may send a "latin-1" header while your
code could have a "utf-8" meta tag. Which of the two would be the right
one?
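
As a rough sketch of sending the encoding in the HTTP header instead of a meta tag, here is a tiny WSGI application in Python. This is an analogue only: in PHP the equivalent would be a header('Content-Type: text/html; charset=utf-8') call before any output. The page content and the fake_start_response helper are assumptions made up for the demonstration.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares its encoding in the HTTP header."""
    body = "<p>João says привет</p>".encode("utf-8")
    start_response("200 OK", [
        # The charset travels in the header, outside the encoded body.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app by hand with a hypothetical start_response that records the headers.
sent = {}


def fake_start_response(status, headers):
    sent["status"] = status
    sent["headers"] = dict(headers)


page = b"".join(app({}, fake_start_response))
print(sent["headers"]["Content-Type"])  # text/html; charset=utf-8
```

The body is encoded with the same encoding the header announces, so there is only one place that states it.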

Best regards,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Re: Special Characters into database

on 28.01.2008 13:31:27 by Jerry Stuckle

João Morais wrote:
> Hi there guys,
>
> My question is: what's the best way of storing text in a MySQL
> database if the text has special characters, say iso-8859-1?
>
> Should I convert the text using htmlentities() and then put it into the
> database, and later, when I retrieve it, apply the reverse
> process?
>
> What's the best way of doing this?
>
> Thanks in advance.
>

Check in comp.databases.mysql. There are several things to consider
when using special characters, and the people over there know them all.

And no, you do not want to use htmlentities() to convert them. You
should never store display-specific data in the database; always store
the raw data and convert it before displaying it.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Re: Special Characters into database

on 28.01.2008 13:33:34 by jcsmorais

> If you have any chance of avoiding such meta tags, do so. Send the
> header it "replaces".
OK, but should the header specify utf8 too?


> What would you think of
>
> ?
>
> Sending a meta tag describing the encoding within the encoded page is
> like locking the key to a safe inside it.
> What's even worse, your server may send a "latin-1" header while your
> code could have an "utf-8" meta tag. Which of the two would be the right
> one?

I see the problem.

And this:

>Right, so if I want to support these kinds of characters, my:
>
>* mysql charset
>* mysql connection collation
>* table collations

Is it correct?

Re: Special Characters into database

on 28.01.2008 14:44:07 by Willem Bogaerts

> And this:
>> Right, so if I want to support these kinds of characters, my:
>>
>> * mysql charset
>> * mysql connection collation
>> * table collations
>
> Is it correct?

In effect, there are really two things in MySQL that "have" encodings:
individual fields and connections. Most of the rest is just a default.
The database encoding and the table encoding are just the default to use
for the fields that are created. The connection encoding is used to see
if any conversion is needed.

Assuming you are using a moderately recent version of MySQL, you start
with sending the encoding just after opening the connection, like:
SET NAMES utf8;

Now, MySQL assumes your client software is speaking in utf8 encoding. If
you use utf-8 encoded fields as well (for instance by only specifying
the utf-8 encoding on the CREATE DATABASE statement) all should work on
the database side.
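
To show why both sides have to agree on the encoding, here is a small Python sketch of what happens when bytes sent as UTF-8 are read as ISO-8859-1. It does not talk to MySQL at all; it only mimics the mismatch with plain byte strings, and the sample name is an assumption.

```python
original = "João"

# What the client actually puts on the wire when it speaks UTF-8.
wire_bytes = original.encode("utf-8")

# What the other side sees if it wrongly assumes ISO-8859-1.
misread = wire_bytes.decode("iso-8859-1")

# What it sees when both sides agree on UTF-8.
correct = wire_bytes.decode("utf-8")

print(misread)  # JoÃ£o
print(correct)  # João
```

Declaring the connection encoding up front is what lets the server interpret the incoming bytes the way the client meant them.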

On the webserver side, you should do the same. In the htmlentities()
function, you can pass an encoding. Please note that a lot of systems
come with encodings: your OS, e-mail, PDF generators, etc. Some have
only one encoding, others can be told what encoding should be used.

Good luck,
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/

Re: Special Characters into database

on 28.01.2008 16:56:40 by Michael Fesser

..oO(Willem Bogaerts)

>In effect, there are really two things in MySQL that "have" encodings:

There are at least four: database, table, column, connection. Of course
you can also define a default encoding for your tables when you create
them. Usually I use ISO-8859-1 as the table default and UTF-8 for the
columns that hold TEXT or (VAR)CHARs.

>On the webserver side, you should do the same. In the htmlentities()
>function, you can pass an encoding.

Most if not all browsers in use today support UTF-8, so you don't need
htmlentities() anymore. htmlspecialchars() is more than enough to take
care of the few chars that have a special meaning in HTML, all others
can be written and delivered as-is.
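
A quick sketch of that idea in Python, where html.escape() plays a role similar to htmlspecialchars() (an analogue, not the PHP function; the sample string is made up). Only the characters with a special meaning in HTML are touched; everything else passes through unchanged and relies on the page being served as UTF-8.

```python
import html

text = 'Привет <b>João</b> & "friends"'

# Escapes &, <, > and quotes; non-ASCII letters are left exactly as they are.
escaped = html.escape(text)

print(escaped)
# Привет &lt;b&gt;João&lt;/b&gt; &amp; &quot;friends&quot;
```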

Micha