DBD::mysql and UTF-8

DBD::mysql and UTF-8

am 13.08.2005 20:27:26 von YorHel

Hello,

I have a perl (5.8.7) application which tries to put UTF-8 data in a
mysql (4.1.13) database (with database collation = utf8_general_ci),
using DBI (1.48) and DBD::mysql (3.0002), like this:

use Encode;
my $utf8data = decode('UTF-8', $somevar);
my $rows = $dbh->do("UPDATE test SET text = '$utf8data' WHERE name =
'test'");

I can get the same data from the database like this:

my $obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
$obj->execute();
my $result = ($obj->fetchrow_array())[0];
$result = decode('UTF-8', $result);

That works like a charm within perl, the problem is though, that the
data isn't really stored as I want it. When I fetch the same data with
php (phpMyAdmin), I only get weird characters. And when I add data with
phpMyAdmin to the table, I can't get that data in valid UTF-8 in perl,
but I only get weird characters.
Is there a way to work with UTF-8 and mysql in perl? without getting
weird results?

thanks in advance,
//YorHel

Re: DBD::mysql and UTF-8

am 13.08.2005 20:58:26 von xhoster

yorhel@gmail.com wrote:
> Hello,
>
> I have a perl (5.8.7) application which tries to put UTF-8 data in a
> mysql (4.1.13) database (with database collation = utf8_general_ci),
> using DBI (1.48) and DBD::mysql (3.0002), like this:
>
> use Encode;
> my $utf8data = decode('UTF-8', $somevar);

Why are you doing this? Isn't $somevar already UTF-8?


> my $rows = $dbh->do("UPDATE test SET text = '$utf8data' WHERE name =
> 'test'");
>
> I can get the same data from the database like this:
>
> my $obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
> $obj->execute();
> my $result = ($obj->fetchrow_array())[0];
> $result = decode('UTF-8', $result);

Now you've chain-called decode twice. Aren't encode and decode
usually used as complements of each other, not chained with
themselves only?


> That works like a charm within perl, the problem is though, that the
> data isn't really stored as I want it.

How do you know?

> When I fetch the same data with
> php (phpMyAdmin), I only get weird characters.

So what makes you think this is a Perl problem and not a php problem?

> And when I add data with
> phpMyAdmin to the table, I can't get that data in valid UTF-8 in perl,
> but I only get weird characters.
> Is there a way to work with UTF-8 and mysql in perl?

You just reported that your method works like a charm with mysql and Perl.

> without getting
> weird results?

It seems like the weird results are coming from php, not Perl.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB

Re: DBD::mysql and UTF-8

am 13.08.2005 21:28:04 von Slaven Rezic

yorhel@gmail.com writes:

> Hello,
>
> I have a perl (5.8.7) application which tries to put UTF-8 data in a
> mysql (4.1.13) database (with database collation = utf8_general_ci),
> using DBI (1.48) and DBD::mysql (3.0002), like this:
>
> use Encode;
> my $utf8data = decode('UTF-8', $somevar);

This function should be used if $somevar is a variable which contains
utf-8 data in octets to convert it into perl characters.

> my $rows = $dbh->do("UPDATE test SET text = '$utf8data' WHERE name =
> 'test'");

I am not sure that either DBI or DBD::mysql or mysql can handle perl
characters. Probably you should rather supply UTF-8 octets here.

> I can get the same data from the database like this:
>
> my $obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
> $obj->execute();
> my $result = ($obj->fetchrow_array())[0];
> $result = decode('UTF-8', $result);

Again decode?

Regards,
Slaven

--
Slaven Rezic - slaven rezic de

tkrevdiff - graphical display of diffs between revisions (RCS, CVS or SVN)
http://ptktools.sourceforge.net/#tkrevdiff

Re: DBD::mysql and UTF-8

am 13.08.2005 21:31:29 von YorHel

xhoster@gmail.com wrote:
> yorhel@gmail.com wrote:
> > Hello,
> >
> > I have a perl (5.8.7) application which tries to put UTF-8 data in a
> > mysql (4.1.13) database (with database collation = utf8_general_ci),
> > using DBI (1.48) and DBD::mysql (3.0002), like this:
> >
> > use Encode;
> > my $utf8data = decode('UTF-8', $somevar);
>
> Why are you doing this? Isn't $somevar already UTF-8?

According to Devel::Peek, no

>
>
> > my $rows = $dbh->do("UPDATE test SET text = '$utf8data' WHERE name =
> > 'test'");
> >
> > I can get the same data from the database like this:
> >
> > my $obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
> > $obj->execute();
> > my $result = ($obj->fetchrow_array())[0];
> > $result = decode('UTF-8', $result);
>
> Now you've chain-called decode twice. Aren't encode and decode
> usually used as complements of each other, not chained with
> themselves only?
>

well... I don't get the exact same data when putting something in de
DB, as getting something from the DB, after a lot of testing, this is
the only way I found.

>
> > That works like a charm within perl, the problem is though, that the
> > data isn't really stored as I want it.
>
> How do you know?
>

Sorry, forgot to mention that the command-line 'mysql' gives me the
same output as phpMyAdmin, and I have added
'default-character-set=utf8' at /etc/my.cnf, so I can assume that that
client gives me the "real" output.

> > When I fetch the same data with
> > php (phpMyAdmin), I only get weird characters.
>
> So what makes you think this is a Perl problem and not a php problem?
>
> > And when I add data with
> > phpMyAdmin to the table, I can't get that data in valid UTF-8 in perl,
> > but I only get weird characters.
> > Is there a way to work with UTF-8 and mysql in perl?
>
> You just reported that your method works like a charm with mysql and Perl.
>

But I also like to use other tools on the same database, so the IS a
problem :)

> > without getting
> > weird results?
>
> It seems like the weird results are coming from php, not Perl.
>
> Xho
>
> --
> -------------------- http://NewsReader.Com/ --------------------
> Usenet Newsgroup Service $9.95/Month 30GB

Re: DBD::mysql and UTF-8

am 14.08.2005 01:23:10 von xhoster

"YorHel" wrote:
> xhoster@gmail.com wrote:

> well... I don't get the exact same data when putting something in de
> DB, as getting something from the DB, after a lot of testing, this is
> the only way I found.

I think you maybe you are testing the wrong thing. Since you want Perl to
play nice with others, you should do the experimentation on that focus,
i.e. getting Perl to read what others have written, rather than getting
Perl to read what Perl has written.

>
> >
> > > That works like a charm within perl, the problem is though, that the
> > > data isn't really stored as I want it.
> >
> > How do you know?
> >
>
> Sorry, forgot to mention that the command-line 'mysql' gives me the
> same output as phpMyAdmin, and I have added
> 'default-character-set=utf8' at /etc/my.cnf, so I can assume that that
> client gives me the "real" output.

Is this in the [client] section or one of the server sections? (My client
can't even read the /etc/my.cnf file, so of course doesn't care what it
says.)

If you insert the utf-8 data using phpMyAdmin and then select it with
command line mysql, does it show you what you expected?

Anyway, if I knew exactly what Perl was inserting[1], what it was getting
back[2], what mysql thought it had[3], and what php thought it had, I would
be able to offer more help. Or, if you accept php as being the gold
standard, insert something with php and then retrieve it with Perl and use
syntax [2] on it.

> >
> > You just reported that your method works like a charm with mysql and
> > Perl.
> >
>
> But I also like to use other tools on the same database, so the IS a
> problem :)

Yes, but I'm not sure who the problem is with :(.

[1] print join ",", map ord, split //, $somevar;
[2] print join ",", map ord, split //, $result; # do both before and after
# the decode call.
[3] SELECT ascii(mid(text,1,1)) FROM test WHERE name = 'test'
SELECT ascii(mid(text,2,1)) FROM test WHERE name = 'test'
SELECT ascii(mid(text,3,1)) FROM test WHERE name = 'test'
etc.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB

Re: DBD::mysql and UTF-8

am 14.08.2005 10:31:17 von YorHel

xhoster@gmail.com wrote:
> "YorHel" wrote:
> > xhoster@gmail.com wrote:
>
> > well... I don't get the exact same data when putting something in de
> > DB, as getting something from the DB, after a lot of testing, this is
> > the only way I found.
>
> I think you maybe you are testing the wrong thing. Since you want Perl to
> play nice with others, you should do the experimentation on that focus,
> i.e. getting Perl to read what others have written, rather than getting
> Perl to read what Perl has written.
>
> >
> > >
> > > > That works like a charm within perl, the problem is though, that the
> > > > data isn't really stored as I want it.
> > >
> > > How do you know?
> > >
> >
> > Sorry, forgot to mention that the command-line 'mysql' gives me the
> > same output as phpMyAdmin, and I have added
> > 'default-character-set=utf8' at /etc/my.cnf, so I can assume that that
> > client gives me the "real" output.
>
> Is this in the [client] section or one of the server sections? (My client
> can't even read the /etc/my.cnf file, so of course doesn't care what it
> says.)

$ mysql --help | grep default-character-set
default-character-set utf8

seems ok...

>
> If you insert the utf-8 data using phpMyAdmin and then select it with
> command line mysql, does it show you what you expected?
>

Yes, phpMyAdmin and cli mysql give me the same results.

> Anyway, if I knew exactly what Perl was inserting[1], what it was getting
> back[2], what mysql thought it had[3], and what php thought it had, I would
> be able to offer more help. Or, if you accept php as being the gold
> standard, insert something with php and then retrieve it with Perl and use
> syntax [2] on it.
>
> > >
> > > You just reported that your method works like a charm with mysql and
> > > Perl.
> > >
> >
> > But I also like to use other tools on the same database, so the IS a
> > problem :)
>
> Yes, but I'm not sure who the problem is with :(.
>
> [1] print join ",", map ord, split //, $somevar;
> [2] print join ",", map ord, split //, $result; # do both before and after
> # the decode call.
> [3] SELECT ascii(mid(text,1,1)) FROM test WHERE name = 'test'
> SELECT ascii(mid(text,2,1)) FROM test WHERE name = 'test'
> SELECT ascii(mid(text,3,1)) FROM test WHERE name = 'test'
> etc.
>

$rows = $dbh->do("UPDATE test SET text = '$somevar' WHERE name =
'test'");
print "\n[1] " . join ",", map ord, split //, $somevar;

$obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
$obj->execute();
my $result = ($obj->fetchrow_array())[0];
print "\n[2] be4 " . join ",", map ord, split //, $result;
$result = decode('UTF-8', $result);
print "\n[2] aft " . join ",", map ord, split //, $result;

## Same data, but inserted with phpMyAdmin
my $obj = $dbh->prepare("SELECT text FROM test WHERE name =
'fromphp'");
$obj->execute();
my $fromphp = ($obj->fetchrow_array())[0];
print "\n[2] php " . join ",", map ord, split //, $fromphp;

my $count = 1; my @res;
while(1) {
$obj = $dbh->prepare("SELECT ascii(mid(text, $count, 1))" .
"FROM `test` WHERE name = 'test'");
$obj->execute();
my $res = ($obj->fetchrow_array())[0];
last if !$res;
push(@res, $res);
$count++;
}
print "\n[3] " . join ",", @res;


gives me

[1] 235,228,227,232,223,231,236,297,237,238
[2] be4
195,171,195,164,195,163,195,168,195,159,195,167,195,172,196, 169,195,173,195,174
[2] aft 235,228,227,232,223,231,236,297,237,238
[2] php 235,228,227,232,223,231,236,63,237,238
[3]
195,194,195,194,195,194,195,194,195,197,195,194,195,194,195, 194,195,194,195,194

It seems to me that DBD::mysql just won't send/receive UTF-8, what I
get from the php-inserted row is almost the same as $somevar, except
that it doesn't give me all characters (and therefore isn't real UTF-8)

Thnx for your help so far :)

> Xho
>
> --
> -------------------- http://NewsReader.Com/ --------------------
> Usenet Newsgroup Service $9.95/Month 30GB

Re: DBD::mysql and UTF-8

am 14.08.2005 11:26:48 von YorHel

YorHel wrote:
> xhoster@gmail.com wrote:
> > "YorHel" wrote:
> > > xhoster@gmail.com wrote:
> >
> > > well... I don't get the exact same data when putting something in de
> > > DB, as getting something from the DB, after a lot of testing, this is
> > > the only way I found.
> >
> > I think you maybe you are testing the wrong thing. Since you want Perl to
> > play nice with others, you should do the experimentation on that focus,
> > i.e. getting Perl to read what others have written, rather than getting
> > Perl to read what Perl has written.
> >
> > >
> > > >
> > > > > That works like a charm within perl, the problem is though, that the
> > > > > data isn't really stored as I want it.
> > > >
> > > > How do you know?
> > > >
> > >
> > > Sorry, forgot to mention that the command-line 'mysql' gives me the
> > > same output as phpMyAdmin, and I have added
> > > 'default-character-set=utf8' at /etc/my.cnf, so I can assume that that
> > > client gives me the "real" output.
> >
> > Is this in the [client] section or one of the server sections? (My client
> > can't even read the /etc/my.cnf file, so of course doesn't care what it
> > says.)
>
> $ mysql --help | grep default-character-set
> default-character-set utf8
>
> seems ok...
>
> >
> > If you insert the utf-8 data using phpMyAdmin and then select it with
> > command line mysql, does it show you what you expected?
> >
>
> Yes, phpMyAdmin and cli mysql give me the same results.
>
> > Anyway, if I knew exactly what Perl was inserting[1], what it was getting
> > back[2], what mysql thought it had[3], and what php thought it had, I would
> > be able to offer more help. Or, if you accept php as being the gold
> > standard, insert something with php and then retrieve it with Perl and use
> > syntax [2] on it.
> >
> > > >
> > > > You just reported that your method works like a charm with mysql and
> > > > Perl.
> > > >
> > >
> > > But I also like to use other tools on the same database, so the IS a
> > > problem :)
> >
> > Yes, but I'm not sure who the problem is with :(.
> >
> > [1] print join ",", map ord, split //, $somevar;
> > [2] print join ",", map ord, split //, $result; # do both before and after
> > # the decode call.
> > [3] SELECT ascii(mid(text,1,1)) FROM test WHERE name = 'test'
> > SELECT ascii(mid(text,2,1)) FROM test WHERE name = 'test'
> > SELECT ascii(mid(text,3,1)) FROM test WHERE name = 'test'
> > etc.
> >
>
> $rows = $dbh->do("UPDATE test SET text = '$somevar' WHERE name =
> 'test'");
> print "\n[1] " . join ",", map ord, split //, $somevar;
>
> $obj = $dbh->prepare("SELECT text FROM test WHERE name = 'test'");
> $obj->execute();
> my $result = ($obj->fetchrow_array())[0];
> print "\n[2] be4 " . join ",", map ord, split //, $result;
> $result = decode('UTF-8', $result);
> print "\n[2] aft " . join ",", map ord, split //, $result;
>
> ## Same data, but inserted with phpMyAdmin
> my $obj = $dbh->prepare("SELECT text FROM test WHERE name =
> 'fromphp'");
> $obj->execute();
> my $fromphp = ($obj->fetchrow_array())[0];
> print "\n[2] php " . join ",", map ord, split //, $fromphp;
>
> my $count = 1; my @res;
> while(1) {
> $obj = $dbh->prepare("SELECT ascii(mid(text, $count, 1))" .
> "FROM `test` WHERE name = 'test'");
> $obj->execute();
> my $res = ($obj->fetchrow_array())[0];
> last if !$res;
> push(@res, $res);
> $count++;
> }
> print "\n[3] " . join ",", @res;
>
>
> gives me
>
> [1] 235,228,227,232,223,231,236,297,237,238
> [2] be4
> 195,171,195,164,195,163,195,168,195,159,195,167,195,172,196, 169,195,173,195,174
> [2] aft 235,228,227,232,223,231,236,297,237,238
> [2] php 235,228,227,232,223,231,236,63,237,238
> [3]
> 195,194,195,194,195,194,195,194,195,197,195,194,195,194,195, 194,195,194,195,194
>
> It seems to me that DBD::mysql just won't send/receive UTF-8, what I
> get from the php-inserted row is almost the same as $somevar, except
> that it doesn't give me all characters (and therefore isn't real UTF-8)
>

Hate to reply to myself, but after some more googling, I found that
when I cal a
$dbh->do("SET NAMES 'utf8'");
right after the DBH->connect(), I can put the real UTF-8-data in the
DB, so phpMyAdmin and cli mysql both give the real UTF-8 output,
instead of those weird characters. With this, it doesn't matter whether
I call the decode() on $somevar before sending the UPDATE query, the
data will still be inserted as UTF-8.
But then again, I need to set the utf8-flag on $result with decode(),
to get the well-formatted data, which sound like a hack to me, am I not
supposed to get the UTF-8-ed data when I call $obj->fetchrow_array()?
And the use of the "SET NAMES 'utf8'"-call also seems like an hack to
me, why do I need to use that when the database I am using is already
defined as "UTF-8".

Ah well, I am glad I have a "solution" to the problem now :)


> Thnx for your help so far :)
>
> > Xho
> >
> > --
> > -------------------- http://NewsReader.Com/ --------------------
> > Usenet Newsgroup Service $9.95/Month 30GB

Re: DBD::mysql and UTF-8

am 14.08.2005 19:57:16 von Knut Haugen

[ YorHel ]

[ on UTF-8 problems with DBD::Mysql ]

> Hate to reply to myself, but after some more googling, I found that
> when I cal a
> $dbh->do("SET NAMES 'utf8'");
> right after the DBH->connect(), I can put the real UTF-8-data in the
> DB, so phpMyAdmin and cli mysql both give the real UTF-8 output,
> instead of those weird characters. With this, it doesn't matter whether
> I call the decode() on $somevar before sending the UPDATE query, the
> data will still be inserted as UTF-8.
> But then again, I need to set the utf8-flag on $result with decode(),
> to get the well-formatted data, which sound like a hack to me, am I not
> supposed to get the UTF-8-ed data when I call $obj->fetchrow_array()?
> And the use of the "SET NAMES 'utf8'"-call also seems like an hack to
> me, why do I need to use that when the database I am using is already
> defined as "UTF-8".

I have experienced the very same problem and the reason, as far as I
could determine, is that Mysql has something called 'client character
set' and 'connection character set' which get set on a client
connection. Since DBD::Mysql doesn't read the .my.cnf file the
defaults are used, which are swedish latin 1. One would think (and
hope) that mysql would use database charset or table charset or even
default charset, but it doesn't seem to do so (I ran 4.1.12 when
doing this).

See
for further details (it doesn't cover perl specifics though).

I also resorted to "SET NAMES 'utf8'" which solved the issue. I,
luckily, didn't have to use encode and decode since the data was UTF-8
on the way into the db, and should be printed as UTF-8 also.

--
Knut
Matchbox cars and soda cans