DBD::mysql and utf-8 problem

on 27.06.2008 10:41:35 by hz

Hi all,

After searching this mailing list for UTF-8 issues, I had the impression
that there were no longer any problems concerning UTF-8 and the
DBD::mysql driver.

However, I still have a problem. The (simplified) script:

[..]
my $aeu = "\N{LATIN SMALL LETTER A WITH DIAERESIS}";

my $dbh = DBI->connect ("DBI:mysql:test:localhost;mysql_enable_utf8=1",
"test");
my $text = $dbh->selectrow_array (qq (select ?), undef, $aeu);

print qq (length $text = ), length $text, "\n";
$text = decode ("UTF-8", $text);
print qq (length decoded $text = ), length $text,"\n";
[..]

results in =>

length ä = 2
length decoded ä = 1

on an iso-8859-1 console.
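
For reference, these two lengths are what one would expect if $text holds
the UTF-8 encoded octets rather than a decoded character string. A minimal
sketch without any database, assuming only the core Encode module (the
names $octets and $chars are illustrative and not taken from the script
above):

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode decode);

my $aeu = "\N{LATIN SMALL LETTER A WITH DIAERESIS}";

# Encode the single character into its UTF-8 byte sequence (0xC3 0xA4).
# Assumption: this stands in for a value that comes back undecoded.
my $octets = encode('UTF-8', $aeu);

# Decode the bytes back into a one-character Perl string.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d\n", length $octets;   # counts bytes: 2
printf "chars:  length %d\n", length $chars;    # counts characters: 1
```

On an ISO-8859-1 console the two octets are shown as two separate Latin-1
characters, which matches the reported length of 2.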


Questions:
Why does Perl not calculate the correct number of characters?
Does one (still) have to decode every string value received from the
DBD::mysql driver?


Thanks for any help

Helmut


--
MySQL Perl Mailing List
For list archives: http://lists.mysql.com/perl
To unsubscribe: http://lists.mysql.com/perl?unsub=gcdmp-msql-mysql-modules@m .gmane.org

Re: DBD::mysql and utf-8 problem

on 27.06.2008 13:36:56 by dwalu

Hello Helmut,

It's not a problem with the driver or Perl. You need to configure the
server or connection to return your specific charset; otherwise it defaults
to latin1, IIRC.

See: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
--
- Dwalu
..peace
--
I am an important person in this world -
Now is the most important time in my life -
My mistakes are my best teachers -
So I will be fearless.
- Student Creed


Re: DBD::mysql and utf-8 problem

on 27.06.2008 13:58:10 by hz

Hi Dwalu,

That is clear. I checked it (in the same script) with:

...
my $vars = $dbh->selectall_arrayref (qq (show variables like 'char%'));

foreach my $var (@$vars) {
print qq ($var->[0] = $var->[1] \n);
}
...

which prints out

"character_set_client = utf8
character_set_connection = utf8
character_set_database = utf8
character_set_filesystem = binary
character_set_results = utf8
character_set_server = utf8
character_set_system = utf8
character_sets_dir = /usr/share/mysql/charsets/"


So I think this is not the problem.

Just an idea:

In the source code of the DBD::mysql driver I found the
function "sv_utf8_decode" processing the utf8 results.

Wouldn't "sv_utf8_downgrade" make sense?
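
The Perl-level counterparts of these two C functions are utf8::decode and
utf8::downgrade. The sketch below is not a statement about what the driver
does internally. It only assumes that a value arrives as raw UTF-8 octets
(the hard-coded $raw is a stand-in for such a value) and compares what each
call does to the reported length:

```perl
use strict;
use warnings;

# Assumption: the raw UTF-8 octets for U+00E4, as they might arrive undecoded.
my $raw = "\xC3\xA4";

# utf8::downgrade changes only the internal representation of the string;
# the sequence of characters (here: two octets) stays the same.
my $downgraded = $raw;
utf8::downgrade($downgraded);

# utf8::decode interprets the octets as UTF-8 and yields one character.
my $decoded = $raw;
utf8::decode($decoded);

printf "raw:        length %d\n", length $raw;          # 2
printf "downgraded: length %d\n", length $downgraded;   # 2
printf "decoded:    length %d\n", length $decoded;      # 1
```

Under that assumption, downgrading leaves the length at 2, while decoding
gives the expected single character.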


Thanks

Helmut





--
HZ.labs
Dr. Helmut Zeilinger
Wiesengrund 2
D-86 684 Holzheim
Tel. 08276 58767
Fax. 08276 58787
Mobil 0160 91 55 61 68
Internet http://www.hzlabs.de
Mail hz@hzlabs.de



Re: DBD::mysql and utf-8 problem

on 27.06.2008 14:50:53 by hz

Hi Rob,


I tested this and saw no change.

Of course I also tested a "real" query against a real table, with the same
results.

I have the feeling that everything is fine with the server and database
(mysql command-line output on a utf8 console is OK,
mysqldump works OK,
I have not yet checked a JDBC driver),
and that it really is a driver issue.

Thank You

Helmut


Rob Mueller wrote:
>> my $text = $dbh->selectrow_array (qq (select ?), undef, $aeu);
>>
>> Why does Perl not calculate the correct number of characters?
>> Does one (still) have to decode every string value received from the
>> DBD::mysql driver?
>
> I haven't tested this, but maybe there's some confusion over the
> charset type of the data. The documentation talks about the actual
> data having to be utf8 data (eg a latin1 column will come back as
> octet data, not a perl utf8 string). Maybe try this:
>
> my ($text) = $dbh->selectrow_array (qq (select convert(? using utf8)),
> undef, $aeu);
>
> Rob
>



Re: DBD::mysql and utf-8 problem

on 27.06.2008 15:43:21 by Patrick Galbraith

Helmut,

If I were to try a small change in the driver code as you indicated,
could you test that change for me to see if it works for you?

We use UTF-8 all the time where I work at Grazr and have encountered no
problems.
I'd also be interested to see what other perl drivers produce in terms
of results for other databases. I may have a PostgreSQL server on one of
my boxes.

If it's the driver that has an issue, I'd like to fix it. I'll be glad
to receive any help from you or others in testing this too ;)

Kind regards,

Patrick



--
Patrick Galbraith, Senior Programmer
Grazr - Easy feed grazing and sharing
http://www.grazr.com

Satyam Eva Jayate - Truth Alone Triumphs
Mundaka Upanishad





Re: DBD::mysql and utf-8 problem

on 28.06.2008 09:16:23 by hz

I forgot to copy the list...

Hi Patrick,

Of course.

Just send me the modified driver.


Best regards

Helmut





Re: DBD::mysql and utf-8 problem

on 28.06.2008 13:08:14 by dwalu

Aha! I was unaware that you'd already configured your app. Hmm, I'm
not sure about sv_utf8_downgrade; however, I've created several UTF-8 apps
using Perl and MySQL without problems, so I'm curious why yours doesn't
"just work" :s

In any case, I don't have the cycles at the moment to test, and it would
seem you may have those changes already or coming soon enough. I look
forward to reading the results. Cheers!

