DBI, Mysql & Unicode

on 30.11.2004 11:54:08 by Angie Ahl

Hi List

Please excuse the cross-posting. I've also posted on the Perl DBI
list, but no one there seems to be able to help (or they're all
recovering from Thanksgiving or something ;)

I've been using DBI & MySQL for some time now and have decided to try
to use Unicode so that my web apps can be multilingual.

I'm trying to work out how to get data into and out of MySQL as UTF-8.

I've got a hash in the following format:

my %uni = (
    hebrew_alef => {
        character => chr(0x05d0),
        language  => "hebrew",
    },
    recenu => {
        character => "re\x{e7}enu",
        language  => "french",
    },
);

and I'm inserting the values into the database like this:

my $funny = "CONVERT(_utf8'$ucode' USING utf8)";
my $sql = qq {INSERT INTO unitest (id, aword) VALUES ( "$id", $funny )};

Is this CONVERT business necessary, and is it the right way to do it?
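For comparison, one commonly used alternative to building the CONVERT expression by hand is to pass the value through a bound placeholder, so it is never interpolated into the SQL text. The following is only a sketch under assumptions: `$dbh` is a connected DBI handle, the connection character set has already been set to utf8 (for example with SET NAMES utf8), and the sub name `dbpush_bound` is hypothetical. The table and column names are the ones used above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: insert one word using bound placeholders.
# Assumes $dbh is a connected DBI handle whose connection character
# set is utf8; neither is set up here.
sub dbpush_bound {
    my ($dbh, $id, $word) = @_;

    # Turn the Perl character string into UTF-8 bytes explicitly, so the
    # driver is handed the same bytes regardless of Perl's internal flag.
    my $bytes = encode('UTF-8', $word);

    my $sth = $dbh->prepare('INSERT INTO unitest (id, aword) VALUES (?, ?)');
    $sth->execute($id, $bytes);

    return $bytes;    # returned only so the bytes can be inspected
}
```

Called as `dbpush_bound($dbh, 'hebrew_alef', chr(0x05d0))`, this hands the driver the two bytes 0xD7 0x90 for the alef. Placeholders also take care of quoting, which the interpolated `"$id"` version does not.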

Getting the data back is done like this:

sub dbget {
    my $id = shift;
    my $sql = "select aword from unitest where id = \"$id\"";
    my $cur = $dbh->prepare($sql);
    $cur->execute;
    my $char = ($cur->fetchrow)[0];
    return decode("utf8", $char);
}

Running is_utf8( &dbget($_)) ? "is unicode" : "is not unicode";
indicates that I am getting UTF-8 data back when I use decode, but
here's what's wrong:

The following code is used to output the data:

foreach (sort keys %uni) {
    &dbpush( $_, $uni{$_}->{character} );
    printf $tablinedef,
        $_,
        $uni{$_}->{language},
        $uni{$_}->{character},
        &dbget( $_ ),
        is_utf8( &dbget($_)) ? "is unicode" : "is not unicode";
}

$uni{$_}->{character} shows the character as expected in Firefox, e.g. abîmer

but &dbget doesn't show it correctly (apparently it should), even
though the is_utf8 test says it is UTF-8.
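A note on that check: Encode's is_utf8 reports Perl's internal UTF-8 flag on the string, not whether the characters are the ones that were stored. The sketch below uses only the core Encode module to show how data that gets encoded twice somewhere along the way still decodes to a flagged string with the wrong contents. Whether that is what is happening here is an assumption, not a diagnosis.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

my $original = chr(0x05d0);                # HEBREW LETTER ALEF, one character

# Correct round trip: encode once, decode once.
my $once = encode('UTF-8', $original);     # 2 bytes
my $good = decode('UTF-8', $once);

# Faulty round trip: the bytes get encoded a second time (as if they were
# Latin-1 characters), but are decoded only once on the way back.
my $twice = encode('UTF-8', $once);        # 4 bytes
my $bad   = decode('UTF-8', $twice);

# Both results carry the flag, but only one matches the original.
printf "good: flag=%d same=%d length=%d\n",
    (is_utf8($good) ? 1 : 0), ($good eq $original ? 1 : 0), length $good;
printf "bad:  flag=%d same=%d length=%d\n",
    (is_utf8($bad) ? 1 : 0), ($bad eq $original ? 1 : 0), length $bad;
```

Comparing the fetched value against the original with `eq`, or dumping its code points, is therefore a more telling test than is_utf8 alone.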

Sorry for the long post but I'm so new to unicode that I just can't
work out what I'm missing.... Here's hoping Mr Dubois is around....
great books BTW thanks.

Angie

--
MySQL Perl Mailing List
For list archives: http://lists.mysql.com/perl