#1: utf8 behavior and approach

Posted on 2011-01-24 14:33:05 by Peter Vereshagin

How are you, perl?

Some things about DBD::mysql keep pushing me toward what feels like wasteful
misuse. I was about to file a bug on RT, but I will try to find a better
approach through discussion first.
Short version: how can I avoid zero bytes at the end of a fetched utf8
MEDIUMTEXT value while keeping the right approach to the connect method?

Full story is here:

FastCGI is a well-known environment for Perl apps. I myself run several
ready-to-use CGI applications under my FCGI::Spawn, nice and easy.
It is well known that FCGI.pm, since roughly version 0.69, accepts only
octets, not wide characters, on its replaced STDOUT. The hint for googling
is: "Wide character in FCGI::Stream::PRINT", for example,
http://lists.bestpractical.com/pipermail/rt-users/2010-July/065600.html
The same problem shows up in another big application, Bugzilla, which I
believe relies on the same single XS-style data source: DBD::mysql.
Trying t/55utf8.t, I noticed that it is fine to put data into a BLOB field
and read it back, but not with MEDIUMTEXT. Here is the patch for the test:
===
--- t/55utf8.t 2010-04-12 21:37:14.000000000 +0400
+++ t/55utf8.t.new 2011-01-24 15:17:05.000000000 +0300
@@ -28,14 +28,14 @@
plan skip_all =>
"SKIP TEST: You must have MySQL version 5.0 and greater for this test to run";
}
-plan tests => 15;
+plan tests => 14;

ok $dbh->do("DROP TABLE IF EXISTS $table");

my $create =<<EOT;
CREATE TABLE $table (
name VARCHAR(64) CHARACTER SET utf8,
- bincol BLOB,
+ bincol MEDIUMTEXT,
shape GEOMETRY,
binutf VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin
)
===
The result of this test is at https://gist.github.com/793182 ( sorry can't
omlout Adam here ).
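
To inspect values like the ones the modified test fetches, a small diagnostic
can help. The following is only a sketch: describe_value is a hypothetical
helper, not part of DBI or DBD::mysql, and the sample string is a stand-in for
a fetched MEDIUMTEXT value with one trailing zero byte.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI or DBD::mysql): summarize a scalar
# the way one might inspect a value fetched from a MEDIUMTEXT column.
sub describe_value {
    my ($value) = @_;
    my ($nuls) = $value =~ /(\0+)\z/;    # run of zero bytes at the very end
    return {
        length        => length($value),
        utf8_flag     => utf8::is_utf8($value) ? 1 : 0,
        wide_chars    => ($value =~ /[^\x00-\xFF]/) ? 1 : 0,
        trailing_nuls => defined($nuls) ? length($nuls) : 0,
    };
}

# Stand-in for a fetched value: Cyrillic text followed by one zero byte.
my $fetched = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}\0";
my $info    = describe_value($fetched);
printf "%s=%s\n", $_, $info->{$_} for sort keys %$info;
```

A non-zero trailing_nuls together with wide_chars=1 would match the 'late'
utf8 symptom described in this message.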

In fact, there are 3 ways to handle the utf8 situation in a perl-mysql
application:
1. No utf8 enablement at all. FCGI prints such texts without run-time errors,
but the national characters come out as '??'.
2. 'Late' utf8. It can be turned on the way 55utf8.t does, by means of the
'mysql_enable_utf8' property of the dbh, OR equivalently by specifying the
same property in the attributes hash for the connect method. The characters
are correct, but the resulting perl variable contains a \0, a zero byte, which
FCGI refuses to print.

Both follow a correct approach but give incorrect utf8 behavior.

3. Specifying 'mysql_enable_utf8=1;' in the DSN line solves all the trouble.
This means patching the existing applications, e.g. Bugzilla and perhaps RT.

This one gives correct behavior through an incorrect approach: at the very
least, Bugzilla is proven to work well with mysql and utf8, but it can't
construct such a DSN for me without a patch. Still, I have used exactly this
way for ages in my own apps ( since mysql-4.1, really ).
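
The ways listed above differ only in where mysql_enable_utf8 is given. A
minimal sketch of the connect arguments follows; connect_args is a
hypothetical helper, and the database name and credentials are placeholders.
It only builds the arguments, so it runs without a server.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for DBI->connect.
# Variant 'none' is way 1, 'attr' is way 2 (attributes hash),
# 'dsn' is way 3 (option inside the DSN line).
sub connect_args {
    my ($variant, $db, $user, $pass) = @_;
    my $dsn   = "DBI:mysql:database=$db";
    my %attrs = (RaiseError => 1);
    if ($variant eq 'dsn') {
        $dsn .= ';mysql_enable_utf8=1';
    }
    elsif ($variant eq 'attr') {
        $attrs{mysql_enable_utf8} = 1;
    }
    return ($dsn, $user, $pass, \%attrs);
}

my @args = connect_args('dsn', 'test', 'user', 'secret');
print "$args[0]\n";

# With DBI loaded and a server available, one would then call:
#   my $dbh = DBI->connect(@args);
# The other form of way 2 sets the property after connecting:
#   $dbh->{mysql_enable_utf8} = 1;
```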

Since all of these are somewhat incorrect, who is responsible? Is it correct
to run that test with TEXT instead of a BLOB? That is why none of this is an
RT ticket yet. In any case, isn't it undocumented that where the enable_utf8
option is placed in the connect() call matters?

'set names utf8' and 'set character set utf8' aren't that helpful anymore.
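
For applications that cannot change their DSN, one possible stopgap on the
application side is to clean a value just before printing. This is only a
sketch and not a fix for the driver: prepare_for_output is a hypothetical
helper, and it assumes the value is a decoded character string (as with
mysql_enable_utf8 turned on); applying it to raw octets would double-encode
them.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stopgap: drop zero bytes at the end of a fetched value and
# turn the character string into UTF-8 octets before handing it to print.
sub prepare_for_output {
    my ($value) = @_;
    $value =~ s/\0+\z//;                # remove trailing zero bytes only
    return encode('UTF-8', $value);     # characters -> octets
}

# Stand-in for a fetched value: two Cyrillic characters and a zero byte.
my $octets = prepare_for_output("\x{43f}\x{440}\0");
print length($octets), "\n";            # two 2-byte characters: prints 4
```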

DBD-mysql-4.017, mariadb-5.2.4, perl-5.12.2, DBI-1.615

Thank you.

ps. I'd like to have a diff between 4.017 and 4.018 from git; how can I do
that? There are no tags after 4.015 there. All of this may already be solved.

7! Peter pgp: A0E26627 (4A42 6841 2871 5EA7 52AB 12F8 0CE1 4AAC A0E2 6627)
--
http://vereshagin.org

--
MySQL Perl Mailing List
For list archives: http://lists.mysql.com/perl
To unsubscribe: http://lists.mysql.com/perl?unsub=gcdmp-msql-mysql-modules@m .gmane.org
