utf8 behavior and approach

on 24.01.2011 14:33:05 by Peter Vereshagin

How are you, perl?

Some things about DBD::mysql push me toward what feels like wasteful misuse. I
was about to file a bug on RT, but I will first try to find a better angle
through discussion.
Short version: how can I avoid zero bytes at the end of fetched utf8
MEDIUMTEXT values while keeping the right approach to the connect method?

Full story is here:

FastCGI is a well-known environment for Perl apps. I myself have several
ready-to-use CGI applications running nicely and easily under my FCGI::Spawn.
It is well known that FCGI.pm, since roughly version 0.69, expects only octets
to be printed to its replaced STDOUT. The hint for googling is: "Wide character
in FCGI::Stream::PRINT", for example,
http://lists.bestpractical.com/pipermail/rt-users/2010-July/065600.html
The same problem shows up with another big application, Bugzilla, which I
believe shares the same single XS-based data source: DBD::mysql.
Trying t/55utf8.t, I notice that it is fine to put data into a BLOB field and
read it back, but not with MEDIUMTEXT. Here is the patch for the test:
===
--- t/55utf8.t 2010-04-12 21:37:14.000000000 +0400
+++ t/55utf8.t.new 2011-01-24 15:17:05.000000000 +0300
@@ -28,14 +28,14 @@
     plan skip_all =>
         "SKIP TEST: You must have MySQL version 5.0 and greater for this test to run";
 }
-plan tests => 15;
+plan tests => 14;
 
 ok $dbh->do("DROP TABLE IF EXISTS $table");
 
 my $create =<<EOT;
 CREATE TABLE $table (
     name VARCHAR(64) CHARACTER SET utf8,
-    bincol BLOB,
+    bincol MEDIUMTEXT,
     shape GEOMETRY,
     binutf VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin
 )
===
Result of such a test is https://gist.github.com/793182 ( sorry can't omlout
Adam here ).
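
To make the two symptoms in this message concrete (the "Wide character"
warning and the trailing zero byte), here is a minimal sketch of a check that
could be run on any fetched value. The helper name inspect_value is
hypothetical and is not part of DBI or DBD::mysql, and the sample strings are
stand-ins for data that would really come from a fetch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DBI or DBD::mysql: reports the
# symptoms discussed in this message for a single fetched value.
sub inspect_value {
    my ($value) = @_;
    return {
        # True if the string ends in one or more zero bytes.
        trailing_nul => ( $value =~ /\0+\z/ ? 1 : 0 ),
        # True if any character is above 0xFF; printing such a string to a
        # byte-oriented handle is what produces "Wide character" warnings.
        wide_char    => ( $value =~ /[^\x00-\xFF]/ ? 1 : 0 ),
        # True if Perl's internal UTF8 flag is set on the scalar.
        utf8_flag    => ( utf8::is_utf8($value) ? 1 : 0 ),
    };
}

# Stand-ins for fetched data; a real check would pass values returned
# by fetchrow_array and friends.
my $clean  = "plain ascii";
my $padded = "text\0";
my $wide   = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";

for my $value ( $clean, $padded, $wide ) {
    my $report = inspect_value($value);
    printf "trailing_nul=%d wide_char=%d utf8_flag=%d\n",
        @{$report}{qw(trailing_nul wide_char utf8_flag)};
}
```

Running this against the MEDIUMTEXT column from the patched test, once per
connection variant, would show which of the symptoms each variant produces.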

In fact, there are 3 ways to handle the utf8 situation in your perl-mysql
application:
1. No utf8 enablement at all. FCGI prints such text without run-time errors,
but the national characters come out as '??'.
2. 'Late' utf8. It can be turned on the way 55utf8.t does, by means of the
'mysql_enable_utf8' property of the dbh, OR equivalently by specifying the
same property in the attributes hash for the connect method. Characters are
correct, but there is a \0, a zero byte, in the resulting perl variable, which
FCGI refuses to print.

Both are a correct approach but give incorrect utf8 behavior.

3. Specifying 'mysql_enable_utf8=1;' in the DSN line solves all the trouble.
This means patching the existing applications, e.g., Bugzilla and perhaps RT.

This one gives correct behavior but is an incorrect approach. At the least,
Bugzilla is an application proven to work well with mysql and utf8, yet it
can't construct such a DSN for me without a patch. Still, I have used exactly
this way for ages in my own apps ( since mysql-4.1, really ).
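
For readers comparing the three variants side by side, here is a rough sketch
of the connect arguments each one implies. The helper connect_args, the
database name "test" and the host "localhost" are placeholders invented for
illustration; only the mysql_enable_utf8 option itself comes from the
description above. Nothing here connects to a server.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ($dsn, \%attr) pair for each of the
# three variants described above. Database name and host are placeholders.
sub connect_args {
    my ($variant) = @_;
    my $base = 'DBI:mysql:database=test;host=localhost';

    # 1. No utf8 enablement at all.
    return ( $base, { RaiseError => 1 } ) if $variant eq 'none';

    # 2. 'Late' utf8: the flag travels in the attributes hash. Setting
    #    $dbh->{mysql_enable_utf8} = 1 after connecting is the other form.
    return ( $base, { RaiseError => 1, mysql_enable_utf8 => 1 } )
        if $variant eq 'attr';

    # 3. The flag is part of the DSN string itself.
    return ( "$base;mysql_enable_utf8=1", { RaiseError => 1 } )
        if $variant eq 'dsn';

    die "unknown variant '$variant'";
}

for my $variant (qw(none attr dsn)) {
    my ( $dsn, $attr ) = connect_args($variant);
    my $flag = exists $attr->{mysql_enable_utf8} ? 'yes' : 'no';
    print "$variant: dsn=$dsn attr_flag=$flag\n";
    # With a reachable server one would continue with something like:
    #   my $dbh = DBI->connect( $dsn, $user, $password, $attr );
}
```

The point of laying it out this way is that variants 2 and 3 differ only in
where the option is placed, which is exactly the difference an application
such as Bugzilla cannot express without a patch.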

Since every one of these options is incorrect in some way, who is responsible?
Is it correct for that test to use TEXT instead of a BLOB? These open
questions are why none of this is an RT ticket yet. In any case, is it really
undocumented that where the enable_utf8 option is placed in the connect() call
makes a difference?

'set names utf8' and 'set character set utf8' aren't that helpful anymore.

DBD-mysql-4.017, mariadb-5.2.4, perl-5.12.2, DBI-1.615

Thank you.

ps. I'd like to have a diff between 4.017 and 4.018 from git; how can I get
it? There are no tags after 4.015 there. All of this may already be solved.

7! Peter pgp: A0E26627 (4A42 6841 2871 5EA7 52AB 12F8 0CE1 4AAC A0E2 6627)
--
http://vereshagin.org
