#1: utf8 behavior and approach

Posted on 2011-01-24 14:33:05 by Peter Vereshagin

How are you, perl?

There are things about DBD::mysql that drive me toward wasteful misuse. I was
about to file a Bug# on an RT but will try to find a better point in a
Short version: how can I avoid zero bytes at the end of a fetched utf8
MEDIUMTEXT value while keeping the right approach to the connect method?

Full story is here:

FastCGI is a well-known environment for Perl apps. I myself have several
ready-to-use CGI applications running nice and easy in my FCGI::Spawn.
It is well known that since version ~0.69 FCGI assumes nothing but octets
are printed to its replaced STDOUT. The hint for googling is: "Wide character in
FCGI::Stream::PRINT", for example, 065600.html
The same problem shows up with another big application, Bugzilla, which I
believe shares the same single XS-style data source: DBD::mysql.
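
To illustrate the "Wide character" failure mode: a string that carries code
points above 255 cannot be represented as octets, and that is what a byte-only
output handle objects to. Below is a minimal sketch using core Perl only;
`is_octets` and `to_octets` are hypothetical helper names, not part of FCGI or
FCGI::Spawn.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if the string holds only code points 0..255,
# i.e. it could be handed to a byte-only output handle as is.
sub is_octets {
    my ($string) = @_;
    my $copy = $string;                        # utf8::downgrade works in place
    return utf8::downgrade($copy, 1) ? 1 : 0;  # 2nd arg: return false instead of dying
}

# Hypothetical helper: encode a character string into UTF-8 octets.
sub to_octets {
    my ($string) = @_;
    return encode('UTF-8', $string);
}

my $text = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";   # six Cyrillic characters

print is_octets($text)            ? "octets\n" : "wide characters\n";
print is_octets(to_octets($text)) ? "octets\n" : "wide characters\n";
```

The first check reports wide characters, the second reports octets, which is
why encoding just before printing is a common way around the error.
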
Trying t/55utf8.t, I notice that it's ok to put data into a BLOB field and
get it back, but not with MEDIUMTEXT. Here is the patch for the test:
--- t/55utf8.t 2010-04-12 21:37:14.000000000 +0400
+++ t/ 2011-01-24 15:17:05.000000000 +0300
@@ -28,14 +28,14 @@
 plan skip_all =>
 "SKIP TEST: You must have MySQL version 5.0 and greater for this test to run";
-plan tests => 15;
+plan tests => 14;

 ok $dbh->do("DROP TABLE IF EXISTS $table");

 my $create =<<EOT;
- bincol BLOB,
+ bincol MEDIUMTEXT,
 binutf VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin
Result of such a test is ( sorry can't omlout
Adam here ).

In fact, there are 3 ways to change the utf8 situation in your perl-mysql
1. No utf8 enablement at all. FCGI prints such texts without run-time errors,
but the national characters come out as '??'.
2. 'Late' utf8. It can be turned on the way 55utf8.t does it, by means of the
'mysql_enable_utf8' property of the dbh, OR the same happens when the same
property is specified in the attributes hash for the connect method. Characters
are correct, but there is a \0, a zero byte, in the resulting perl variable,
which FCGI refuses to print.

Both are a correct approach but give incorrect utf8 behavior.

3. Specifying 'mysql_enable_utf8=1;' in the DSN line solves all the trouble.
This means patching the existing applications, e.g., Bugzilla and perhaps RT.
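
The three placements listed above can be sketched as follows. The database
name, host and credentials are placeholders, and a connection is attempted only
when the hypothetical environment variable `TEST_DSN_LIVE` is set, so the
snippet runs without a server.

```perl
use strict;
use warnings;

# Placeholder connection data (assumptions, not taken from any real setup).
my ($db, $host, $user, $pass) = ('test', 'localhost', 'someuser', 'somepass');

# Way 3: the flag is part of the DSN string itself.
my $dsn_with_flag = "DBI:mysql:database=$db;host=$host;mysql_enable_utf8=1";

# Way 2, first form: plain DSN, flag passed in the attributes hash of connect().
my $dsn_plain = "DBI:mysql:database=$db;host=$host";
my %attrs     = (RaiseError => 1, mysql_enable_utf8 => 1);

print "DSN with flag: $dsn_with_flag\n";
print "plain DSN:     $dsn_plain\n";

if ($ENV{TEST_DSN_LIVE}) {
    require DBI;

    my $dbh_dsn  = DBI->connect($dsn_with_flag, $user, $pass, { RaiseError => 1 });
    my $dbh_attr = DBI->connect($dsn_plain, $user, $pass, \%attrs);

    # Way 2, second form: 'late' utf8, set on an already connected handle.
    my $dbh_late = DBI->connect($dsn_plain, $user, $pass, { RaiseError => 1 });
    $dbh_late->{mysql_enable_utf8} = 1;
}
```
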

This one is correct behavior but an incorrect approach: Bugzilla, at least, is
an application proven to work well with mysql and utf8, but it can't construct
such a DSN for me without a patch. I have used exactly this way for ages in my
own apps, though ( since mysql-4.1, really ).
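
For an application whose DSN cannot be changed without a patch, one possible
stopgap is to clean each fetched value just before it is printed. This is only
a sketch of a workaround, not a fix for the underlying behavior. It assumes the
fetched value is a character string (as with mysql_enable_utf8 turned on);
`clean_for_output` is a hypothetical helper, and the sample value merely
imitates a fetched MEDIUMTEXT with a trailing zero byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: strip trailing zero bytes and return UTF-8 octets.
sub clean_for_output {
    my ($value) = @_;
    return unless defined $value;      # an SQL NULL stays undefined
    $value =~ s/\0+\z//;               # drop zero bytes at the very end only
    return encode('UTF-8', $value);    # character string -> UTF-8 octets
}

# Imitation of a fetched value: three Cyrillic characters plus a stray \0.
my $fetched = "\x{43f}\x{440}\x{438}\0";
my $octets  = clean_for_output($fetched);

printf "%d octets, trailing zero byte: %s\n",
    length($octets), ($octets =~ /\0\z/ ? 'yes' : 'no');
```

Note that the helper would double-encode a value that already consists of
UTF-8 octets, so it only fits the case where the driver returns characters.
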

Since all of those are somewhat incorrect, who is responsible? Is it
correct to run that test with TEXT instead of a BLOB? That's why all of this
isn't an RT ticket yet. But in any case, isn't it undocumented that where the
enable_utf8 option is placed in the connect() call matters?

'set names utf8' and 'set character set utf8' aren't that helpful anymore.

DBD-mysql-4.017, mariadb-5.2.4, perl-5.12.2, DBI-1.615

Thank you.

ps. I'd like to have a diff between 4.017 and 4.018 from git; how can I get it?
There are no tags after 4.015 there. All of this may already be solved.

7! Peter pgp: A0E26627 (4A42 6841 2871 5EA7 52AB 12F8 0CE1 4AAC A0E2 6627)

MySQL Perl Mailing List