DBD::Oracle - problem with UTF-8 charset and Oracle 8.1.7

am 15.02.2007 17:32:23 von hjp

--zS7rBR6csb6tI2e1
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I think I've reported that problem before but we didn't get close
enough to fix the problem, so I'm taking another stab at it.

Environment:

Oracle server is 8.1.7.=20
Oracle client is 9.2 (same problem with 10g)
DBD::Oracle version 1.18
LANG=3Den_US.UTF-8
NLS_LANG=3D.UTF8

When a parameter has the UTF-8 bit set, the query fails with "ORA-12704:
character set mismatch". However, strings returned from queries do have
this bit set, so in a (rather stupid) sequence like:

my $text0 =3D "test";
my ($id1, $text1) =3D $dbh->selectrow_array("select id, t from ora817utf8bu=
g where t=3D?", {}, $text0);
print "$id1, $text1\n";
my ($id2, $text2) =3D $dbh->selectrow_array("select id, t from ora817utf8bu=
g where t=3D?", {}, $text1);
print "$id2, $text2\n";

the first selectrow_array works, and the second fails.

In a level 9 trace, the relevant (I hope) bits look like this:

1 <- prepare=3D DBI::st=3DHASH(0x9432f1c) at ./ora817utf8bug line 46=20
bind :p1 <== 'test' (type 0)
rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
bind :p1 <== 'test' (size 4/5/0, ptype 4, otype 1)
bind :p1 <== 'test' (size 4/4, otype 1, indp 0, at_exec 1)
OCIBindByName(94d3a20,95381fc,94cbd94,":p1",3,9537908,4,1,95 38214,0=
,953820c,0,0,2)=3DSUCCESS
OCIBindDynamic(94d3830,94cbd94,95381d8,499ab0,95381d8,499da0 )=3DSUC=
CESS
OCIAttrGet(94d3830,OCI_HTYPE_BIND,95381e8,0,31,94cbd94)=3DSU CCESS
bind :p1 <== 'test' (in, not-utf8, csid 871->0->871, ftype 1, cs=
form 0->0, maxlen 4, maxdata_size 0)
OCIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb38a,0,31,94cbd94)=3DS UCCESS
dbd_st_execute SELECT (out0, lob0)...
in ':p1' [0,0]: len 4, ind 0
OCIStmtExecute(94cbd20,94d3a20,94cbd94,0,0,0,0,0)=3DSUCCESS
OCIAttrGet(94d3a20,OCI_HTYPE_STMT,bfffb4ba,0,10,94cbd94)=3DS UCCESS
dbd_st_execute SELECT returned (SUCCESS, rpc0, fn4, out0)
[...]
1 <- prepare=3D DBI::st=3DHASH(0x92507a4) at ./ora817utf8bug line 48
bind :p1 <== "test" (type 0)
rebinding :p1 (is-utf8, ftype 1, csid 0, csform 0, inout 0)
bind :p1 <== "test" (size 4/5/0, ptype 4, otype 1)
bind :p1 <== 'test' (size 4/4, otype 1, indp 0, at_exec 1)
OCIBindByName(94d3a20,95382f4,94cbd94,":p1",3,954d750,4,1,95 3830c,0=
,9538304,0,0,2)=3DSUCCESS
OCIBindDynamic(94d3830,94cbd94,95382d0,499ab0,95382d0,499da0 )=3DSUC=
CESS
rebinding :p1 with UTF8 value so setting csform=3DSQLCS_IMPLICIT O=
CIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb383,0,32,94cbd94)=3DSU CCESS
OCIAttrGet(94d3830,OCI_HTYPE_BIND,95382e0,0,31,94cbd94)=3DSU CCESS
bind :p1 <== "test" (in, is-utf8, csid 871->0->871, ftype 1, csf=
orm 0->2, maxlen 4, maxdata_size 0)
OCIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb38a,0,31,94cbd94)=3DS UCCESS
dbd_st_execute SELECT (out0, lob0)...
in ':p1' [0,0]: len 4, ind 0
OCIStmtExecute(94cbd20,94d3a20,94cbd94,0,0,0,0,0)=3DERROR
OCIAttrGet(94d3a20,OCI_HTYPE_STMT,bfffb3be,0,129,94cbd94)=3D SUCCESS
OCIErrorGet(94cbd94,1,"",bfffafec,"ORA-12704: character set m=
ismatch ",1024,2)=3DSUCCESS

AFAICS, the difference is the csform in the final bind. In the
successful case it's "csform 0->0", in the unsuccessful case it's
"csform 0->2". I assume that 2 is SQLCS_IMPLICIT mentioned 2 lines
above.

The code works if run against Oracle 9i or 10g servers, so I think that
maybe Oracle 8.1.7 doesn't support csform=3DSQLCS_IMPLICIT?

Here is a full script to demonstrate the problem (it tries to read the
parameters for connect ($data_source, $username, $auth) from a file
specified with --cred or the env. variable DBI_CREDENTIAL_FILE - I
normally keep such files in ~/.dbi)

#!/usr/local/bin/perl -w

=3Dhead1 NAME

ora817utf8bug - demonstrate bug with Oracle 8.1.7 and UTF8

=3Dhead1 SYNOPSIS

export LANG=3Den_US.UTF-8
export NLS_LANG=3D.UTF8
export DBI_TRACE=3D9

/ora817utf8bug

=3Dhead1 DESCRIPTION

=3Dcut

use strict;
use DBI;
use Getopt::Long;
use Pod::Usage;

my $help;
my $credfile =3D $ENV{DBI_CREDENTIAL_FILE} || "default";

GetOptions(
'help|?' =3D> \$help,
'credfile=3Ds' =3D> \$credfile,
) or pod2usage(2);
pod2usage(1) if $help;

my @cred =3D read_cred($credfile);
my $dbh =3D DBI->connect($cred[0], $cred[1], $cred[2], {RaiseError =3D> 1, =
AutoCommit =3D> 0});
$dbh->{FetchHashKeyName} =3D 'NAME_lc';

$dbh->{RaiseError} =3D 0;
$dbh->do("drop table ora817utf8bug");
$dbh->do("create table ora817utf8bug(id number, t varchar2(40))");
$dbh->{RaiseError} =3D 1;
$dbh->do("insert into ora817utf8bug(id, t) values(1, 'test')");

my $text0 =3D "test";
my ($id1, $text1) =3D $dbh->selectrow_array("select id, t from ora817utf8bu=
g where t=3D?", {}, $text0);
print "$id1, $text1\n";
my ($id2, $text2) =3D $dbh->selectrow_array("select id, t from ora817utf8bu=
g where t=3D?", {}, $text1);
print "$id2, $text2\n";
$dbh->do("drop table ora817utf8bug");

$dbh->disconnect();

sub read_cred {
my ($fn) =3D @_;

if ($fn !~ m{/}) {
$fn =3D "$ENV{HOME}/.dbi/$fn";
}
open(FN, "<$fn") or die "cannot open $fn: $!";
my $line =3D ;=20
close(FN);
my @cred =3D split(/[\s\n]/, $line);=20
return @cred;
}

--=20
_ | Peter J. Holzer | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR | I'd be programming in Java.
| | | hjp@wsr.ac.at | I don't, and I'm not.
__/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users

--zS7rBR6csb6tI2e1
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iQDQAwUBRdSLF1LjemazOuKpAQLngAXTBzbU4+iztNpgxaxLgkQ6P4REZ/+F 0IcP
4owLciuxbJ/O+kAq65NlkumzMJaA/h2AwkXoVlKX2qLES3Ugfa1jDkPanKDU xkZU
mxKuSwnzJa4OvP7jC2aJQ6m42tsbLYYpbUB+zRi/8e3g0NMBIPjnwQ74XqGd jOUo
JzeSPIclHvtSCzKT5OW60dW7qLtHum/gEJot8k/z2phrDrrPwmUnCAVb9Cl2 hXfg
/06ddCEzZ+RVGuSmQpqGBjqO8A==
=XkaF
-----END PGP SIGNATURE-----

--zS7rBR6csb6tI2e1--

Re: DBD::Oracle - problem with UTF-8 charset and Oracle 8.1.7

am 15.02.2007 18:45:31 von Alexander

http://search.cpan.org/~pythian/DBD-Oracle-1.19/Oracle.pm#Or acle_and_Unicode
says roughly that Oracle 8 and Unicode don't mix well. If possible, try
to upgrade the Oracle server to at least 9.2 and use AL32UTF32 as charset.

(I'm not an Oracle expert. I'm just using it occasionally.)

Alexander

On 15.02.2007 17:32, Peter J. Holzer wrote:
> I think I've reported that problem before but we didn't get close
> enough to fix the problem, so I'm taking another stab at it.
>
> Environment:
>
> Oracle server is 8.1.7.
> Oracle client is 9.2 (same problem with 10g)
> DBD::Oracle version 1.18
> LANG=en_US.UTF-8
> NLS_LANG=.UTF8
>
> When a parameter has the UTF-8 bit set, the query fails with "ORA-12704:
> character set mismatch". However, strings returned from queries do have
> this bit set, so in a (rather stupid) sequence like:
>
> my $text0 = "test";
> my ($id1, $text1) = $dbh->selectrow_array("select id, t from ora817utf8bug where t=?", {}, $text0);
> print "$id1, $text1\n";
> my ($id2, $text2) = $dbh->selectrow_array("select id, t from ora817utf8bug where t=?", {}, $text1);
> print "$id2, $text2\n";
>
> the first selectrow_array works, and the second fails.
>
> In a level 9 trace, the relevant (I hope) bits look like this:
>
> 1 <- prepare= DBI::st=HASH(0x9432f1c) at ./ora817utf8bug line 46
> bind :p1 <== 'test' (type 0)
> rebinding :p1 (not-utf8, ftype 1, csid 0, csform 0, inout 0)
> bind :p1 <== 'test' (size 4/5/0, ptype 4, otype 1)
> bind :p1 <== 'test' (size 4/4, otype 1, indp 0, at_exec 1)
> OCIBindByName(94d3a20,95381fc,94cbd94,":p1",3,9537908,4,1,95 38214,0,953820c,0,0,2)=SUCCESS
> OCIBindDynamic(94d3830,94cbd94,95381d8,499ab0,95381d8,499da0 )=SUCCESS
> OCIAttrGet(94d3830,OCI_HTYPE_BIND,95381e8,0,31,94cbd94)=SUCC ESS
> bind :p1 <== 'test' (in, not-utf8, csid 871->0->871, ftype 1, csform 0->0, maxlen 4, maxdata_size 0)
> OCIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb38a,0,31,94cbd94)=SUC CESS
> dbd_st_execute SELECT (out0, lob0)...
> in ':p1' [0,0]: len 4, ind 0
> OCIStmtExecute(94cbd20,94d3a20,94cbd94,0,0,0,0,0)=SUCCESS
> OCIAttrGet(94d3a20,OCI_HTYPE_STMT,bfffb4ba,0,10,94cbd94)=SUC CESS
> dbd_st_execute SELECT returned (SUCCESS, rpc0, fn4, out0)
> [...]
> 1 <- prepare= DBI::st=HASH(0x92507a4) at ./ora817utf8bug line 48
> bind :p1 <== "test" (type 0)
> rebinding :p1 (is-utf8, ftype 1, csid 0, csform 0, inout 0)
> bind :p1 <== "test" (size 4/5/0, ptype 4, otype 1)
> bind :p1 <== 'test' (size 4/4, otype 1, indp 0, at_exec 1)
> OCIBindByName(94d3a20,95382f4,94cbd94,":p1",3,954d750,4,1,95 3830c,0,9538304,0,0,2)=SUCCESS
> OCIBindDynamic(94d3830,94cbd94,95382d0,499ab0,95382d0,499da0 )=SUCCESS
> rebinding :p1 with UTF8 value so setting csform=SQLCS_IMPLICIT OCIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb383,0,32,94cbd94)=SUC CESS
> OCIAttrGet(94d3830,OCI_HTYPE_BIND,95382e0,0,31,94cbd94)=SUCC ESS
> bind :p1 <== "test" (in, is-utf8, csid 871->0->871, ftype 1, csform 0->2, maxlen 4, maxdata_size 0)
> OCIAttrSet(94d3830,OCI_HTYPE_BIND,bfffb38a,0,31,94cbd94)=SUC CESS
> dbd_st_execute SELECT (out0, lob0)...
> in ':p1' [0,0]: len 4, ind 0
> OCIStmtExecute(94cbd20,94d3a20,94cbd94,0,0,0,0,0)=ERROR
> OCIAttrGet(94d3a20,OCI_HTYPE_STMT,bfffb3be,0,129,94cbd94)=SU CCESS
> OCIErrorGet(94cbd94,1,"",bfffafec,"ORA-12704: character set mismatch ",1024,2)=SUCCESS
>
> AFAICS, the difference is the csform in the final bind. In the
> successful case it's "csform 0->0", in the unsuccessful case it's
> "csform 0->2". I assume that 2 is SQLCS_IMPLICIT mentioned 2 lines
> above.
>
> The code works if run against Oracle 9i or 10g servers, so I think that
> maybe Oracle 8.1.7 doesn't support csform=SQLCS_IMPLICIT?
>
> Here is a full script to demonstrate the problem (it tries to read the
> parameters for connect ($data_source, $username, $auth) from a file
> specified with --cred or the env. variable DBI_CREDENTIAL_FILE - I
> normally keep such files in ~/.dbi)
>
> #!/usr/local/bin/perl -w
>
> =head1 NAME
>
> ora817utf8bug - demonstrate bug with Oracle 8.1.7 and UTF8
>
> =head1 SYNOPSIS
>
> export LANG=en_US.UTF-8
> export NLS_LANG=.UTF8
> export DBI_TRACE=9
>
> ./ora817utf8bug
>
> =head1 DESCRIPTION
>
> =cut
>
> use strict;
> use DBI;
> use Getopt::Long;
> use Pod::Usage;
>
>
> my $help;
> my $credfile = $ENV{DBI_CREDENTIAL_FILE} || "default";
>
> GetOptions(
> 'help|?' => \$help,
> 'credfile=s' => \$credfile,
> ) or pod2usage(2);
> pod2usage(1) if $help;
>
> my @cred = read_cred($credfile);
> my $dbh = DBI->connect($cred[0], $cred[1], $cred[2], {RaiseError => 1, AutoCommit => 0});
> $dbh->{FetchHashKeyName} = 'NAME_lc';
>
> $dbh->{RaiseError} = 0;
> $dbh->do("drop table ora817utf8bug");
> $dbh->do("create table ora817utf8bug(id number, t varchar2(40))");
> $dbh->{RaiseError} = 1;
> $dbh->do("insert into ora817utf8bug(id, t) values(1, 'test')");
>
>
> my $text0 = "test";
> my ($id1, $text1) = $dbh->selectrow_array("select id, t from ora817utf8bug where t=?", {}, $text0);
> print "$id1, $text1\n";
> my ($id2, $text2) = $dbh->selectrow_array("select id, t from ora817utf8bug where t=?", {}, $text1);
> print "$id2, $text2\n";
> $dbh->do("drop table ora817utf8bug");
>
> $dbh->disconnect();
>
>
>
> sub read_cred {
> my ($fn) = @_;
>
> if ($fn !~ m{/}) {
> $fn = "$ENV{HOME}/.dbi/$fn";
> }
> open(FN, "<$fn") or die "cannot open $fn: $!";
> my $line = ;
> close(FN);
> my @cred = split(/[\s\n]/, $line);
> return @cred;
> }
>
>
>
>

--
Alexander Foken
mailto:alexander@foken.de http://www.foken.de/alexander/

Re: DBD::Oracle - problem with UTF-8 charset and Oracle 8.1.7

am 15.02.2007 20:45:18 von hjp

--cNdxnHkX5QqsyA0e
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2007-02-15 18:45:31 +0100, Alexander Foken wrote:
> http://search.cpan.org/~pythian/DBD-Oracle-1.19/Oracle.pm#Or acle_and_Unic=
ode=20
> says roughly that Oracle 8 and Unicode don't mix well.

| Oracle 8 client libraries have a number of bugs related to character
| set handling, especially when connected to an Oracle 9+ server.

However, I have the opposite situation: I have 9.2 client libraries and
an 8.1 server.

> If possible, try to upgrade the Oracle server to at least 9.2 and use
> AL32UTF32 as charset.

Unfortunately, that's not an option right now. I do have a number of
workarounds: I can either use another charset than UTF-8 (the database
uses WE8MSWIN1252 anyway) or I can explicitely encode() all strings
passed to DBI. Neither is particularly appealing, and I'd rather fix
the problem than program workarounds.

hp

--=20
_ | Peter J. Holzer | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR | I'd be programming in Java.
| | | hjp@wsr.ac.at | I don't, and I'm not.
__/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users

--cNdxnHkX5QqsyA0e
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iQDQAwUBRdS4TlLjemazOuKpAQJLiQXSAyR7e2Ys0pKXa1iBNFOYAhI1aZf5 cXFy
DSf1IWs5mz6n6ozuGgHz3k1RSIyNxO8hvj+fVpRTB8ta0rFmIoc6xbFok9pU ioZN
H0y6eYljU16QViM0PneUvcxblNK1lp66P7w/iFo5XHJOjkGKPI+XnUqcYzy7 yftz
xK5gnVwaZ8VxdoiHknTJone76Y1IeJ73jath8AfYu1pls+jYWgmFMQ+RWrnj hfAx
ep6UL9ts7YNXJ61wwUw59oAYkQ==
=RDfq
-----END PGP SIGNATURE-----

--cNdxnHkX5QqsyA0e--