Win32::OLE->Option("CP") - encoding bug?

on 13.10.2010 14:02:28 by Michael.Ludwig

I think I've had the misfortune to run into a bug in the Win32::OLE
encoding logic, or possibly at a deeper layer. But maybe there is
something that I have not yet understood about this issue.

So what is it about?

One of the options for the Win32::OLE module is "CP", the codepage.
The documentation has the following to say about this option:

  This variable is used to determine the codepage used by all
  translations between Perl strings and Unicode strings used by
  the OLE interface. The default value is CP_ACP, which is the
  default ANSI codepage. Other possible values are CP_OEMCP,
  CP_MACCP, CP_UTF7 and CP_UTF8. These constants are not exported
  by default.

Let's see how this works in practice using some code involving
the Microsoft XML library (msxml6.dll):

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
use strict;
use warnings;
use utf8;
use Win32::OLE;
my $P56 = $] < 5.008;
require Encode unless $P56;

sub add {
  my( $dom, $txt ) = @_;
  my $node = $dom->createElement( 'E' );
  $node->appendChild( $dom->createTextNode( $txt ) );
  $dom->documentElement->appendChild( $node );
  my $ret = $dom->xml;
  # The return value is unreliable. Looks like you'll get Unicode
  # characters or legacy bytes depending on content. You can fix this
  # starting from Perl 5.8. But should you have to?
  $ret = Encode::encode_utf8( $ret ) unless $P56; # force octets
  print $ret;
}

Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 ); # UTF-8, please

# binmode STDOUT, ':utf8' unless $P56;
my $xml = '<U/>';
# $xml = '<?xml version="1.0" encoding="utf-8"?>' . $xml; # no use
my $dom = Win32::OLE->new( 'Msxml2.DOMDocument.6.0' );
$dom->loadXML( $xml );
add $dom, 'eins';
add $dom, 'blöd';
add $dom, 'weiß';
add $dom, 'café';       # still Latin1, no UTF-8 encoding
add $dom, 'ανεργία';    # Greek or Russian force UTF-8 encoding
add $dom, 'убит';
-------------------------

A data-dependent return value encoding is difficult to work with.

Is this a bug, or an oversight or misunderstanding on my behalf?
--
Michael Ludwig
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

Re: Win32::OLE->Option("CP") - encoding bug?

on 13.10.2010 21:31:50 by Michael Ludwig

Ludwig, Michael wrote on 13.10.2010 at 14:02 (+0200):
> I think I've had the misfortune to run into a bug in the Win32::OLE
> encoding logic, or possibly at a deeper layer. But maybe there is
> something that I have not yet understood about this issue.

> Let's see how this works in practice using some code involving
> the Microsoft XML library (msxml6.dll):

> add $dom, 'eins';
> add $dom, 'blöd';
> add $dom, 'weiß';
> add $dom, 'café';       # still Latin1, no UTF-8 encoding
> add $dom, 'ανεργία';    # Greek or Russian force UTF-8 encoding
> add $dom, 'убит';
> -------------------------
>
> A data-dependent return value encoding is difficult to work with.

Just to prove this is *not* related to MSXML (and who would have thought
it was?), here's a repro of the Perl script I posted earlier in JScript
(WSF), which does *not* exhibit the Perl or Win32::OLE behaviour of
opportunistically switching the output to another encoding, but works
comme il faut:

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
<?xml version="1.0" encoding="utf-8"?>
<job>
<script language="JavaScript">
function add( dom, txt) {
  var node = dom.createElement( 'E' );
  node.appendChild( dom.createTextNode( txt ) );
  dom.documentElement.appendChild( node );
  var ret = dom.xml;
  WScript.echo( ret );
}

var xml = '&lt;U/&gt;';
var dom = WScript.createObject( 'Msxml2.DOMDocument.6.0' );
dom.loadXML( xml );
add(dom, 'eins');
add(dom, 'blöd');
add(dom, 'weiß');
add(dom, 'café');
add(dom, 'ανεργία');
add(dom, 'убит');
</script>
</job>
-------------------------

T:\MiLu :: cscript sz.wsf
<U><E>eins</E></U>

<U><E>eins</E><E>blöd</E></U>

<U><E>eins</E><E>blöd</E><E>weiß</E></U>

<U><E>eins</E><E>blöd</E><E>weiß</E><E>café</E></U>

<U><E>eins</E><E>blöd</E><E>weiß</E><E>café</E><E>ανεργία</E></U>

<U><E>eins</E><E>blöd</E><E>weiß</E><E>café</E><E>ανεργία</E><E>убит</E></U>

--
Michael Ludwig

Re: Win32::OLE->Option("CP") - encoding bug?

on 13.10.2010 21:58:47 by Michael Ludwig

Michael Ludwig wrote on 13.10.2010 at 21:31 (+0200):

> Just to prove this is *not* related to MSXML (and who would have
> thought it was?), here's a repro of the Perl script I posted earlier
> in JScript (WSF), which does *not* exhibit the Perl or Win32::OLE
> behaviour of opportunistically switching the output to another
> encoding, but works comme il faut:

> T:\MiLu :: cscript sz.wsf
> <U><E>eins</E></U>
>
> <U><E>eins</E><E>blöd</E></U>
>
> <U><E>eins</E><E>blöd</E><E>weiß</E></U>
>
> <U><E>eins</E><E>blöd</E><E>weiß</E><E>café</E></U>
>
> <U><E>eins</E><E>blöd</E><E>weiß</E><E>café</E><E>ανεργία</E></U>

This might not prove anything, after all; cscript plus the Windows
console might be playing clever tricks to make the output look right,
and frankly, I have no idea how that combo works (I'd like to know,
though), but it seems to be rather clever: the output looks correct
regardless of the CHCP setting in my cmd.exe console window.

So my JScript/WSF experiment doesn't really support the point I'm trying
to make (that Perl or Win32::OLE are to blame), but I still presume that
point to be correct: JScript gets it right, and Perl doesn't.

--
Michael Ludwig

RE: Win32::OLE->Option("CP") - encoding bug?

on 13.10.2010 23:50:32 by Jan Dubois

> my $ret = $dom->xml;
> # The return value is unreliable. Looks like you'll get Unicode
> # characters or legacy bytes depending on content. You can fix this
> # starting from Perl 5.8. But should you have to?

You do get a string of characters. The characters may be UTF-8 encoded
internally when they cannot be represented by the ANSI codepage.

In general this should not really matter, as Perl can upgrade/downgrade
encodings internally as it sees fit. The one problem of course is that
Win32::OLE uses CP_ACP for the regular encoding whereas Perl internals
use Latin1, so any code points where CP_ACP is different from Latin1
will get mangled.
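
A minimal sketch of that mangling, assuming Perl 5.8 or later with the
core Encode module, and assuming CP_ACP is Windows-1252 (it differs on
non-Western systems). The variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode ();

# The euro sign as a single CP1252 (ANSI) byte.
my $ansi = "\x80";

# Implicit upgrade: Perl interprets the byte as Latin1, giving U+0080.
my $as_latin1 = $ansi;
utf8::upgrade($as_latin1);

# Explicit decode with the right codepage gives the intended U+20AC.
my $as_cp1252 = Encode::decode( 'cp1252', $ansi );

printf "implicit upgrade: U+%04X\n", ord $as_latin1;
printf "explicit decode:  U+%04X\n", ord $as_cp1252;
```

Only the explicit decode recovers the euro sign; the implicit upgrade
silently produces a C1 control character instead.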

The downgrading of results to CP_ACP is probably a mistake; I can't
see how this would ever be useful. It helps scripts that don't know
how to deal with Unicode strings, but those shouldn't ask for CP_UTF8
results in the first place.

The internal confusion between Latin1 and CP_ACP is harder to deal
with: the core text functions all assume Latin1, and the filesystem
APIs all assume CP_ACP. So if we were to fix this to always assume
Latin1 internally, then all scripts that read filenames from backticks/
qx(), or receive them from GUI dialogs, or read them from ANSI encoded
text files will break unless they convert them to Latin1 explicitly.

Maybe that breakage is necessary eventually, but it won't happen for
Perl 5.14, so any change there is a long way off.

> A data-dependent return value encoding is difficult to work with.

Ignoring the CP_ACP/Latin1 issue, why does it matter which internal
encoding is used for your strings?

You commented out the line that put STDOUT into Unicode mode:

# binmode STDOUT, ':utf8' unless $P56;

But if you re-activate the line, then you will see that the characters
are written out the same way, regardless of the way they have been
encoded internally.
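
A small sketch of this point, assuming Perl 5.8 or later; it runs without
Win32::OLE and uses made-up variable names. The same characters are held
in both internal representations and then encoded explicitly:

```perl
use strict;
use warnings;
use Encode ();

# "café" stored with one byte per character (downgraded form).
my $down = "caf\xE9";

# The same characters, stored as UTF-8 internally (upgraded form).
my $up = $down;
utf8::upgrade($up);

# Perl treats both as the same string of characters.
print "strings compare equal\n" if $down eq $up;

# Encoding explicitly produces identical octets from either form.
my $octets_down = Encode::encode( 'UTF-8', $down );
my $octets_up   = Encode::encode( 'UTF-8', $up );
print "encoded output is identical\n" if $octets_down eq $octets_up;
```

An output layer such as ':utf8' performs the same character-to-octet
encoding on print, which is why the internal form stops being visible.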

Cheers,
-Jan



Re: Win32::OLE->Option("CP") - encoding bug?

on 14.10.2010 01:34:19 by Michael Ludwig

[RE: Win32::OLE->Option('CP') - encoding bug?]
Jan Dubois wrote on 13.10.2010 at 14:50 (-0700):
> > my $ret = $dom->xml;
> > # The return value is unreliable. Looks like you'll get Unicode
> > # characters or legacy bytes depending on content. You can fix this
> > # starting from Perl 5.8. But should you have to?
>
> You do get a string of characters. The characters may be UTF-8 encoded
> internally when they cannot be represented by the ANSI codepage.
>
> In general this should not really matter, as Perl can upgrade/downgrade
> encodings internally as it sees fit. The one problem of course is that
> Win32::OLE uses CP_ACP for the regular encoding whereas Perl internals
> use Latin1, so any code points where CP_ACP is different from Latin1
> will get mangled.

This concerns 27 characters - more than I thought. Here's a listing of
the differences of CP1252 aka CP_ACP vs Latin1 aka ISO-8859-1:

128 80 = U+20AC : EURO SIGN
130 82 = U+201A : SINGLE LOW-9 QUOTATION MARK
131 83 = U+0192 : LATIN SMALL LETTER F WITH HOOK
132 84 = U+201E : DOUBLE LOW-9 QUOTATION MARK
133 85 = U+2026 : HORIZONTAL ELLIPSIS
134 86 = U+2020 : DAGGER
135 87 = U+2021 : DOUBLE DAGGER
136 88 = U+02C6 : MODIFIER LETTER CIRCUMFLEX ACCENT
137 89 = U+2030 : PER MILLE SIGN
138 8A = U+0160 : LATIN CAPITAL LETTER S WITH CARON
139 8B = U+2039 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
140 8C = U+0152 : LATIN CAPITAL LIGATURE OE
142 8E = U+017D : LATIN CAPITAL LETTER Z WITH CARON
145 91 = U+2018 : LEFT SINGLE QUOTATION MARK
146 92 = U+2019 : RIGHT SINGLE QUOTATION MARK
147 93 = U+201C : LEFT DOUBLE QUOTATION MARK
148 94 = U+201D : RIGHT DOUBLE QUOTATION MARK
149 95 = U+2022 : BULLET
150 96 = U+2013 : EN DASH
151 97 = U+2014 : EM DASH
152 98 = U+02DC : SMALL TILDE
153 99 = U+2122 : TRADE MARK SIGN
154 9A = U+0161 : LATIN SMALL LETTER S WITH CARON
155 9B = U+203A : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
156 9C = U+0153 : LATIN SMALL LIGATURE OE
158 9E = U+017E : LATIN SMALL LETTER Z WITH CARON
159 9F = U+0178 : LATIN CAPITAL LETTER Y WITH DIAERESIS

http://msdn.microsoft.com/de-de/goglobal/cc305145%28en-us%29.aspx
http://www.microsoft.com/globaldev/reference/wincp.mspx
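
A listing like the one above can be regenerated with a short script. This
is a sketch assuming Perl 5.8 or later with the core Encode and charnames
modules. Bytes that Encode's cp1252 table leaves undefined are skipped,
on the assumption that they decode to U+FFFD:

```perl
use strict;
use warnings;
use Encode ();
use charnames ();

my @diff;
for my $byte ( 0x80 .. 0xFF ) {
    # Decode the single byte as CP1252 and get the resulting code point.
    my $cp = ord( Encode::decode( 'cp1252', chr($byte) ) );
    next if $cp == $byte;     # same as Latin1, nothing to report
    next if $cp == 0xFFFD;    # byte has no CP1252 mapping
    push @diff, [ $byte, $cp ];
}

# Same column layout as the hand-made listing: decimal, hex, code point, name.
printf "%d %02X = U+%04X : %s\n",
    $_->[0], $_->[0], $_->[1], charnames::viacode( $_->[1] )
    for @diff;
printf "%d characters differ\n", scalar @diff;
```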

> The downgrading of results to CP_ACP is probably a mistake; I can't
> see how this would ever be useful. It helps scripts that don't know
> how to deal with Unicode strings, but those shouldn't ask for CP_UTF8
> results in the first place.

Thanks. I also think it is a mistake.

> The internal confusion between Latin1 and CP_ACP is harder to deal
> with: the core text functions all assume Latin1, and the filesystem
> APIs all assume CP_ACP. So if we were to fix this to always assume
> Latin1 internally, then all scripts that read filenames from backticks/
> qx(), or receive them from GUI dialogs, or read them from ANSI encoded
> text files will break unless they convert them to Latin1 explicitly.
>
> Maybe that breakage is necessary eventually, but it won't happen for
> Perl 5.14, so any change there is a long way off.

I was blissfully unaware of this issue.

> > A data-dependent return value encoding is difficult to work with.
>
> Ignoring the CP_ACP/Latin1 issue, why does it matter which internal
> encoding is used for your strings?

It matters in a situation where I want to store that string in a
database accepting XML documents using Perl 5.6.1 (yes). In the case
where the string includes Greek or Russian characters, UTF-8 encoding
happens; in the case where all the characters are < 256, UTF-8 encoding
does not happen, which results in a parse error.

The (crappy) solution I have now is to try and store the string; if that
fails, I encode it into UTF-8 and try again.
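
For comparison, on Perl 5.8 or later the retry could be replaced by a
normalizing step. The sketch below is only an illustration, not something
that works on 5.6.1: to_utf8_octets is a hypothetical helper, CP_ACP is
assumed to be Windows-1252, and the use of the internal UTF-8 flag relies
on the Win32::OLE behaviour Jan describes (results that fit the ANSI
codepage come back as CP_ACP bytes, everything else as characters):

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: always return UTF-8 octets for a Win32::OLE result.
sub to_utf8_octets {
    my ($str) = @_;
    my $copy = $str;    # work on a copy, leave the caller's value alone

    # Assumption: an unflagged string holds CP_ACP (here CP1252) bytes,
    # a flagged string already holds Unicode characters.
    my $chars = utf8::is_utf8($copy)
        ? $copy
        : Encode::decode( 'cp1252', $copy );

    return Encode::encode( 'UTF-8', $chars );
}

# Usage with strings standing in for OLE results:
my $latin = to_utf8_octets("<E>caf\xE9</E>");     # ANSI bytes in
my $greek = to_utf8_octets("<E>\x{3B1}</E>");     # characters in
```

Both results are plain octet strings, so the database would see UTF-8 in
either case, including for the 27 characters where CP1252 and Latin1
disagree.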

> You commented out the line that put STDOUT into Unicode mode:
>
> # binmode STDOUT, ':utf8' unless $P56;
>
> But if you re-activate the line, then you will see that the characters
> are written out the same way, regardless of the way they have been
> encoded internally.

Yes. It is easier from 5.8 onwards.

Thanks for your help!
--
Michael Ludwig