Win32::OLE - Module Option "CP"

on 10.09.2010 12:31:31 by Michael.Ludwig

The Win32::OLE manual says the following about the CP option:

----
This variable is used to determine the codepage used by all translations between Perl strings and Unicode strings used by the OLE interface. The default value is CP_ACP, which is the default ANSI codepage. Other possible values are CP_OEMCP, CP_MACCP, CP_UTF7 and CP_UTF8. These constants are not exported by default.
----

I don't understand the impact of this setting. I presume there isn't any, but I want to be sure.

The context I'm asking this question in is XML processing involving IIS, CGI, Perl 5.6.1 (I know), Win32::OLE and MSXML 6.0.

--
Michael Ludwig
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

RE: Win32::OLE - Module Option "CP"

on 10.09.2010 21:21:49 by Jan Dubois

On Fri, 10 Sep 2010, Ludwig, Michael wrote:
> The Win32::OLE manual says the following about the CP option:
>
> ----
> This variable is used to determine the codepage used by all
> translations between Perl strings and Unicode strings used by the OLE
> interface. The default value is CP_ACP, which is the default ANSI
> codepage. Other possible values are CP_OEMCP, CP_MACCP, CP_UTF7 and
> CP_UTF8. These constants are not exported by default.
> ----
>
> I don't understand the impact of this setting. I presume there isn't
> any, but I want to be sure.

OLE Automation transfers strings internally encoded in UTF-16 (as BSTR
types). Win32::OLE needs to transform them into regular Perl strings.
By default it converts to CP_ACP, the standard 8-bit character set on
Windows. That means any Unicode character that is not representable
in CP_ACP will be translated to a "replacement" character (e.g. '?').

If you want to preserve the original Unicode string, then you need
to tell Win32::OLE to use CP_UTF8 instead.

CP_ACP is just the default for backwards compatibility reasons.

You probably don't want to use any of the other encodings, like CP_MACCP
or CP_UTF7, ever. :)

Cheers,
-Jan
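
For illustration, the setting described in this reply can be sketched as follows. This is a minimal sketch rather than code from the thread; it assumes a Win32::OLE build that provides the CP_UTF8 constant, which has to be imported explicitly because the codepage constants are not exported by default.

```perl
use strict;
use Win32::OLE qw(CP_UTF8);    # codepage constants must be requested by name

# Translate between BSTR (UTF-16) and Perl strings as UTF-8
# instead of the default ANSI codepage (CP_ACP).
Win32::OLE->Option(CP => CP_UTF8);

# Calling Option with just the option name returns the current value.
my $cp = Win32::OLE->Option('CP');
print "Win32::OLE codepage is now $cp\n";
```

The option is a module-wide setting, so it is best set once, before any OLE objects are created.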

RE: Win32::OLE - Module Option "CP"

on 13.09.2010 09:26:34 by Michael.Ludwig

> -----Original Message-----
> From: Jan Dubois
> On Fri, 10 Sep 2010, Ludwig, Michael wrote:
> > The Win32::OLE manual says the following about the CP option:
> >
> > ----
> > This variable is used to determine the codepage used by all
> > translations between Perl strings and Unicode strings used by the
> > OLE interface. The default value is CP_ACP, which is the default
> > ANSI codepage. Other possible values are CP_OEMCP, CP_MACCP, CP_UTF7
> > and CP_UTF8. These constants are not exported by default.
> > ----
> >
> > I don't understand the impact of this setting. I presume there isn't
> > any, but I want to be sure.
>
> OLE Automation transfers strings internally encoded in UTF-16 (as BSTR
> types). Win32::OLE needs to transform them into regular Perl strings.
> By default it converts to CP_ACP, the standard 8-bit character set on
> Windows. That means any Unicode character that is not representable
> in CP_ACP will be translated to a "replacement" character (e.g. '?').
>
> If you want to preserve the original Unicode string, then you need to
> tell Win32::OLE to use CP_UTF8 instead.

I can confirm that it works, and does make a difference. For Unicode
text processing, you need CP_UTF8. Interaction with MSXML and Unicode
documents didn't make sense before I specified CP_UTF8.
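
A rough sketch of the kind of MSXML round trip described here, not the code actually used in this setup. The element name and sample character are made up for illustration, and it assumes MSXML 6.0 is installed under the ProgID Msxml2.DOMDocument.6.0.

```perl
use strict;
use Win32::OLE qw(CP_UTF8);

# Without this line, strings crossing the OLE boundary go through CP_ACP.
Win32::OLE->Option(CP => CP_UTF8);

my $doc = Win32::OLE->new('Msxml2.DOMDocument.6.0')
    or die 'Cannot create MSXML 6.0 document: ' . Win32::OLE->LastError;
$doc->{async} = 0;    # parse synchronously

# Hypothetical sample: U+03A9 (Greek capital omega), written as its two
# UTF-8 bytes. No XML declaration, because loadXML receives a UTF-16 BSTR.
my $xml = "<symbol>\xCE\xA9</symbol>";
$doc->loadXML($xml)
    or die 'Parse error: ' . $doc->{parseError}{reason};

# Read the text back through the OLE layer.
my $text = $doc->{documentElement}{text};

# Count bytes rather than characters, so the result does not depend on
# whether Perl has flagged the string as UTF-8.
my $bytes = do { use bytes; length($text) };
print "Text node came back as $bytes byte(s)\n";
```

With CP_UTF8 in effect the omega should come back as a two-byte UTF-8 sequence. Under the default CP_ACP on a Western European system it would typically collapse to a single substitute byte, which is the data loss described earlier in the thread.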

> CP_ACP is just the default for backwards compatibility reasons.
>
> You probably don't want to use any of the other encodings, like
> CP_MACCP or CP_UTF7, ever. :)

Serialized a doc to UTF-7 the other day. It does look funny. :-)
But reparsing appears to be a problem ...

Best,

Michael