Creating XML from an Oracle DB

Creating XML from an Oracle DB

am 09.10.2006 09:41:36 von Keith.Barnard

--686iG.47rm3GFUE.1Y+kJe.60DRDaP
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII

Hi Tim,

I have spent about three hours using Google and Metalink trying to find the

answer to what must be a FAQ when generating XML from Oracle but I cannot
find anything that answers the question.

I work for a company in the UK and we are generating an XML file from an
Oracle 9.2 database using DBMS_XMLQUERY and DBMS_LOB. The XML file has to
be
in UTF-8 format (encoding="utf-8") but this is not the format that it is
held in the database. From what I have read, it seems that it is AL32UTF8.

As a result the UK pound sign contained in a VARCHAR2 column on the
database
is causing problems. When we try to open the XML file using Internet
Explorer it will not display its contents.

The TO_ASCII value of the pound sign is 49827 when selected from the
database. We need to find a way of converting from one format to the other.

I could probably use a combination of Oracle functions such as TRANSLATE
and
CHR to convert individual occurrences of the pound sign. However, any text
column may have a pound sign in it and it is possible that there are other
characters that will cause problems of this sort. There must be a more
elegant and foolproof way of doing this.

Any advice would be much appreciated.

Keith

--686iG.47rm3GFUE.1Y+kJe.60DRDaP
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

************************************************************ **********
Visit the Virgin Atlantic website for all the latest news and=20
fantastic offers - http://www.virgin.com/atlantic
This e-mail (and any attachments) may contain privileged and/or
confidential information. If you are not the intended recipient please
do not disclose, copy, distribute, disseminate or take any action in
reliance on it. If you have received this message in error please reply
and tell us and then delete all copies on your system. Any opinion on
or advice or information contained in this email is not necessarily=20
that
of the owners or officers of this company.
Should you wish to communicate with us by e-mail, we cannot guarantee
the security of any data outside our own computer system.
************************************************************ **********

--686iG.47rm3GFUE.1Y+kJe.60DRDaP--

RE: Creating XML from an Oracle DB

am 09.10.2006 17:05:15 von Philip.Garrett

Keith.Barnard@fly.virgin.com wrote:
> Hi Tim,
>=20
> I have spent about three hours using Google and Metalink trying to
> find the=20
>=20
> answer to what must be a FAQ when generating XML from Oracle but I
> cannot find anything that answers the question.
>=20
> I work for a company in the UK and we are generating an XML file from
> an Oracle 9.2 database using DBMS_XMLQUERY and DBMS_LOB. The XML file
> has to be
> in UTF-8 format (encoding=3D"utf-8") but this is not the format that
> it is held in the database. From what I have read, it seems that it
> is AL32UTF8.=20

Oracle's utf-8 support for the XML packages is shameful.
http://www.dbforums.com/showthread.php?t=3D1212787.

For any of this to happen automatically, you need to make sure that your
original data (what you're making the XML from) is stored correctly in
the database's character set. This means that if your data actually
contains utf-8, the database characterset should be AL32UTF8. It
appears your data is indeed in utf-8, because 49827 (0xC2A3) is the
utf-8 representation of the pound sign.

You can determine your database's character set with this query:
select value from v$nls_parameters
where parameter =3D 'NLS_CHARACTERSET'

Assuming your database characterset actually matches the data that's in
it, you can just set the client character set to your desired output,
and the encoding is done for you:

# data from Oracle will now be converted correctly
# into Perl's internal encoding.
export NLS_LANG=3DAMERICAN_AMERICA.AL32UTF8

# in perl...
# convert from Perl's internal encoding to utf-8 when
# printing to stdout.
binmode(STDOUT, ':utf8'); # convert from Perl's internal to utf-8

If your database characterset does not match the data (for example, your
database characterset is US7ASCII) then sorry, you will have to manually
convert each unicode column. If you're using Oracle 10G, then there are
functions to convert XML encodings explicitly using character set Ids.
Otherwise, you'll have to figure out the right hocus pocus to transfer
the data from Oracle to Perl without losing character information, and
then encode/decode in Perl with the Encode module.

If you haven't already, I recommend reading the Oracle 9i Globalization
Best Practices document. It can help get your head around how the
character sets work. http://tinyurl.com/mtsxg [oracle.com].

Hope it helps. Regards,

Philip

Re: Creating XML from an Oracle DB

am 16.11.2006 12:45:21 von dispo41

How about trying to create a brand new database using UTF8 *instead* of
AL32UTF8 ?

I'm no expert, and I'm not doing the same thing you are, but I solved *my*
UTF8 problems
thusly:-

US7ASCII
UTF8

""Garrett, Philip (MAN-Corporate)"" wrote in
message news:D9C13100F14E4C4795A1E83B125B40350232B631@MSCEXCHS02.man .co...
Keith.Barnard@fly.virgin.com wrote:
> Hi Tim,
>
> I have spent about three hours using Google and Metalink trying to
> find the
>
> answer to what must be a FAQ when generating XML from Oracle but I
> cannot find anything that answers the question.
>
> I work for a company in the UK and we are generating an XML file from
> an Oracle 9.2 database using DBMS_XMLQUERY and DBMS_LOB. The XML file
> has to be
> in UTF-8 format (encoding="utf-8") but this is not the format that
> it is held in the database. From what I have read, it seems that it
> is AL32UTF8.

Oracle's utf-8 support for the XML packages is shameful.
http://www.dbforums.com/showthread.php?t=1212787.

For any of this to happen automatically, you need to make sure that your
original data (what you're making the XML from) is stored correctly in
the database's character set. This means that if your data actually
contains utf-8, the database characterset should be AL32UTF8. It
appears your data is indeed in utf-8, because 49827 (0xC2A3) is the
utf-8 representation of the pound sign.

You can determine your database's character set with this query:
select value from v$nls_parameters
where parameter = 'NLS_CHARACTERSET'

Assuming your database characterset actually matches the data that's in
it, you can just set the client character set to your desired output,
and the encoding is done for you:

# data from Oracle will now be converted correctly
# into Perl's internal encoding.
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

# in perl...
# convert from Perl's internal encoding to utf-8 when
# printing to stdout.
binmode(STDOUT, ':utf8'); # convert from Perl's internal to utf-8

If your database characterset does not match the data (for example, your
database characterset is US7ASCII) then sorry, you will have to manually
convert each unicode column. If you're using Oracle 10G, then there are
functions to convert XML encodings explicitly using character set Ids.
Otherwise, you'll have to figure out the right hocus pocus to transfer
the data from Oracle to Perl without losing character information, and
then encode/decode in Perl with the Encode module.

If you haven't already, I recommend reading the Oracle 9i Globalization
Best Practices document. It can help get your head around how the
character sets work. http://tinyurl.com/mtsxg [oracle.com].

Hope it helps. Regards,

Philip

RE: Creating XML from an Oracle DB

am 16.11.2006 16:49:48 von Philip.Garrett

Hi,

Chris wrote:
> How about trying to create a brand new database using UTF8 *instead*
> of AL32UTF8 ?

Oracle's "UTF8" char set is old and incomplete. AL32UTF8 is the newer,
more complete version.

See http://tinyurl.com/y4qjd9 (oracle.com).
"Oracle recommends that you switch to AL32UTF8 for full support of
supplementary characters in the database character set."

Regards,
Philip

Re: Creating XML from an Oracle DB

am 16.11.2006 17:37:08 von hjp

--1Ow488MNN9B9o/ov
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2006-11-16 10:49:48 -0500, Garrett, Philip (MAN-Corporate) wrote:
> Chris wrote:
> > How about trying to create a brand new database using UTF8 *instead*
> > of AL32UTF8 ?
>=20
> Oracle's "UTF8" char set is old and incomplete. AL32UTF8 is the newer,
> more complete version.

Oracle's UTF8 is not only incomplete, it's also not really UTF-8, but a
rather weird combination of UTF-16 and UTF-8. Oracle's AL32UTF8 is
really what the rest of the world calls UTF-8. The difference in
encoding matters only for characters beyond U+10000, though, so most
users probably won't notice.

hp

--=20
_ | Peter J. Holzer | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR | I'd be programming in Java.
| | | hjp@wsr.ac.at | I don't, and I'm not.
__/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users

--1Ow488MNN9B9o/ov
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iQDQAwUBRVyTtFLjemazOuKpAQL1lgXUCJNoAumhszcSubZH/uGoolDIqfN0 /ZpO
9adWbCjCcDd2LP2ts29fkY9Sp+LMHpoWqGyNIOoBvnsUZ4xwIM9igLrDZ0UH pb7l
JtV6tKx7uIErrK4IEpjTeN7TgD1HKTiDRLz49z5Bkzf11cw7dylMlI4SrowM n8pm
QzUg4pxAHfYs7n3pnUisVoA/o1CG1J2RRreEXP/lNIeb6GowNVZE0HwNs4+V PsWV
dledH+oxD3khXDIM5IAeIgQ1LQ==
=yUif
-----END PGP SIGNATURE-----

--1Ow488MNN9B9o/ov--