Encoding problem in SQLXML - bulkload

Encoding problem in SQLXML - bulkload

am 30.03.2008 11:42:55 von worlman385

I got an XML Data in UTF-8 encoding like this

Dah$)A(&li

but when I run the schema to insert data into SQL Server 2005 Express
edition, the data will become like this

Dah$)A!'0„7li

How can I solve the encoding problem?

XML data:
=======================================



Director
Dah$)A(&li
Hall


Writer
Dah$)A(&li
Hall




XML schema:
=======================================

elementFormDefault="qualified"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">



parent-key="program"
child="productionCrew" child-key="program"/>

parent-key="role givenname surname" child="crew"
child-key="role givenname surname"/>







sql:relationship="OrderOD ODProduct">
















Database table:
=======================================

CREATE TABLE ProgramListings.dbo.productionCrew
(
program VARCHAR(20),
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(program, role, surname)
)

CREATE TABLE ProgramListings.dbo.crew
(
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(role, surname)

)

Re: Encoding problem in SQLXML - bulkload

am 30.03.2008 16:37:25 von Dan Guzman

>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?

Your XML is not well-formed. Entity references need to be specified in
place of illegal XML characters (e.g. "&" instead of "&"). Note that
this is not specific to SQLXML but part of the basic XML standards. See
http://www.w3.org/TR/REC-xml/.

A CDATA section is commonly used in XML in order to eliminate the need to
escape illegal characters. In a CDATA section, only the end tag ("]]>").
For example:



Director

Hall


Writer

Hall



--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

wrote in message
news:23ouu39b3arbpskfmgdtfof9b81ilskjg3@4ax.com...
>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?
>
> XML data:
> =======================================
>
>
>
> Director
> Dah$)A(&li
> Hall
>

>
> Writer
> Dah$)A(&li
> Hall
>

>

>
>
> XML schema:
> =======================================
>
> > elementFormDefault="qualified"
> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>
>
>
> > parent-key="program"
> child="productionCrew" child-key="program"/>
>
> > parent-key="role givenname surname" child="crew"
> child-key="role givenname surname"/>
>

>

>
>
>
>
>
> > sql:relationship="OrderOD ODProduct">
>
>
>
>
>
>

>

>

>

>
>

>

>
>
>

>
> Database table:
> =======================================
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program VARCHAR(20),
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )
>
> CREATE TABLE ProgramListings.dbo.crew
> (
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(role, surname)
>
> )

Re: Encoding problem in SQLXML - bulkload

am 30.03.2008 16:56:11 von Dan Guzman

> In a CDATA section, only the end tag ("]]>").

This sentence should have been:

In a CDATA section, only the end tag ("]]>") is recognized as markup.


--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"Dan Guzman" wrote in message
news:6E2D5295-651D-4DE4-ACF6-149251E0433B@microsoft.com...
> >I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>
> Your XML is not well-formed. Entity references need to be specified in
> place of illegal XML characters (e.g. "&" instead of "&"). Note that
> this is not specific to SQLXML but part of the basic XML standards. See
> http://www.w3.org/TR/REC-xml/.
>
> A CDATA section is commonly used in XML in order to eliminate the need to
> escape illegal characters. In a CDATA section, only the end tag ("]]>").
> For example:
>
>
>
> Director
>
> Hall
>

>
> Writer
>
> Hall
>

>

>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVP
> http://weblogs.sqlteam.com/dang/
>
> wrote in message
> news:23ouu39b3arbpskfmgdtfof9b81ilskjg3@4ax.com...
>>I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>>
>> XML data:
>> =======================================
>>
>>
>>
>> Director
>> Dah$)A(&li
>> Hall
>>

>>
>> Writer
>> Dah$)A(&li
>> Hall
>>

>>

>>
>>
>> XML schema:
>> =======================================
>>
>> >> elementFormDefault="qualified"
>> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>>
>>
>>
>> >> parent-key="program"
>> child="productionCrew" child-key="program"/>
>>
>> >> parent-key="role givenname surname" child="crew"
>> child-key="role givenname surname"/>
>>

>>

>>
>>
>>
>>
>>
>> >> sql:relationship="OrderOD ODProduct">
>>
>>
>>
>>
>>
>>

>>

>>

>>

>>
>>

>>

>>
>>
>>

>>
>> Database table:
>> =======================================
>>
>> CREATE TABLE ProgramListings.dbo.productionCrew
>> (
>> program VARCHAR(20),
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(program, role, surname)
>> )
>>
>> CREATE TABLE ProgramListings.dbo.crew
>> (
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(role, surname)
>>
>> )
>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 00:11:23 von Dan Guzman

> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Does your XML include a processing instruction to specify UTF-8 encoding?
For example:



--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

wrote in message
news:7m50v3dsmbcdqcrvhf2qm0c0q0apc724mo@4ax.com...
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.
>
> Since some data in XML is non-ASCII
>
>
>
>>Your XML is not well-formed. Entity references need to be specified in
>>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>>this is not specific to SQLXML but part of the basic XML standards. See
>>http://www.w3.org/TR/REC-xml/.
>>
>>A CDATA section is commonly used in XML in order to eliminate the need to
>>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>>For example:
>>
>>
>>
>>Director
>>
>>Hall
>>

>>
>>Writer
>>
>>Hall
>>

>>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 00:13:47 von Erland Sommarskog

(worlman385@yahoo.com) writes:
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.

Your original post was encoded in iso-2022-cn, which may explain why I
and Dan so very funny characters.

Since your target columns are varchar, I need to ask: what is the collation
of these columns? If that collation does not include é in its ANSI set,
you cannot get the name right. Then again, then you should get "e". What
you got appears to be UTF-8 interpreted as ANSI.

I don't have any experience of XML bulkload, so I don't know what is
going on. You could try to add



first in the file, even though this should not be needed since UTF-8
is the default for XML.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downlo ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books .mspx

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 00:46:06 von worlman385

Thanks your help Dan!

But no! the XML is well formed:
http://www.oniva.com/upload/1356/x1.jpg

but after using the COM object of SQLXML to load XML file into
Database, the data will look like this:
http://www.oniva.com/upload/1356/x1.jpg

notice the givenname data is changed.

I think the input is UTF-8 data but the SQLXML interface convert UTF-8
to ASCII so the data is messed up when loaded from XML to database.

Since some data in XML is non-ASCII



>Your XML is not well-formed. Entity references need to be specified in
>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>this is not specific to SQLXML but part of the basic XML standards. See
>http://www.w3.org/TR/REC-xml/.
>
>A CDATA section is commonly used in XML in order to eliminate the need to
>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>For example:
>
>
>
>Director
>
>Hall
>

>
>Writer
>
>Hall
>

>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 00:57:04 von worlman385

Sorry, the link of second one should be
http://www.oniva.com/upload/1356/x2.jpg

>
>but after using the COM object of SQLXML to load XML file into
>Database, the data will look like this:
>http://www.oniva.com/upload/1356/x1.jpg
>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 01:25:01 von worlman385

Yes, I put the string like

also tried other encoding string but no luck.

I tried to put the parameter in the connection string:
sendStringParametersAsUnicode=true

but doesn't help

=======================
hr = pIXMLBulkLoad->put_ConnectionString(L"provider=SQLOLEDB;data
source=(local)\\SQLEXPRESS;database=ProgramListings;integrat ed
security=SSPI;sendStringParametersAsUnicode=true");
}









>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Does your XML include a processing instruction to specify UTF-8 encoding?
>For example:
>
>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 01:34:10 von worlman385

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)


>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Your original post was encoded in iso-2022-cn, which may explain why I
>and Dan so very funny characters.
>
>Since your target columns are varchar, I need to ask: what is the collation
>of these columns? If that collation does not include $)A(& in its ANSI set,
>you cannot get the name right. Then again, then you should get "e". What
>you got appears to be UTF-8 interpreted as ANSI.
>
>I don't have any experience of XML bulkload, so I don't know what is
>going on. You could try to add
>
>
>
>first in the file, even though this should not be needed since UTF-8
>is the default for XML.
>

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 01:34:33 von worlman385

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 01:34:56 von worlman385

Just solve the problem

use NVARCHAR instead of VARCHAR

NVARCHAR - support unicdoe

==============

CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)

Re: Encoding problem in SQLXML - bulkload

am 31.03.2008 13:42:14 von Dan Guzman

> Just solve the problem
>
> use NVARCHAR instead of VARCHAR

I'm glad you were able to figure this out. I had assumed that your database
default collation was appropriate for the characters being stored.

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

wrote in message
news:4s80v3dun5cl5vf7vks6mnkqt7tdk116uq@4ax.com...
> Just solve the problem
>
> use NVARCHAR instead of VARCHAR
>
> NVARCHAR - support unicdoe
>
> ==============
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program NVARCHAR(20),
> role NVARCHAR(20),
> givenname NVARCHAR(20),
> surname NVARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )