Encoding problem in SQLXML - bulkload
Encoding problem in SQLXML - bulkload
am 30.03.2008 11:42:55 von worlman385
I got an XML Data in UTF-8 encoding like this
Dah$)A(&li
but when I run the schema to insert data into SQL Server 2005 Express
edition, the data will become like this
Dah$)A!'07li
How can I solve the encoding problem?
XML data:
=======================================
Director
Dah$)A(&li
Hall
Writer
Dah$)A(&li
Hall
XML schema:
=======================================
elementFormDefault="qualified"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
parent-key="program"
child="productionCrew" child-key="program"/>
parent-key="role givenname surname" child="crew"
child-key="role givenname surname"/>
sql:relationship="OrderOD ODProduct">
Database table:
=======================================
CREATE TABLE ProgramListings.dbo.productionCrew
(
program VARCHAR(20),
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(program, role, surname)
)
CREATE TABLE ProgramListings.dbo.crew
(
role VARCHAR(20),
givenname VARCHAR(20),
surname VARCHAR(20),
PRIMARY KEY(role, surname)
)
Re: Encoding problem in SQLXML - bulkload
am 30.03.2008 16:37:25 von Dan Guzman
>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?
Your XML is not well-formed. Entity references need to be specified in
place of illegal XML characters (e.g. "&" instead of "&"). Note that
this is not specific to SQLXML but part of the basic XML standards. See
http://www.w3.org/TR/REC-xml/.
A CDATA section is commonly used in XML in order to eliminate the need to
escape illegal characters. In a CDATA section, only the end tag ("]]>").
For example:
Director
Hall
Writer
Hall
--
Hope this helps.
Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/
wrote in message
news:23ouu39b3arbpskfmgdtfof9b81ilskjg3@4ax.com...
>I got an XML Data in UTF-8 encoding like this
>
> Dah$)A(&li
>
> but when I run the schema to insert data into SQL Server 2005 Express
> edition, the data will become like this
>
> Dah$)A!'0"7li
>
> How can I solve the encoding problem?
>
> XML data:
> =======================================
>
>
>
> Director
> Dah$)A(&li
> Hall
>
>
> Writer
> Dah$)A(&li
> Hall
>
>
>
>
> XML schema:
> =======================================
>
>
> elementFormDefault="qualified"
> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>
>
>
>
> parent-key="program"
> child="productionCrew" child-key="program"/>
>
>
> parent-key="role givenname surname" child="crew"
> child-key="role givenname surname"/>
>
>
>
>
>
>
>
>
> sql:relationship="OrderOD ODProduct">
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Database table:
> =======================================
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program VARCHAR(20),
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )
>
> CREATE TABLE ProgramListings.dbo.crew
> (
> role VARCHAR(20),
> givenname VARCHAR(20),
> surname VARCHAR(20),
> PRIMARY KEY(role, surname)
>
> )
Re: Encoding problem in SQLXML - bulkload
am 30.03.2008 16:56:11 von Dan Guzman
> In a CDATA section, only the end tag ("]]>").
This sentence should have been:
In a CDATA section, only the end tag ("]]>") is recognized as markup.
--
Hope this helps.
Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/
"Dan Guzman" wrote in message
news:6E2D5295-651D-4DE4-ACF6-149251E0433B@microsoft.com...
> >I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>
> Your XML is not well-formed. Entity references need to be specified in
> place of illegal XML characters (e.g. "&" instead of "&"). Note that
> this is not specific to SQLXML but part of the basic XML standards. See
> http://www.w3.org/TR/REC-xml/.
>
> A CDATA section is commonly used in XML in order to eliminate the need to
> escape illegal characters. In a CDATA section, only the end tag ("]]>").
> For example:
>
>
>
> Director
>
> Hall
>
>
> Writer
>
> Hall
>
>
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVP
> http://weblogs.sqlteam.com/dang/
>
> wrote in message
> news:23ouu39b3arbpskfmgdtfof9b81ilskjg3@4ax.com...
>>I got an XML Data in UTF-8 encoding like this
>>
>> Dah$)A(&li
>>
>> but when I run the schema to insert data into SQL Server 2005 Express
>> edition, the data will become like this
>>
>> Dah$)A!'0"7li
>>
>> How can I solve the encoding problem?
>>
>> XML data:
>> =======================================
>>
>>
>>
>> Director
>> Dah$)A(&li
>> Hall
>>
>>
>> Writer
>> Dah$)A(&li
>> Hall
>>
>>
>>
>>
>> XML schema:
>> =======================================
>>
>>
>> elementFormDefault="qualified"
>> xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
>>
>>
>>
>>
>> parent-key="program"
>> child="productionCrew" child-key="program"/>
>>
>>
>> parent-key="role givenname surname" child="crew"
>> child-key="role givenname surname"/>
>>
>>
>>
>>
>>
>>
>>
>>
>> sql:relationship="OrderOD ODProduct">
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Database table:
>> =======================================
>>
>> CREATE TABLE ProgramListings.dbo.productionCrew
>> (
>> program VARCHAR(20),
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(program, role, surname)
>> )
>>
>> CREATE TABLE ProgramListings.dbo.crew
>> (
>> role VARCHAR(20),
>> givenname VARCHAR(20),
>> surname VARCHAR(20),
>> PRIMARY KEY(role, surname)
>>
>> )
>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 00:11:23 von Dan Guzman
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.
Does your XML include a processing instruction to specify UTF-8 encoding?
For example:
--
Hope this helps.
Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/
wrote in message
news:7m50v3dsmbcdqcrvhf2qm0c0q0apc724mo@4ax.com...
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.
>
> Since some data in XML is non-ASCII
>
>
>
>>Your XML is not well-formed. Entity references need to be specified in
>>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>>this is not specific to SQLXML but part of the basic XML standards. See
>>http://www.w3.org/TR/REC-xml/.
>>
>>A CDATA section is commonly used in XML in order to eliminate the need to
>>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>>For example:
>>
>>
>>
>>Director
>>
>>Hall
>>
>>
>>Writer
>>
>>Hall
>>
>>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 00:13:47 von Erland Sommarskog
(worlman385@yahoo.com) writes:
> Thanks your help Dan!
>
> But no! the XML is well formed:
> http://www.oniva.com/upload/1356/x1.jpg
>
> but after using the COM object of SQLXML to load XML file into
> Database, the data will look like this:
> http://www.oniva.com/upload/1356/x1.jpg
>
> notice the givenname data is changed.
>
> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
> to ASCII so the data is messed up when loaded from XML to database.
Your original post was encoded in iso-2022-cn, which may explain why I
and Dan so very funny characters.
Since your target columns are varchar, I need to ask: what is the collation
of these columns? If that collation does not include é in its ANSI set,
you cannot get the name right. Then again, then you should get "e". What
you got appears to be UTF-8 interpreted as ANSI.
I don't have any experience of XML bulkload, so I don't know what is
going on. You could try to add
first in the file, even though this should not be needed since UTF-8
is the default for XML.
--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downlo ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books .mspx
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 00:46:06 von worlman385
Thanks your help Dan!
But no! the XML is well formed:
http://www.oniva.com/upload/1356/x1.jpg
but after using the COM object of SQLXML to load XML file into
Database, the data will look like this:
http://www.oniva.com/upload/1356/x1.jpg
notice the givenname data is changed.
I think the input is UTF-8 data but the SQLXML interface convert UTF-8
to ASCII so the data is messed up when loaded from XML to database.
Since some data in XML is non-ASCII
>Your XML is not well-formed. Entity references need to be specified in
>place of illegal XML characters (e.g. "&" instead of "&"). Note that
>this is not specific to SQLXML but part of the basic XML standards. See
>http://www.w3.org/TR/REC-xml/.
>
>A CDATA section is commonly used in XML in order to eliminate the need to
>escape illegal characters. In a CDATA section, only the end tag ("]]>").
>For example:
>
>
>
>Director
>
>Hall
>
>
>Writer
>
>Hall
>
>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 00:57:04 von worlman385
Sorry, the link of second one should be
http://www.oniva.com/upload/1356/x2.jpg
>
>but after using the COM object of SQLXML to load XML file into
>Database, the data will look like this:
>http://www.oniva.com/upload/1356/x1.jpg
>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 01:25:01 von worlman385
Yes, I put the string like
also tried other encoding string but no luck.
I tried to put the parameter in the connection string:
sendStringParametersAsUnicode=true
but doesn't help
=======================
hr = pIXMLBulkLoad->put_ConnectionString(L"provider=SQLOLEDB;data
source=(local)\\SQLEXPRESS;database=ProgramListings;integrat ed
security=SSPI;sendStringParametersAsUnicode=true");
}
>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Does your XML include a processing instruction to specify UTF-8 encoding?
>For example:
>
>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 01:34:10 von worlman385
Just solve the problem
use NVARCHAR instead of VARCHAR
NVARCHAR - support unicdoe
==============
CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
>> I think the input is UTF-8 data but the SQLXML interface convert UTF-8
>> to ASCII so the data is messed up when loaded from XML to database.
>
>Your original post was encoded in iso-2022-cn, which may explain why I
>and Dan so very funny characters.
>
>Since your target columns are varchar, I need to ask: what is the collation
>of these columns? If that collation does not include $)A(& in its ANSI set,
>you cannot get the name right. Then again, then you should get "e". What
>you got appears to be UTF-8 interpreted as ANSI.
>
>I don't have any experience of XML bulkload, so I don't know what is
>going on. You could try to add
>
>
>
>first in the file, even though this should not be needed since UTF-8
>is the default for XML.
>
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 01:34:33 von worlman385
Just solve the problem
use NVARCHAR instead of VARCHAR
NVARCHAR - support unicdoe
==============
CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 01:34:56 von worlman385
Just solve the problem
use NVARCHAR instead of VARCHAR
NVARCHAR - support unicdoe
==============
CREATE TABLE ProgramListings.dbo.productionCrew
(
program NVARCHAR(20),
role NVARCHAR(20),
givenname NVARCHAR(20),
surname NVARCHAR(20),
PRIMARY KEY(program, role, surname)
)
Re: Encoding problem in SQLXML - bulkload
am 31.03.2008 13:42:14 von Dan Guzman
> Just solve the problem
>
> use NVARCHAR instead of VARCHAR
I'm glad you were able to figure this out. I had assumed that your database
default collation was appropriate for the characters being stored.
--
Hope this helps.
Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/
wrote in message
news:4s80v3dun5cl5vf7vks6mnkqt7tdk116uq@4ax.com...
> Just solve the problem
>
> use NVARCHAR instead of VARCHAR
>
> NVARCHAR - support unicdoe
>
> ==============
>
> CREATE TABLE ProgramListings.dbo.productionCrew
> (
> program NVARCHAR(20),
> role NVARCHAR(20),
> givenname NVARCHAR(20),
> surname NVARCHAR(20),
> PRIMARY KEY(program, role, surname)
> )