CodePage directive ...or... UTF-8 to UCS-2

on 08.11.2007 19:40:01 by Jamie Hates Computers

When I try to set the CodePage and/or CharSet of my Classic ASP page, I get
the opposite of the intended effect.

Specifically, I have a page that was accepting and displaying UTF-8 text
just fine. But, the server-side code inserts data into SQL Server, so I
followed the advice in the KB232580 article to set the ASP CodePage (to
automatically convert to and from UCS-2 for MSSQL). The result? Now the page
can't even display the UTF-8 characters correctly, as it was doing before.
(And, of course, the SQL Server inserts are still not showing UTF-8.)

In other words, if I try to Response.Write the form field values, I get
garbled text where I used to see the correct characters before following the
article's advice.

I have tried using Session.CodePage and the CodePage directive (separately).
I have also tried setting Response.CharSet. After trying either of the
CodePage solutions, something very strange happens -- I CANNOT UNDO the
change. When I take out the code and upload the new page, the ASP continues
to misbehave, responding as if the directive was still there. It will not go
back to working as it did before I read that KB article. I can reboot the
server to get it to go back, but that's not very practical.

So --- Does anyone know why...

A) the solutions in that article don't work for me and/or...

B) I cannot get the page to go back to working the way it used to when I
take the code out (even on a day when I never tried the Session solution,
only the page directive) and/or...

C) how the heck do I get UTF-8 characters converted to UCS-2 for my SQL
Server inserts if this solution doesn't work as advertised?

Many thanks, in advance.

Re: CodePage directive ...or... UTF-8 to UCS-2

on 08.11.2007 23:53:27 by Anthony Jones

The article is quite old and somewhat misleading in that it presents UTF-8
characters coming into ASP as a problem for SQL Server. They aren't; the
problem already exists before the data goes anywhere near SQL Server.

The problem is this: when a browser posts a form, it encodes the characters
using the CharSet of the form page. When an action page receives a post from
a form, it assumes the characters are encoded according to its current
codepage. Therefore, when the CharSet of the form page does not match the
codepage of the action page, the characters will be corrupted.
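To make the mismatch concrete, here is a minimal sketch in Python rather than ASP. It only imitates the decoding step described above; the function name receive_form_value is hypothetical, and Python's "cp1252" codec stands in for a server codepage of 1252 (it leaves a few bytes undefined that Windows maps to control characters, so it is an approximation).

```python
def receive_form_value(posted_bytes: bytes, codepage: str) -> str:
    """Decode posted bytes the way a page using `codepage` would (illustrative only)."""
    return posted_bytes.decode(codepage)


# The browser encodes the field using the form page's CharSet (UTF-8 here).
posted = "café".encode("utf-8")

# Action page whose codepage does not match the form's CharSet.
misread = receive_form_value(posted, "cp1252")

# Action page whose codepage matches (65001 is UTF-8).
correct = receive_form_value(posted, "utf-8")

print(misread)  # cafÃ©
print(correct)  # café
```

The two bytes that encode "é" in UTF-8 are read as two separate Windows-1252 characters, which is the corruption the poster sees.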

The reason you can't seem to get the characters rendered correctly anymore
is probably because what is stored in the DB is corrupt.



Here are the definitive guidelines for handling UTF-8 in a website:
Save all pages as UTF-8 and place the Codepage=65001 directive at the top.
Make sure all pages specify Response.CharSet = "UTF-8". That's it.

Once you get rid of any corrupt data received while forms were out of sync,
you're done.




Here's the kicker: a form can be out of sync with itself. I've seen this
loads of times.

1. Response.CharSet = "UTF-8" is set but the codepage isn't.
2. The form uses itself as the action page.
3. The user enters characters outside the ASCII range.
4. UTF-8 is posted to the page on the server, but the characters are
   misinterpreted as Windows-1252 or some other codepage.
5. The now-corrupt characters get placed in the database.
6. The corrupt characters are written to the response but are not encoded
   to UTF-8, because the codepage is Windows-1252, so the original bytes go
   back to the client exactly as they were received.
7. The client, however, thinks it's getting UTF-8, so the corrupt characters
   are interpreted correctly, since UTF-8 is how they started off.

In fact, a site can operate for ages with pages that use
Response.CharSet = "UTF-8" but the wrong codepage. Everything looks OK
on the website. It's only when some other tool, such as a reporting tool,
queries the DB that the corrupt characters are discovered.
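The sequence above can be simulated in a few lines of Python. This is a sketch under assumptions, not ASP code: round_trip is a hypothetical helper, "cp1252" approximates a Windows-1252 server codepage, and the sample text was chosen to avoid the handful of bytes Python's cp1252 codec leaves undefined.

```python
def round_trip(typed: str, charset: str = "utf-8", codepage: str = "cp1252"):
    """Simulate a self-posting form whose CharSet and codepage may differ."""
    posted = typed.encode(charset)        # browser encodes using the page's CharSet
    stored = posted.decode(codepage)      # server decodes with its codepage; this goes to the DB
    response = stored.encode(codepage)    # server writes the value back using the same codepage
    displayed = response.decode(charset)  # browser decodes the response as CharSet
    return stored, displayed


stored, displayed = round_trip("naïve €5")
print(displayed)  # naïve €5      (the page looks fine)
print(stored)     # naÃ¯ve â‚¬5   (what a reporting tool would read from the DB)
```

With codepage="utf-8" passed instead, stored and displayed are both the original text, which is the state the guidelines above aim for.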

--
Anthony Jones - MVP ASP/ASP.NET

Re: CodePage directive ...or... UTF-8 to UCS-2

on 09.11.2007 22:36:46 by Egbert Nierop

"Jamie Hates Computers"
wrote in message news:34A82D54-2FD0-43D4-B063-0484BA97861F@microsoft.com...
> When I try to set the CodePage and/or CharSet of my Classic ASP page, I get
> the opposite of the intended effect.
>
> Specifically, I have a page that was accepting and displaying UTF-8 text
> just fine. But, the server-side code inserts data into SQL Server, so I
> followed the advice in the KB232580 article to set the ASP CodePage (to
> automatically convert to and from UCS-2 for MSSQL). The result? Now the page
> can't even display the UTF-8 characters correctly, as it was doing before.
> (And, of course, the SQL Server inserts are still not showing UTF-8.)
>

Do you use nvarchar/nchar/ntext etc?