Encoding problem......

on 05.12.2006 18:34:28 by Atlas

I'm working on a multilanguage ASP/HTML site using an IIS6 web server.

It works perfectly with two languages (English and Italian) in this way:
- basically the same ASP code for every language
- language-specific content is stored in text files; every language has its
own content directory.
- to enhance usability and formatting, the language-specific contents are
stored with HTML syntax; basically the code that normally stands between the
<body> and </body> tags.
- the ASP code loads those (HTML) files and arranges them with
dynamically generated HTML code, then outputs everything to the browser.

No problems here. It works.

So I've decided to give it a try with Chinese, and to accomplish the task I
did the following:

- Inserted the following code in the .asp pages

<%@ CodePage=65001 Language="VBScript"%>
<%
Response.CodePage = 65001
Response.CharSet = "utf-8"
%>

- Changed the HTML header of output pages:



- Saved the ASP pages containing Chinese characters in UTF-8 encoding

Up to here everything works properly and the browser (IE7) shows Chinese
characters correctly.

Now the problem.

As I've said before, the ASP code loads some HTML code from text files on disk
and formats it. Those HTML templates contain the Chinese counterpart of the
text and are stored in UTF-8; nevertheless, when shown in the browser
the characters are scrambled into something else. The Chinese characters
generated by ASP are shown correctly, but the contents of the text files are
not!

Could it be that when the ASP code loads the text file from disk, the
contents get corrupted,
or
that the web server tries to 'translate' the already UTF-8 encoded text?

Re: Encoding problem......

on 06.12.2006 13:03:08 by Atlas

Any help ??? :-)

Re: Encoding problem......

on 06.12.2006 13:51:26 by Anthony Jones

Clearly the focal point of this problem is the code you use to place the
content of these text files into the response. You may be able to elicit
more help if you post the example code you are using to place the content
into the response.

However my guess is you are using Scripting.FileSystemObject to do this.
This object does not support UTF-8.




Re: Encoding problem......

on 06.12.2006 14:48:20 by Atlas

> Clearly the focal point of this problem is the code you use to place the
> content of these text files into the response. You may be able to elicit
> more help if you post the example code you are using to place the content
> into the response.
>
> However my guess is you are using Scripting.FileSystemObject to do this.
> This object does not support UTF-8.

Exactly!

I am using Scripting.FileSystemObject to load the contents from disk!

Are there any solutions?

Re: Encoding problem......

on 06.12.2006 14:53:36 by Atlas

Anthony, I'm posting the code I use to load the UTF-8 encoded files from disk:

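Function loadContent(functUriZ)
    ' Lines starting with this token open a new section in the content file
    myToken = "@#$"

    ' Use the requested URL unless a path was passed in
    If functUriZ = "" Then
        cliUrl = Request.ServerVariables("URL")
    Else
        cliUrl = functUriZ
    End If

    Set objFSO = Server.CreateObject("Scripting.FileSystemObject")

    ' Find the last "/" so that only the page name remains
    begz = Len(cliUrl)
    While Mid(cliUrl, begz, 1) <> "/"
        begz = begz - 1
    Wend
    endz = InStr(begz, cliUrl, ".")

    ' Mid keeps the trailing ".", so appending "htm" yields e.g. page.htm
    strFileName = Server.MapPath("content/") & "\" & Mid(cliUrl, begz + 1, endz - begz) & "htm"

    If objFSO.FileExists(strFileName) Then
        ' 1 = ForReading; with no format argument the file is read as ANSI
        Set TS = objFSO.OpenTextFile(strFileName, 1, False)
        idx = -1
        While Not TS.AtEndOfStream
            temp = TS.ReadLine
            If Mid(temp, 1, Len(myToken)) = myToken Then
                ' New section: skip the 3-character token
                idx = idx + 1
                dynamicContent(idx) = ""
                p = 4
            Else
                p = 1
            End If
            ' dynamicContent is an array declared elsewhere; the file must
            ' start with a token line, otherwise idx is still -1 here
            dynamicContent(idx) = dynamicContent(idx) & Mid(temp, p, Len(temp) - p + 1)
        Wend

        TS.Close
    End If
End Function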

Re: Encoding problem......

on 06.12.2006 17:06:36 by Anthony Jones

"Atlas" wrote in message
news:12ndih8rah68496@news.supernews.com...
> Are there some solutions?
>

Yes, use XML and XSL.

Re: Encoding problem......

on 06.12.2006 17:08:09 by Anthony Jones

Another option would be to save your text files as Unicode (that is, UTF-16,
which is what Windows editors call "Unicode"). FileSystemObject can read
Unicode files.

However, if you are willing to climb the learning curve, XML/XSL is a more
appropriate solution.
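
A further option, which nobody in this thread raises, is to keep the files in UTF-8 and read them through ADODB.Stream, which accepts a charset name. The following is only a minimal sketch, assuming ADO is available on the server; the function name ReadUtf8File is hypothetical and does not come from the thread:

```vbscript
' Hypothetical helper (not from the thread): returns the whole content of a
' UTF-8 encoded text file as a VBScript string.
Function ReadUtf8File(strFileName)
    Const adTypeText = 2              ' text mode, as opposed to binary
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = adTypeText
    objStream.Charset = "utf-8"       ' decode the file bytes as UTF-8
    objStream.Open
    objStream.LoadFromFile strFileName
    ReadUtf8File = objStream.ReadText ' no argument: read to the end
    objStream.Close
    Set objStream = Nothing
End Function
```

The returned string could then be split into lines and processed the same way as the lines returned by ReadLine in loadContent.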

Re: Encoding problem......

on 06.12.2006 17:38:08 by Atlas

>
>
> Another option would be to save your text files as Unicode.
> FileSystemObject can read unicode files
>
> However if you take the learning curve XML/XSL is a more appropriate
> solution.
>

Oooooooooooooohhh yes!!!!!!!!
I changed the file format to Unicode and the OpenTextFile format argument to
-1 (Unicode) and voilà! It's working perfectly!!!!!

Thanks a lot!!!
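
The change described above can be sketched roughly as follows. This is an illustration only, assuming the content files have been re-saved as Unicode (UTF-16); the function name ReadUnicodeFile is hypothetical, while the -1 format value is the one mentioned in the post:

```vbscript
' Hypothetical helper (not from the thread): returns the whole content of a
' text file that was saved as Unicode (UTF-16).
Function ReadUnicodeFile(strFileName)
    Const ForReading = 1
    Const TristateTrue = -1           ' fourth argument: open as Unicode
    Dim objFSO, TS
    ReadUnicodeFile = ""
    Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
    If objFSO.FileExists(strFileName) Then
        Set TS = objFSO.OpenTextFile(strFileName, ForReading, False, TristateTrue)
        ' ReadAll raises an error on an empty file, so check first
        If Not TS.AtEndOfStream Then ReadUnicodeFile = TS.ReadAll
        TS.Close
    End If
    Set objFSO = Nothing
End Function
```

In loadContent itself, the equivalent change is the single call objFSO.OpenTextFile(strFileName, 1, False, -1).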

Re: Encoding problem......

on 07.12.2006 13:31:41 by Atlas

Sadness after happiness.

Unfortunately I was testing your suggestions successfully on IIS6, while
the production server is running IIS 5.0.

As a result it returned errors on the Response.CodePage statements
(unsupported).
So I had a go with Session.CodePage, but I'm getting unwanted results.
Should I move to another ISP, or is there something I can still try?
At this point I'm not sure that IIS 5.0 is capable of handling a mixture of
UTF-8 and Unicode......

Re: Encoding problem......

on 07.12.2006 15:51:49 by Anthony Jones

The problem is that Response.CodePage is new in IIS6; it wasn't present in IIS5.
Session.CodePage will stick for the duration of the session. Hence any
pages that do not assign to Session.CodePage will end up using whatever
codepage was last set.

A kludgy alternative is:

Dim lCodePage : lCodePage = Session.CodePage
Session.CodePage = WhateverYourDesiredCodePageIs

' ..
' .. Do all your stuff here
' ..

Session.CodePage = lCodePage

Or make sure all your pages throughout the whole application specify the
applicable Session.CodePage.

Personally I would use Response.CharSet = "utf-8" and Session.CodePage = 65001
throughout the whole site. Just be sure to save any static content that
contains characters outside the ASCII range as UTF-8 files and don't use any
such characters in script literals. Your Unicode text file based approach
will still work since you are no doubt using Response.Write to send the
content.

Re: Encoding problem......

on 09.12.2006 19:21:13 by Atlas

> Or make sure all your pages throughout the whole application specify the
> Session.Codepage applicable.
>
> Personally I would use Response.charset=UTF-8 and Session.CodePage=65001
> throughout the whole site. Just be sure to save any static content that
> contains characters outside the ASCII range as UTF-8 files and don't use
> any such characters in script literals.

I don't know if my approach is overkill, but I've taken a transactional
approach, so every ASP page includes some initializing ASP pages. I could set
the session settings in those includes.


> Your Unicode text file based approach
> will still work since you are no doubt using Response.Write to send the
> content.

Not always; often the ASP code contains HTML code plus some <%=something%>.
Is that equivalent to a Response.Write?

Re: Encoding problem......

on 09.12.2006 20:53:18 by Anthony Jones

"Atlas" wrote in message
news:u8Deh.18332$P04.9810@tornado.fastwebnet.it...
> Dunno if my approach is overloading, but I've taken a transactional
> approach, so every asp page includes some initializing asp pages. I could
> set in those includes the session settings
>

I'm not sure what 'transactional approach' means in this context; however, if
every page shares a common include, then yes, placing these lines in it:

Session.CodePage = 65001
Response.ContentType = "text/html"
Response.CharSet = "utf-8"

would ensure everything ends up as UTF-8 when sent to the client.

>
> > Your Unicode text file based approach
> > will still work since you are no doubt using Response.Write to send the
> > content.
>
> Not always; often the asp code contains HTML code plus some
> <%=something%>. Is it equivalent to a response.write?

No. <%=something%> itself is equivalent to Response.Write something, but the
chunk of bytes outside of the script delimiters is sent as-is, much like
Response.BinaryWrite. Hence HTML code saved in an ASP file
needs to be encoded as per the Response.CharSet value sent to the client.
If the HTML is entirely composed of ASCII characters (0-127) then even a
file saved in an ANSI format will be OK. However, where the HTML code
contains characters outside this range you will need to save the file in
UTF-8 encoding. The only limitation here is that you can't then use characters
outside the ASCII range in string literals (constants) inside the ASP script
code.
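
As a small illustration of the expression shorthand discussed here: in classic ASP, <%= expr %> writes the value of expr into the response just as an explicit Response.Write does. The variable name and its ASCII-only value below are made up for the example:

```vbscript
<%
' Hypothetical variable, ASCII-only so the file encoding does not matter here
Dim something : something = "hello"
%>
<p><%= something %></p>
<p><% Response.Write something %></p>
```

Both paragraphs produce the same markup. The literal <p> tags around them are static content of the .asp file, which is why the file itself has to be saved in the encoding announced to the client.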






Re: Encoding problem......

on 19.12.2006 19:33:43 by Atlas

Anthony, I made several attempts using your advice but was still getting
garbage in the browser. I thought I had some trouble with the server, so I
asked the ISP to move our site to IIS6, and they did, only for me to discover
that I was still getting garbage on the new server. Finally I discovered
that when transferring the Unicode/UTF-8 pages by FTP, the client was
scrambling the contents because it was transferring in A (ASCII) mode. Once I
switched the transfers to I (binary) mode it worked perfectly. So it would
probably have worked using your tips on IIS5 as well.

Nevertheless, thanks a lot for helping.