Using SSI to include a UTF-8 encoded file causes a strange character to be sent to the browser
on 07.10.2009 10:20:01 by Chris Biggs
Hi,
Environment:
Apache 2.2.13
Windows XP SP3
Background:
My intention is to use SSI to include one html file in another. The text in both can be
either English or Russian - or a combination. So I used UTF-8 encoding for the files as
it seemed to be the right thing to do.
Problem:
The two HTML files that can be used to demonstrate this problem are as follows:
outer.shtml:
utf8.utf8:
As can be seen outer.shtml includes the file utf8.utf8.
When these files are saved as "ANSI" (using Notepad), with the included file named "something.htm",
they display correctly in a browser as you would expect. However, when I save them as UTF-8
(again using Notepad) I seem to get a strange character sent to the browser, just before the
start of the included file. I have attached the file received by the browser. (see outer[1].txt
in the attached zip file). As you may be able to see there is an "odd" character between
"" and ". This has the effect of pushing the inner table down a line which
is not wanted.
I have modified the httpd.conf only to allow me to have 4 virtual hosts, enable SSI and, when I
noticed the problem, I also uncommented the line:
Include conf/extra/httpd-languages.conf
As part of my attempt to fix the problem, I have also added .shtml to a line in the above file
as follows:
AddCharset UTF-8 .utf8 .shtml
However, nothing seems to stop this odd character being presented to the browser.
If you could provide some help or guidance to solve this problem - or perhaps suggest whether
this is a bug and should be logged, I would be grateful.
Many Thanks for your attention thus far.
Regards,
Christopher Biggs
[Attachment: output.zip]
------------------------------------------------------------ ---------
The official User-To-User support forum of the Apache HTTP Server Project.
See for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
" from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
Re: Using SSI to include a UTF-8 encoded file causes a strange character to be sent to the browser
on 07.10.2009 10:55:33 by aw
Hi.
Chris Biggs wrote:
....
> When these files are saved as "ANSI" (using Notepad)
(or rather in this case, as UTF-8)
Tips :
1) *don't use Notepad to edit HTML pages*. Use a real editor, properly
aware of character sets and encodings, and which will highlight
incorrect UTF-8 characters.
Notepad has a big problem when saving UTF-8 encoded files : it writes a
"BOM" at the beginning of the file, which is not only totally
unnecessary for UTF-8, but also confuses other programs.
A BOM is a sequence of 2 or 3 bytes, meant in some cases to indicate the
"byte order" of the file that follows.
For UTF-8, there is only one valid byte order, so the BOM is not
necessary and could/should be ignored.
However, when such a file with a BOM prefix is being included by some
software in the middle of another file (as you do with SSI), it usually
causes the kind of problem you are seeing : "bizarre" characters in the
middle.
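
To make the point above concrete, here is a minimal Python sketch. It is not how Apache or SSI work internally; the helper name `strip_utf8_bom` and the sample byte strings are made up for illustration. It shows the three bytes that form a UTF-8 BOM and how they end up in the middle of the output once a BOM-prefixed fragment is concatenated into an outer page.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF (U+FEFF encoded as UTF-8).
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


# Hypothetical stand-ins for the outer page and the included file,
# both saved "as UTF-8" by an editor that prepends a BOM.
outer_head = UTF8_BOM + b"<td>"
included = UTF8_BOM + b"<table>"
outer_tail = b"</td>"

# Naive inclusion: the included file's BOM lands mid-document.
naive = outer_head + included + outer_tail
# Inclusion after stripping the BOM from the included fragment.
clean = outer_head + strip_utf8_bom(included) + outer_tail

print(naive.count(UTF8_BOM))  # number of BOMs in the naive result
print(clean.count(UTF8_BOM))  # number of BOMs after stripping
```

Running something like `strip_utf8_bom` over the files to be included (or re-saving them without a BOM) is one possible way to get rid of the stray character.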
2) use a proper <meta> charset declaration in the <head> section of your html files. That should
tell the browser what the encoding of the page is.
3) But this is really only a substitute for the real standard-conformant
way of indicating the encoding to the browser: the webserver should
send, with each html page, an HTTP header like:
Content-type: text/html; charset=UTF-8
Unfortunately, MS's IE (all versions and sub-versions) has a long
history of ignoring or misinterpreting this part of the HTTP RFC, and
deciding for itself what content the document has.
This is *wrong*, but unfortunately, in the real world IE is widely
used, so one has to learn to work around this.
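
As a rough illustration of the header mentioned in 3), the following Python sketch pulls the charset parameter out of a Content-Type header value. The helper name `charset_from_content_type` is hypothetical, and the snippet only parses a string; it does not contact any server.

```python
from email.message import Message


def charset_from_content_type(value: str):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the charset it finds.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

If the second form (no charset) is what the server sends, the browser is left to guess the encoding.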
Re: Using SSI to include a UTF-8 encoded file causes a
on 07.10.2009 13:49:17 by Jan Ingvoldstad
On Wed, Oct 7, 2009 at 10:55 AM, André Warnier wrote:
> 1) *don't use Notepad to edit HTML pages*. Use a real editor, properly
> aware of character sets and encodings, and which will highlight incorrect
> UTF-8 characters.
> Notepad has a big problem when saving UTF-8 encoded files : it writes a
> "BOM" at the beginning of the file, which is not only totally unnecessary
> for UTF-8, but also confuses other programs.
> A BOM is a sequence of 2 or 3 bytes, meant in some cases to indicate the
> "byte order" of the file that follows.
>
Just for the sake of information, DreamWeaver MX also pulls this nice stunt,
_also when editing PHP files_.
This can lead to annoying problems when a PHP script tries to modify the
HTTP headers, since the headers will already have been written, and
(depending on PHP solution; mod_php, fastcgi, suphp etc.) will produce nasty
errors.
When you open a file with a BOM in a UTF-8 aware editor, the BOM is hidden.
Software producing BOM makes things go BOOM.
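
A small Python sketch of why the BOM is invisible in some tools and not in others: decoding with the plain `utf-8` codec keeps the BOM as a leading U+FEFF character, while the `utf-8-sig` codec drops it. The sample bytes are made up for illustration.

```python
# Hypothetical file content: a UTF-8 BOM followed by the start of a PHP script.
data = b"\xef\xbb\xbf<?php"

as_utf8 = data.decode("utf-8")      # BOM survives as U+FEFF
as_sig = data.decode("utf-8-sig")   # a leading BOM is removed

print(repr(as_utf8[0]))
print(as_sig)
```

Anything that treats the file as raw bytes, as a server passing it through does, sees those three extra bytes before the first real character.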
--
Jan