mail() encoding problems
on 19.08.2007 16:16:56 by Nook
Hi all,
Another problem that has been bugging me for a while now, and which I've swept
under the rug for too long, is a mail encoding problem at my (shared)
webhost.
The problem is that on different occasions, when I send the exact same mail
(same e-mail address, same name, same body content) through my site's
contact form, it produces the well-known strange characters instead of the
intended diacritical characters. The problem also shows up in the
contact form of a client of mine whose site is hosted with the same webhost.
I've contacted my webhost a few times already, but they don't know what
could be the cause of the problem either.
This is the setup I'm using on my webhost:
The host runs PHP5 as FastCGI, with a customizable php.ini per
domain/website, on Slackware Linux.
All my PHP files are saved as Windows-1252 ("ANSI"). All my HTML output
declares its charset in a meta tag. My guess is that the difference between
these two encodings shouldn't be a problem. But perhaps I'm wrong?
And my Mailer class uses the following code to send mail:
public function send()
{
    if( is_null( $this->getErrors() ) )
    {
        $cc = implode( ', ', $this->cc );
        $bcc = implode( ', ', $this->bcc );
        $to = implode( ', ', $this->to );
        $headers = 'From: ' . $this->from . LF;
        if ( !empty( $cc ) ) $headers .= 'Cc: ' . $cc . LF;
        if ( !empty( $bcc ) ) $headers .= 'Bcc: ' . $bcc . LF;
        $headers .= 'MIME-Version: 1.0' . LF;
        $headers .= 'X-Mailer: ' . $this->mailerName . LF;
        $headers .= 'Content-Type: text/plain; charset=iso-8859-1' . LF;
        $headers .= 'Content-Transfer-Encoding: 8bit' . LF;
        $subject = $this->subject;
        $body = $this->body;
        if( mail( $to, $subject, $body, $headers ) )
        {
            return true;
        }
        $this->setError( 'general', 'error', basename( __FILE__, '.php' ), __LINE__ );
        return false;
    }
    else
    {
        return false;
    }
}
Sometimes, when I run into the problem, I change the following lines in my
send() function to use UTF-8 encoding:
$headers .= 'Content-Type: text/plain; charset=utf-8' . LF;
$body = utf8_encode( $this->body );
Then it seems to work for a while. But then, all of a sudden, the same
problem shows up again, even though I send the same test mail.
The body content I send is:
test ëèï
Which shows up as:
test Ã«Ã¨Ã¯
Does anybody have any idea what might be causing this weird problem?
Thanks
Re: mail() encoding problems
on 19.08.2007 17:18:54 by Nook
amygdala wrote:
> Hi all,
>
>
> The body content I send is:
>
> test ëèï
>
> Which shows up as:
>
> test Ã«Ã¨Ã¯
>
Alright, I've done another test. I sent the exact same mail twice in a row
(only seconds apart). One ended up 'weird', the other normal.
Using the webmail client (Roundcube) at my webhost, I am able to view the
full source of each mail. I saved both as text files and analyzed them with
UltraEdit's file compare function.
No differences (apart from the usual Message-IDs, etc.). But the strange
part is that UltraEdit's UltraCompare detects one mail source as UTF-8
encoded and the other as ANSI encoded.
I am really stumped. What could be going on here?
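A rough idea of why a compare tool can label two otherwise identical message sources differently: with no charset hint, such tools typically guess from the bytes alone. The PHP sketch below assumes that kind of heuristic. It is not UltraCompare's documented behaviour, and `guessEncoding()` is a hypothetical helper.

```php
<?php
// Sketch of a byte-level guess like the one an editor or compare tool
// might make (an assumption; not UltraCompare's documented algorithm).
function guessEncoding(string $bytes): string
{
    // No byte above 0x7F: plain 7-bit ASCII, identical in every charset.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return 'ascii';
    }
    // With the /u modifier preg_match() returns false on malformed UTF-8,
    // so a match means every high byte belongs to a valid UTF-8 sequence.
    if (preg_match('//u', $bytes) === 1) {
        return 'utf-8';
    }
    // Otherwise: 8-bit text in some single-byte charset (windows-1252,
    // iso-8859-1, ...); the bytes alone cannot tell which one.
    return '8-bit, not UTF-8';
}

echo guessEncoding("test \xC3\xAB\xC3\xA8\xC3\xAF"), PHP_EOL; // utf-8
echo guessEncoding("test \xEB\xE8\xEF"), PHP_EOL;             // 8-bit, not UTF-8
```

Applied to the two saved sources (for example `guessEncoding(file_get_contents('bad.eml'))`, with a hypothetical file name), a "utf-8" result for only one of them would mean the body bytes themselves differ. The labels are not just an editor quirk.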
Re: mail() encoding problems
on 22.08.2007 01:15:29 by Manuel Lemos
Hello,
on 08/19/2007 11:16 AM amygdala said the following:
> Sometimes, when I run into the problem, I change the following lines in my
> send() function to use UTF-8 encoding:
>
> $headers .= 'Content-Type: text/plain; charset=utf-8' . LF;
>
> $body = utf8_encode( $this->body );
>
> Then it seems to work for a while. But then, all of a sudden, the same
> problem shows up again, even though I send the same test mail.
>
> The body content I send is:
>
> test ëèï
>
> Which shows up as:
>
> test Ã«Ã¨Ã¯
>
> Does anybody have any idea what might be causing this weird problem?
Those are the characters you typed, encoded as UTF-8 (and then displayed as
single-byte characters). It seems correct but pointless. If you only have
windows-1252 characters, there is no need to convert them into UTF-8. It is
not wrong, but it is useless.
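To see where the strange characters come from, the conversion can be reproduced by hand. This sketch assumes the body is ISO-8859-1. `latin1ToUtf8()` is a hypothetical helper that does what `utf8_encode()` does; that function is deprecated as of PHP 8.2. Reading the resulting UTF-8 bytes back as single-byte characters gives the familiar "Ã«Ã¨Ã¯".

```php
<?php
// Hypothetical helper doing what utf8_encode() does: treat every byte as an
// ISO-8859-1 character and write it out as UTF-8.
function latin1ToUtf8(string $latin1): string
{
    $out = '';
    for ($i = 0, $n = strlen($latin1); $i < $n; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            $out .= $latin1[$i];              // ASCII stays one byte
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte (0xC2 or 0xC3)
                  . chr(0x80 | ($b & 0x3F));  // continuation byte
        }
    }
    return $out;
}

$body = "test \xEB\xE8\xEF";      // "test ëèï" as ISO-8859-1 bytes
$utf8 = latin1ToUtf8($body);      // same text as UTF-8: two bytes per accent
echo bin2hex($utf8), PHP_EOL;     // 7465737420c3abc3a8c3af

// A reader that takes those UTF-8 bytes for ISO-8859-1 sees one character
// per byte; converting once more makes that visible on a UTF-8 terminal.
echo latin1ToUtf8($utf8), PHP_EOL; // test Ã«Ã¨Ã¯
```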
The only thing really wrong is that you should not send 8-bit encoded
messages, as many mail gateways do not support them. Instead of 8 bit you
should use quoted-printable encoding.
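Here is a minimal illustration of what quoted-printable does to the test body. It assumes the body is ISO-8859-1 and uses PHP's built-in `quoted_printable_encode()`. That function only exists since PHP 5.3, so the PHP5 installations of that time needed a hand-written encoder or a class like the one linked below.

```php
<?php
// The test body as ISO-8859-1 bytes: "test ëèï".
$body = "test \xEB\xE8\xEF";

// Bytes with the high bit set (and "=" itself) become "=XX" hex escapes;
// long lines also get soft line breaks. The result is pure 7-bit ASCII.
$encoded = quoted_printable_encode($body);
echo $encoded, PHP_EOL;                                 // test =EB=E8=EF

// The receiving mail client reverses the encoding and gets the same bytes.
var_dump(quoted_printable_decode($encoded) === $body); // bool(true)
```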
You can also have 8-bit characters in the headers, but they must be
encoded with Q-encoding to avoid the same problem as with the message body.
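Here is a sketch of RFC 2047 Q-encoding for a single header value such as the subject. `qEncodeHeader()` is hypothetical and deliberately simplified. It produces a single encoded-word and never splits long text to respect the 75-character encoded-word limit, so it only suits short headers.

```php
<?php
// Hypothetical helper: RFC 2047 "Q" encoding of a short header value.
// Simplified: one encoded-word, no splitting at the 75-character limit.
function qEncodeHeader(string $text, string $charset = 'ISO-8859-1'): string
{
    $encoded = '';
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $c = $text[$i];
        if ($c === ' ') {
            $encoded .= '_';                           // space is written as "_"
        } elseif (preg_match('/[A-Za-z0-9]/', $c) === 1) {
            $encoded .= $c;                            // letters and digits as-is
        } else {
            $encoded .= sprintf('=%02X', ord($c));     // everything else as =XX
        }
    }
    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// "Café ëèï" as ISO-8859-1 bytes.
echo 'Subject: ', qEncodeHeader("Caf\xE9 \xEB\xE8\xEF"), PHP_EOL;
// Subject: =?ISO-8859-1?Q?Caf=E9_=EB=E8=EF?=
```

The returned string would go into the `$subject` argument of `mail()` unchanged. Escaping everything except letters and digits is more conservative than RFC 2047 requires, but it keeps the rules short.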
This is a bit complicated to encode by hand. I use this MIME message
composing and sending class to take care of all that for me.
http://www.phpclasses.org/mimemessage
--
Regards,
Manuel Lemos
Metastorage - Data object relational mapping layer generator
http://www.metastorage.net/
PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
Re: mail() encoding problems
on 22.08.2007 21:00:36 by Nook
"Manuel Lemos" wrote in message
news:fafrmh$dsl$1@aioe.org...
> Hello,
Hi Manuel,
Thanks for the response.
> The only thing really wrong is that you should not send 8-bit encoded
> messages, as many mail gateways do not support them. Instead of 8 bit you
> should use quoted-printable encoding.
Alright, I didn't know this. It makes sense, because, as I said, the problem
occurred randomly even without my encoding the text as UTF-8. Perhaps the
apparently random appearance of strange characters and/or missing
diacritical characters could be explained by subsequent mail messages being
sent through different mail gateways to their final destination, just like
TCP packets? Or is that not correct? It's not very important for me to get
an answer, but it would help me get a better understanding of things. ;-)
> You can also have 8-bit characters in the headers, but they must be
> encoded with Q-encoding to avoid the same problem as with the message body.
I probably won't be needing that at this point in time. But that's good to
know also.
> This is a bit complicated to encode by hand. I use this MIME message
> composing and sending class to take care of all that for me.
>
> http://www.phpclasses.org/mimemessage
>
Although I only had a quick glance at your class, and it probably does the
job well, it looks like a bit of overkill for my purposes. So in conclusion,
is it fair to say that I only need to change:
$headers .= 'Content-Transfer-Encoding: 8bit' . LF;
to
$headers .= 'Content-Transfer-Encoding: quoted-printable' . LF;
and leave
$headers .= 'Content-Type: text/plain; charset=iso-8859-1' . LF;
as is?
Thank you in advance.
Re: mail() encoding problems
on 22.08.2007 21:11:34 by Nook
"amygdala" wrote in message
news:46cc8851$0$25492$9a622dc7@news.kpnplanet.nl...
>
> Although I only had a quick glance at your class, and it probably does the
> job well, it looks like a bit of overkill for my purposes. So in
> conclusion, is it fair to say that I only need to change:
>
> $headers .= 'Content-Transfer-Encoding: 8bit' . LF;
>
> to
>
> $headers .= 'Content-Transfer-Encoding: quoted-printable' . LF;
>
> and leave
>
> $headers .= 'Content-Type: text/plain; charset=iso-8859-1' . LF;
>
> as is?
>
To answer my own question: it doesn't suffice. Without charset=utf-8, my
test mail (sometimes) still shows up as
test Ã«Ã¨Ã¯
Re: mail() encoding problems
on 22.08.2007 23:23:51 by Manuel Lemos
Hello,
on 08/22/2007 04:00 PM amygdala said the following:
>
>
>> The only thing really wrong is that you should not send 8-bit encoded
>> messages, as many mail gateways do not support them. Instead of 8 bit you
>> should use quoted-printable encoding.
>
> Alright, I didn't know this. It makes sense, because, as I said, the problem
> occurred randomly even without my encoding the text as UTF-8. Perhaps the
> apparently random appearance of strange characters and/or missing
> diacritical characters could be explained by subsequent mail messages being
> sent through different mail gateways to their final destination, just like
> TCP packets? Or is that not correct? It's not very important for me to get
> an answer, but it would help me get a better understanding of things. ;-)
No, those characters appear when you transform your text to UTF-8. UTF-8
still uses 8-bit bytes (each of your accented characters takes two of them).
If you read the mail message that is sent and see those characters, it is
because whatever console or text display program you are using does not
decode UTF-8 and show the correct characters.
>> You can also have 8-bit characters in the headers, but they must be
>> encoded with Q-encoding to avoid the same problem as with the message body.
>
> I probably won't be needing that at this point in time. But that's good to
> know also.
>
>> This is a bit complicated to encode by hand. I use this MIME message
>> composing and sending class to take care of all that for me.
>>
>> http://www.phpclasses.org/mimemessage
>>
>
> Although I only had a quick glance at your class, and it probably does the
> job well, it looks like a bit of overkill for my purposes. So in conclusion,
> is it fair to say that I only need to change:
>
> $headers .= 'Content-Transfer-Encoding: 8bit' . LF;
>
> to
>
> $headers .= 'Content-Transfer-Encoding: quoted-printable' . LF;
>
> and leave
>
> $headers .= 'Content-Type: text/plain; charset=iso-8859-1' . LF;
>
> as is?
No, you actually need to encode your body data using quoted-printable. Just
changing the headers does not do it. With quoted-printable, 8-bit and
non-printable characters are transformed into escape sequences of ASCII
(7-bit) characters.
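Here is a sketch of the header and body part of `send()` with that advice applied. `buildLatin1Message()` is a hypothetical helper. It keeps the charset at iso-8859-1 and switches the transfer encoding to quoted-printable. Crucially, it also encodes the body itself. Headers are separated by CRLF, as the PHP manual recommends for `mail()`. The manual also notes that some Unix MTAs need plain LF instead.

```php
<?php
// Hypothetical helper: builds the extra headers and the encoded body that
// would be passed to mail($to, $subject, $body, $headers).
function buildLatin1Message(string $from, string $body, string $mailer): array
{
    $eol = "\r\n";   // RFC 2822 line ending; some Unix MTAs want "\n" instead
    $headers = implode($eol, [
        'From: ' . $from,
        'MIME-Version: 1.0',
        'X-Mailer: ' . $mailer,
        'Content-Type: text/plain; charset=iso-8859-1',
        'Content-Transfer-Encoding: quoted-printable',
    ]);
    // The actual fix: the body bytes themselves become 7-bit ASCII.
    return [$headers, quoted_printable_encode($body)];
}

[$headers, $encodedBody] = buildLatin1Message(
    'site@example.com',      // hypothetical sender address
    "test \xEB\xE8\xEF",     // "test ëèï" in ISO-8859-1
    'MyMailer'               // hypothetical X-Mailer value
);
echo $headers, "\r\n\r\n", $encodedBody, PHP_EOL;
```

Cc and Bcc handling from the original method is left out to keep the sketch short. They would be added to the same header array.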
--
Regards,
Manuel Lemos
Re: mail() encoding problems
on 23.08.2007 09:44:21 by Nook
"Manuel Lemos" wrote in message
news:fai9hl$7o8$1@aioe.org...
> Hello,
>
> on 08/22/2007 04:00 PM amygdala said the following:
>>
>>
>>> The only thing really wrong is that you should not send 8-bit encoded
>>> messages, as many mail gateways do not support them. Instead of 8 bit you
>>> should use quoted-printable encoding.
>>
>> Alright, I didn't know this. It makes sense, because, as I said, the problem
>> occurred randomly even without my encoding the text as UTF-8. Perhaps the
>> apparently random appearance of strange characters and/or missing
>> diacritical characters could be explained by subsequent mail messages being
>> sent through different mail gateways to their final destination, just like
>> TCP packets? Or is that not correct? It's not very important for me to get
>> an answer, but it would help me get a better understanding of things. ;-)
>
> No, those characters appear when you transform your text to UTF-8. UTF-8
> still uses 8-bit bytes (each of your accented characters takes two of them).
> If you read the mail message that is sent and see those characters, it is
> because whatever console or text display program you are using does not
> decode UTF-8 and show the correct characters.
>
But what could be causing my characters to be transformed to UTF-8 at random?
Do you have any idea? I don't do it anywhere in my application (except for
the test I mentioned in my first mail). I've changed everything back to how
it was, though, and it still shows
test ëèï
and sometimes
test ëèï
without my changing anything in the program's code. That is weird, isn't it?
>>> You can also have 8-bit characters in the headers, but they must be
>>> encoded with Q-encoding to avoid the same problem as with the message body.
>>
>> I probably won't be needing that at this point in time. But that's good
>> to know also.
>>
>>> This is a bit complicated to encode by hand. I use this MIME message
>>> composing and sending class to take care of all that for me.
>>>
>>> http://www.phpclasses.org/mimemessage
>>>
>>
>> Although I only had a quick glance at your class, and it probably does
>> the job well, it looks like a bit of overkill for my purposes. So in
>> conclusion, is it fair to say that I only need to change:
>>
>> $headers .= 'Content-Transfer-Encoding: 8bit' . LF;
>>
>> to
>>
>> $headers .= 'Content-Transfer-Encoding: quoted-printable' . LF;
>>
>> and leave
>>
>> $headers .= 'Content-Type: text/plain; charset=iso-8859-1' . LF;
>>
>> as is?
>
> No, you actually need to encode your body data using quoted-printable. Just
> changing the headers does not do it. With quoted-printable, 8-bit and
> non-printable characters are transformed into escape sequences of ASCII
> (7-bit) characters.
>
Alright, I'll give that a try, and report back. Thanks.
Re: mail() encoding problems
on 23.08.2007 20:18:29 by Manuel Lemos
Hello,
on 08/23/2007 04:44 AM amygdala said the following:
>>>> The only thing really wrong is that you should not send 8-bit encoded
>>>> messages, as many mail gateways do not support them. Instead of 8 bit you
>>>> should use quoted-printable encoding.
>>> Alright, I didn't know this. It makes sense, because, as I said, the problem
>>> occurred randomly even without my encoding the text as UTF-8. Perhaps the
>>> apparently random appearance of strange characters and/or missing
>>> diacritical characters could be explained by subsequent mail messages being
>>> sent through different mail gateways to their final destination, just like
>>> TCP packets? Or is that not correct? It's not very important for me to get
>>> an answer, but it would help me get a better understanding of things. ;-)
>> No, those characters appear when you transform your text to UTF-8. UTF-8
>> still uses 8-bit bytes (each of your accented characters takes two of them).
>> If you read the mail message that is sent and see those characters, it is
>> because whatever console or text display program you are using does not
>> decode UTF-8 and show the correct characters.
>
> But what could be causing my characters to be transformed to UTF-8 at random?
> Do you have any idea? I don't do it anywhere in my application (except for
> the test I mentioned in my first mail). I've changed everything back to how
> it was, though, and it still shows
>
> test ëèï
>
> and sometimes
>
> test ëèï
>
> without my changing anything in the program's code. That is weird, isn't it?
If you use utf8_encode(), you transform ISO-8859-1 text into UTF-8.
Since all your text is in a single 8-bit encoding, UTF-8 is not useful
for you.
--
Regards,
Manuel Lemos
Re: mail() encoding problems
on 23.08.2007 21:27:22 by Nook
Manuel Lemos wrote:
> Hello,
>
>>
>> But what could be causing my characters to be transformed to UTF-8 at random?
>> Do you have any idea? I don't do it anywhere in my application (except for
>> the test I mentioned in my first mail). I've changed everything back to how
>> it was, though, and it still shows
>>
>> test ëèï
>>
>> and sometimes
>>
>> test ëèï
>>
>> without my changing anything in the program's code. That is weird, isn't it?
>
> If you use utf8_encode(), you transform ISO-8859-1 text into UTF-8.
>
> Since all your text is in a single 8-bit encoding, UTF-8 is not useful
> for you.
I understand this. But the point I'm trying to get across (I probably
wasn't clear enough about it) is that I don't use UTF-8 encoding anywhere
anymore. I only used it once or twice, to see if that would solve my
problem. To be very clear about it: I removed all utf8_encode()
calls, so my code is back to its usual state. But my problem
remains the same:
Sometimes the unintended
Ã«Ã¨Ã¯
sometimes the intended
ëèï
It seems to happen pretty much at random.
So I am still stumped as to what could be causing this problem. It
looks as if something (maybe some mail gateway?) is transforming my e-mails
to UTF-8.
Do you have any other idea of what might be going on here? Your insights are
very welcome.
Re: mail() encoding problems
on 23.08.2007 23:30:29 by Nook
amygdala wrote:
> Manuel Lemos wrote:
>> Hello,
>>
>
>
>
>>>
>>> But what could be causing my characters to be transformed to UTF-8 at random?
>>> Do you have any idea? I don't do it anywhere in my application (except for
>>> the test I mentioned in my first mail). I've changed everything back to how
>>> it was, though, and it still shows
>>>
>>> test ëèï
>>>
>>> and sometimes
>>>
>>> test ëèï
>>>
>>> without my changing anything in the program's code. That is weird, isn't it?
>>
>> If you use utf8_encode(), you transform ISO-8859-1 text into UTF-8.
>>
>> Since all your text is in a single 8-bit encoding, UTF-8 is not useful
>> for you.
>
> I understand this. But the point I'm trying to get across (I probably
> wasn't clear enough about it) is that I don't use UTF-8 encoding anywhere
> anymore. I only used it once or twice, to see if that would solve my
> problem. To be very clear about it: I removed all utf8_encode()
> calls, so my code is back to its usual state. But my problem
> remains the same:
>
> Sometimes the unintended
>
> Ã«Ã¨Ã¯
>
> sometimes the intended
>
> ëèï
>
> It seems to happen pretty much at random.
>
> So I am still stumped as to what could be causing this problem. It
> looks as if something (maybe some mail gateway?) is transforming my e-mails
> to UTF-8.
>
> Do you have any other idea of what might be going on here? Your
> insights are very welcome.
To add to this: the malformed e-mails have an earlier timestamp than the
correct e-mails, even though I sent the malformed e-mail later than the
correct one. This leads me to believe that the incorrect e-mails are indeed
sent by another mail server, or through a different mail-gateway 'route',
if such a thing exists. I'll contact my webhost once more and tell them
about this new finding. Perhaps, with this new information, they will know
what might be causing the problem.
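One way to test the different-route theory is to compare the `Received:` headers of a good copy and a bad copy. Each server that relays a message prepends one, with its own timestamp. Below is a minimal sketch that assumes the raw sources saved from the webmail client. `receivedChain()` is hypothetical and does only simple header unfolding.

```php
<?php
// Hypothetical helper: return the Received: header values of a raw message,
// oldest hop first, so two copies can be compared side by side.
function receivedChain(string $rawMessage): array
{
    // The header block ends at the first empty line.
    $parts = preg_split("/\r?\n\r?\n/", $rawMessage, 2);
    // Unfold continuation lines (lines starting with a space or tab).
    $head = preg_replace("/\r?\n[ \t]+/", ' ', $parts[0]);
    $hops = [];
    foreach (preg_split("/\r?\n/", $head) as $line) {
        if (stripos($line, 'Received:') === 0) {
            $hops[] = trim(substr($line, strlen('Received:')));
        }
    }
    // Each server prepends its own line, so reverse to get travel order.
    return array_reverse($hops);
}
```

For example, `print_r(receivedChain(file_get_contents('bad.eml')));` (hypothetical file name) lists the hops of one copy. If the first hops differ between a good and a bad copy, the messages really did leave through different servers.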
Cheers
Re: mail() encoding problems
on 24.08.2007 03:57:40 by Manuel Lemos
Hello,
on 08/23/2007 04:27 PM amygdala said the following:
> I understand this. But the point I'm trying to get across (I probably
> wasn't clear enough about it) is that I don't use UTF-8 encoding anywhere
> anymore. I only used it once or twice, to see if that would solve my
> problem. To be very clear about it: I removed all utf8_encode()
> calls, so my code is back to its usual state. But my problem
> remains the same:
>
> Sometimes the unintended
>
> Ã«Ã¨Ã¯
>
> sometimes the intended
>
> ëèï
>
> It seems to happen pretty much at random.
>
> So I am still stumped as to what could be causing this problem. It
> looks as if something (maybe some mail gateway?) is transforming my e-mails
> to UTF-8.
>
> Do you have any other idea of what might be going on here? Your insights are
> very welcome.
Could you be taking the text for the message from user-submitted forms?
If so, make sure you set the encoding of the pages that display the forms
to an explicit value. If you do not, keep in mind that different
browsers assume different default character encodings. That could
explain why you sometimes get the encoding right and other times don't.
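Here is a sketch of both sides of that advice, assuming the mail stays ISO-8859-1. The form page declares its charset explicitly (shown as a comment). The handler converts input back to ISO-8859-1 if it arrives as UTF-8 anyway. `toLatin1()` is hypothetical. Treating valid UTF-8 as "the browser sent UTF-8" is a heuristic, not a guarantee.

```php
<?php
// On the page that shows the form, declare the charset explicitly, e.g.
//   header('Content-Type: text/html; charset=iso-8859-1');
//   <form method="post" action="..." accept-charset="iso-8859-1">

// Hypothetical helper: return ISO-8859-1 bytes whatever the browser sent.
function toLatin1(string $input): string
{
    // /u makes preg_match() fail on malformed UTF-8, i.e. on Latin-1 input;
    // pure ASCII needs no conversion either.
    if (preg_match('//u', $input) !== 1 || preg_match('/[\x80-\xFF]/', $input) !== 1) {
        return $input;
    }
    preg_match_all('/./su', $input, $m);   // split into UTF-8 characters
    $out = '';
    foreach ($m[0] as $char) {
        if (strlen($char) === 1) {
            $out .= $char;                 // ASCII byte
        } elseif (strlen($char) === 2) {
            $cp = ((ord($char[0]) & 0x1F) << 6) | (ord($char[1]) & 0x3F);
            $out .= $cp <= 0xFF ? chr($cp) : '?';
        } else {
            $out .= '?';                   // no ISO-8859-1 equivalent
        }
    }
    return $out;
}
```

In a form handler this might be used as `$body = toLatin1($_POST['message'] ?? '');`, where `message` stands in for whatever the field is actually called.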
If that is not the problem, consider using the class I recommended
and see if you still have the problem.
http://www.phpclasses.org/mimemessage
--
Regards,
Manuel Lemos