Advice needed; php5, utf-8, mb_*
on 14.08.2007 11:26:22 by working_boy
Hello!
I am migrating a large PHP application, which also uses a few
third-party PHP libraries, to UTF-8.
And now, of course, I have problems with string functions that are not
multi-byte safe, especially in those third-party libraries.
My first, optimistic attempt was to automatically override the "ordinary"
string functions with their multi-byte versions
(.htaccess: php_value mbstring.func_overload 7).
But that didn't work out. For example, the phpmailer class failed.
Interestingly enough, it SEEMS to work just fine with the "ordinary"
string functions and UTF-8. But the bugs that can show up when
"ordinary" string functions are used on multi-byte characters are not
easy to catch, especially when my primary language uses mostly
1-byte characters.
So, for now, the only thing I can do is go through all the third-party
libraries, try to figure out which string function calls should work
on bytes and which should work on characters, and replace them
accordingly.
For example, when strlen is supposed to return the length in bytes, I
should leave it as is, but when it's supposed to return the number of
characters, I should replace it with mb_strlen ... and so on for all
multi-byte-unsafe string functions.
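To make the byte/character distinction concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that func_overload is off; the sample word is only an illustration.

```php
<?php
// "šećer" written with explicit UTF-8 byte escapes, so the encoding of
// this source file does not matter. "š" and "ć" take two bytes each.
$word = "\xC5\xA1e\xC4\x87er";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

echo "bytes: $bytes, characters: $chars\n";

// substr() cuts on byte boundaries and can split a character in half;
// mb_substr() cuts on character boundaries.
$firstByteCut = substr($word, 0, 1);             // half of "š", not valid UTF-8
$firstCharCut = mb_substr($word, 0, 1, 'UTF-8'); // the whole "š"
```

With func_overload off this reports 7 bytes and 5 characters, which is exactly the gap that stays hidden when most of the text is 1-byte characters.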
The problem is that it is not easy to find out which calls should
be replaced and which should not, and I have to repeat the process each
time a new version is released.
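One way to make the repeated review less painful might be to list every call site automatically and then judge each one by hand. The following is only a sketch using PHP's tokenizer extension; the helper name find_string_calls and the list of function names are assumptions made for illustration. It cannot decide whether a call means bytes or characters, it only finds the calls.

```php
<?php
// Hypothetical helper: list calls to byte-oriented string functions
// found in a piece of PHP source code.
function find_string_calls($source, array $names)
{
    $hits   = array();
    $tokens = token_get_all($source);
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $tok = $tokens[$i];
        if (!is_array($tok) || $tok[0] !== T_STRING) {
            continue; // not an identifier
        }
        $name = strtolower($tok[1]);
        if (!in_array($name, $names, true)) {
            continue; // not one of the functions we care about
        }
        // Skip whitespace and check that an opening parenthesis follows.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j < $count && $tokens[$j] === '(') {
            $hits[] = array('function' => $name, 'line' => $tok[2]);
        }
    }
    return $hits;
}

// Illustrative input; in practice this would be file_get_contents() of a library file.
$sample = "<?php\n\$n = strlen(\$body);\n\$s = substr(\$body, 0, 10);\n\$x = count(\$a);\n";
$hits   = find_string_calls($sample, array('strlen', 'substr', 'strpos', 'strtolower'));

foreach ($hits as $hit) {
    echo "{$hit['function']} on line {$hit['line']}\n";
}
```

The sketch also reports method calls and function definitions that happen to share a name, and it misses fully qualified calls such as \strlen() on newer PHP versions, so the result is a to-do list for manual review. Running it again on each new library release at least shows which call sites are new.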
So, does anybody have any ideas how these problems can be solved more
elegantly?
Thx.
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 12:25:29 by Jerry Stuckle
working_boy@net.hr wrote:
> Hello!
>
> I am migrating a large PHP application, which also uses a few
> third-party PHP libraries, to UTF-8.
>
> And now, of course, I have problems with string functions that are not
> multi-byte safe, especially in those third-party libraries.
>
> My first, optimistic attempt was to automatically override the "ordinary"
> string functions with their multi-byte versions
> (.htaccess: php_value mbstring.func_overload 7).
>
> But that didn't work out. For example, the phpmailer class failed.
> Interestingly enough, it SEEMS to work just fine with the "ordinary"
> string functions and UTF-8. But the bugs that can show up when
> "ordinary" string functions are used on multi-byte characters are not
> easy to catch, especially when my primary language uses mostly
> 1-byte characters.
>
> So, for now, the only thing I can do is go through all the third-party
> libraries, try to figure out which string function calls should work
> on bytes and which should work on characters, and replace them
> accordingly.
>
> For example, when strlen is supposed to return the length in bytes, I
> should leave it as is, but when it's supposed to return the number of
> characters, I should replace it with mb_strlen ... and so on for all
> multi-byte-unsafe string functions.
>
> The problem is that it is not easy to find out which calls should
> be replaced and which should not, and I have to repeat the process each
> time a new version is released.
>
> So, does anybody have any ideas how these problems can be solved more
> elegantly?
>
> Thx.
>
Perhaps donate some of your work to the open-source projects? I suspect
phpmailer would appreciate your efforts, for instance.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 13:04:25 by Ulf Kadner
working_boy@net.hr wrote:
> My first, optimistic attempt was to automatically override the "ordinary"
> string functions with their multi-byte versions
> (.htaccess: php_value mbstring.func_overload 7).
Setting a php_value via .htaccess for Apache only works if PHP runs as
a module. Is your PHP running as a module? With the lighttpd web server
it also works for CGI. -v please
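A quick way to check whether the setting actually took effect is to ask PHP itself. This is a small sketch; the helper name overload_report is made up for illustration. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, where the value below is simply reported as 0.

```php
<?php
// Hypothetical helper: report how PHP is running and which overload mask is active.
function overload_report()
{
    return array(
        // e.g. "apache2handler" for the Apache module, "cgi-fcgi" for CGI, "cli" on the command line
        'sapi'     => php_sapi_name(),
        // 0 when the setting is off, was not applied, or is not supported
        'overload' => (int) ini_get('mbstring.func_overload'),
    );
}

$report = overload_report();
echo "SAPI: {$report['sapi']}, mbstring.func_overload: {$report['overload']}\n";
```

Run through the web server rather than from the command line, since the .htaccess value only applies there.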
> But that didn't work out. For example phpmailer class failed.
Failed? Didn't work? Nice description of the problem!
--
_,
_(_p> Ulf [Kado] Kadner
\<_) Mitglied der Freizeitvögel? ;-)
^^
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 13:28:34 by working_boy
On Aug 14, 1:04 pm, Ulf Kadner wrote:
> working_...@net.hr wrote:
> > My first, optimistic attempt was to automatically override the "ordinary"
> > string functions with their multi-byte versions
> > (.htaccess: php_value mbstring.func_overload 7).
>
> Setting a php_value via .htaccess for Apache only works if PHP runs as
> a module. Is your PHP running as a module? With the lighttpd web server
> it also works for CGI. -v please
>
> > But that didn't work out. For example, the phpmailer class failed.
>
> Failed? Didn't work? Nice description of the problem!
>
There is no need for a detailed description of the problem with
phpmailer.
I am not asking how to port phpmailer to a fully UTF-8 PHP application.
It was just one real-world example.
I am asking for more general advice on how to "attack" this problem
with multi-byte strings and third-party libraries. Phpmailer is not
the only class I am using.
Setting php_value works just fine. When func_overload is on, the
multi-byte versions are used instead of the "ordinary" string functions.
(I can describe the problem with phpmailer when func_overload is on, but
this is not really relevant to this thread.
I am sending both HTML and plain-text versions of the message in UTF-8.
Func_overload is on (so PHP uses mb_strlen instead of strlen, and so
on for all string functions that are not multi-byte safe).
In this scenario the HTML part is sent like some sort of attachment, but
since func_overload is on, strlen does not return the number of bytes but
the number of characters (the two differ when using UTF-8), so the
attachments are not sent properly. So instead of seeing only the HTML or
only the plain-text message, I see (in my e-mail client) both versions
(HTML and plain text), one below the other, with some damaged headers.
It is also possible that strlen is not the only function that
causes this kind of behavior in phpmailer.
When func_overload is off and the "ordinary" string functions are used, it
SEEMS that phpmailer works fine, but I didn't test enough to be sure.)
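For code that really needs a byte count, one possible workaround is to ask mbstring for the length in the '8bit' encoding, which treats every byte as one character and therefore gives the same answer whether or not func_overload is on. The sketch below is not phpmailer's code; byte_length and byte_substr are hypothetical helper names.

```php
<?php
// Hypothetical helper: length in bytes, regardless of mbstring.func_overload.
function byte_length($data)
{
    if (function_exists('mb_strlen')) {
        // '8bit' treats every byte as one character, so this counts bytes.
        return mb_strlen($data, '8bit');
    }
    return strlen($data); // without mbstring, strlen cannot be overloaded
}

// Hypothetical helper: cut by byte offsets, regardless of mbstring.func_overload.
function byte_substr($data, $start, $length)
{
    if (function_exists('mb_substr')) {
        return mb_substr($data, $start, $length, '8bit');
    }
    return substr($data, $start, $length);
}

$body = "\xC5\xA1e\xC4\x87er"; // "šećer" as UTF-8: 5 characters, 7 bytes
echo byte_length($body), "\n";
```

A library would still have to be patched to call such helpers wherever it means bytes, so this narrows the problem rather than removing it.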
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 13:41:57 by working_boy
On Aug 14, 12:25 pm, Jerry Stuckle wrote:
> > So, does anybody have any ideas how these problems can be solved more
> > elegantly?
>
> > Thx.
>
> Perhaps donate some of your work to the opensource projects? I suspect
> phpmailer would appreciate your efforts, for instance.
>
So, in your opinion, the only solution is to analyze the PHP code and try
to figure out what the author's intentions were? To analyze, for example,
whether strlen is supposed to return the number of bytes or the number of
characters? If it is supposed to return the number of bytes, leave it as
is; otherwise rename it to mb_strlen?
This is a major undertaking for someone not familiar with the inner
workings of this class. Also, I am not sure that phpmailer would really
appreciate this. mbstring is just an extension, and it may not even be
installed on all shared hosting servers. Also, for most
westerners ISO-8859-1 is good enough, and multi-byte functions are
slower than ordinary functions. PHP6 is around the corner and I am not
really sure how many people would benefit from this. If I decide to add
full multi-byte support to phpmailer I will release it, but I cannot
promise that it will be in sync with the current version of the original
phpmailer class.
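The concern about hosts without mbstring could be handled with a guarded wrapper, roughly like the sketch below. Both function names are hypothetical, and the fallback assumes the input is valid UTF-8: it counts every byte that is not a continuation byte.

```php
<?php
// Fallback: count every byte that is not a UTF-8 continuation byte (10xxxxxx).
// Assumes $text is valid UTF-8.
function utf8_char_length_fallback($text)
{
    return preg_match_all('/[^\x80-\xBF]/', $text, $unused);
}

// Hypothetical wrapper: prefer mbstring when it is installed.
function utf8_char_length($text)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }
    return utf8_char_length_fallback($text);
}

$word = "\xC5\xA1e\xC4\x87er"; // "šećer" as UTF-8 bytes
echo utf8_char_length($word), "\n";
```

Library code that calls such a wrapper would keep working on servers without the extension.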
And again, this is not about the phpmailer class. I am interested in other
people's opinions and experiences with multi-byte string functions and
third-party libraries, and in what the best thing to do is.
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 16:41:51 by Jerry Stuckle
working_boy@net.hr wrote:
> On Aug 14, 12:25 pm, Jerry Stuckle wrote:
>>> So, does anybody have any ideas how these problems can be solved more
>>> elegantly?
>>> Thx.
>> Perhaps donate some of your work to the opensource projects? I suspect
>> phpmailer would appreciate your efforts, for instance.
>>
>
>
> So, in your opinion, the only solution is to analyze the PHP code and try
> to figure out what the author's intentions were? To analyze, for example,
> whether strlen is supposed to return the number of bytes or the number of
> characters? If it is supposed to return the number of bytes, leave it as
> is; otherwise rename it to mb_strlen?
>
> This is a major undertaking for someone not familiar with the inner
> workings of this class. Also, I am not sure that phpmailer would really
> appreciate this. mbstring is just an extension, and it may not even be
> installed on all shared hosting servers. Also, for most
> westerners ISO-8859-1 is good enough, and multi-byte functions are
> slower than ordinary functions. PHP6 is around the corner and I am not
> really sure how many people would benefit from this. If I decide to add
> full multi-byte support to phpmailer I will release it, but I cannot
> promise that it will be in sync with the current version of the original
> phpmailer class.
>
> And again, this is not about the phpmailer class. I am interested in other
> people's opinions and experiences with multi-byte string functions and
> third-party libraries, and in what the best thing to do is.
>
No, work with them to come up with a multibyte version of the code.
If you're going to change the code yourself anyway to make it work, why
not work with the people who developed the product to make it work right
and make it available to those who need it?
Re: Advice needed; php5, utf-8, mb_*
on 14.08.2007 18:06:59 by Toby A Inkster
working_boy wrote:
> So, does anybody have any ideas how these problems can be solved more
> elegantly?
Personally I'm of the stick-my-head-in-the-sand-and-hope-PHP-6-fixes-this
school of thought.
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 54 days, 19:46.]
Fake Steve is Dead; Long Live Fake Bob!
http://tobyinkster.co.uk/blog/2007/08/13/fake-bob/