ctype_print, the British Pound and other non-ASCII characters

on 26.02.2010 04:35:12 by Bob

I'm seeing mischief from ctype_print.

So far as I can tell, the British Pound symbol, '£', is considered a
printable character according to the locale I use on my Ubuntu box. But
even across two years, two boxes, several versions of Ubuntu (from 7.04
to 9.10, one x86, one AMD64), and two major versions of PHP (PHP 4 and
now PHP 5.2.11), I cannot get ctype_print to return true when a string
given to it contains the British Pound symbol. (Or other non-ASCII
characters such as ø or ß.)

The locale I'm using is en_GB.UTF-8 and when I call setlocale(LC_ALL,
'en_GB.UTF-8') in PHP, it returns the name of this locale rather than
FALSE, so that seems to be in order. (However, to be sure I have
installed and reinstalled the language pack in Ubuntu as suggested by
others.)

I've even read through the en_GB and i18n locale definition files to
confirm that <U00A3> (for the British Pound symbol) does appear within
the print and graph sections, so both ctype_print and ctype_graph should
consider it acceptable.

What's most maddening is that ctype_print does return true on my shared
hosting server, so I know that it can be achieved. I'm just hoping that
someone here can tell me what I'm doing wrong, or what my operating
system is doing wrong.

For your information, I'm currently running the following:

Ubuntu 9.10 (AMD64)
Apache 2.2.14
PHP 5.2.11 running as a CGI (to mirror the config of my shared host)
Locale in use: en_GB.UTF-8
LANG=en_GB.UTF-8

Can anyone tell me how to get ctype_print to behave?

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: ctype_print, the British Pound and other non-ASCII characters

on 26.02.2010 19:39:58 by Nathan Rixham

Bob wrote:
> I'm seeing mischief from ctype_print.
>
> So far as I can tell, the British Pound symbol, '£', is considered a
> printable character according to the locale I use on my Ubuntu box. But
> even across two years, two boxes, several versions of Ubuntu (from 7.04
> to 9.10, one x86, one AMD64), and two major versions of PHP (PHP 4 and
> now PHP 5.2.11), I cannot get ctype_print to return true when a string
> given to it contains the British Pound symbol. (Or other non-ASCII
> characters such as ø or ß.)
>
> The locale I'm using is en_GB.UTF-8 and when I call setlocale(LC_ALL,
> 'en_GB.UTF-8') in PHP, it returns the name of this locale rather than
> FALSE, so that seems to be in order. (However, to be sure I have
> installed and reinstalled the language pack in Ubuntu as suggested by
> others.)
>
> I've even read through the en_GB and i18n locale definition files to
> confirm that <U00A3> (for the British Pound symbol) does appear within
> the print and graph sections, so both ctype_print and ctype_graph should
> consider it acceptable.
>
> What's most maddening is that ctype_print does return true on my shared
> hosting server, so I know that it can be achieved. I'm just hoping that
> someone here can tell me what I'm doing wrong, or what my operating
> system is doing wrong.
>
> For your information, I'm currently running the following:
>
> Ubuntu 9.10 (AMD64)
> Apache 2.2.14
> PHP 5.2.11 running as a CGI (to mirror the config of my shared host)
> Locale in use: en_GB.UTF-8
> LANG=en_GB.UTF-8
>
> Can anyone tell me how to get ctype_print to behave?

Tested on a few Ubuntu boxes (8.x and 9.x releases).

When using en_US.utf8, all is fine:

var_dump( ctype_print( 'abcd ef £ ghs als kl ,!' ) ); // TRUE

then:

# locale-gen en_GB.UTF-8
Generating locales...
en_GB.UTF-8... done
Generation complete.

# locale -a
C
en_GB.utf8
en_US
en_US.utf8
POSIX

setlocale(LC_ALL, 'en_GB.UTF-8');
var_dump( ctype_print( 'abcd ef £ ghs als kl ,!' ) ); // FALSE

I'm wondering whether this is a PHP issue or a locale generation issue on
Ubuntu.

Have you checked the output of `locale` to ensure LC_CTYPE is set to the
appropriate value?
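
To check the same thing from inside PHP rather than from the shell,
something like the following might help. It is only a sketch: the helper
name current_ctype_locale is made up for illustration, and the locale
names are simply the ones mentioned in this thread.

```php
<?php
// Hypothetical helper: report the LC_CTYPE locale PHP is currently using.
function current_ctype_locale()
{
    // Passing "0" asks setlocale() to return the current setting
    // without changing it.
    return setlocale(LC_CTYPE, '0');
}

// Try the candidate names in order. setlocale() returns the name that
// was applied, or false if none of them is available on this system.
$applied = setlocale(LC_CTYPE, 'en_GB.UTF-8', 'en_GB.utf8');

echo 'setlocale() returned:   ', var_export($applied, true), PHP_EOL;
echo 'LC_CTYPE now in effect: ', current_ctype_locale(), PHP_EOL;
```

If the second line does not show the locale you asked for, the call did
not take effect for character classification, whatever `locale` reports
in the shell.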

Regards!

Re: ctype_print, the British Pound and other non-ASCII characters

on 26.02.2010 22:23:14 by Bob

Hello, Nathan.

I'm glad to hear that someone else can reproduce the problem with
en_GB.UTF-8. I was worried it was some bad luck quirk that I was never
going to get to the bottom of.

I tried using en_US.utf8 (and also en_US.UTF-8) in setlocale (and it did
not return false, so again looks like the locale is found and accepted).
But I still got a return of false from ctype_print for non-ASCII
characters. So even with en_US I'm getting bad behaviour.

When you switch back to en_US.UTF-8 (or en_US.utf8) do you get true from
ctype_print as expected? (I'm hoping that you don't suddenly find
ctype_print refuses to behave properly under all locales.)

Output from `locale` shows that all types are 'en_GB.UTF-8' except LC_ALL
which is blank (as I believe it should be).

Do you know how I can dig further? I don't know anything about debugging
PHP or Linux, so I don't know how to trace the source of this strange
result.
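
One way to dig a little further without any debugger is to look at the
string byte by byte. As far as I know, ctype_print() passes each byte of
the string to the C library's isprint(), and in UTF-8 the pound sign is
two bytes (0xC2 0xA3), so it can be useful to see which bytes are
rejected. The sketch below assumes that behaviour; the helper name
ctype_print_by_byte is invented for this example.

```php
<?php
// Hypothetical helper: list each byte of $s together with the result
// ctype_print() gives when that byte is tested on its own.
function ctype_print_by_byte($s)
{
    $report = array();
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $byte = $s[$i];
        $report[] = array(
            'offset'    => $i,
            'hex'       => bin2hex($byte),
            'printable' => ctype_print($byte),
        );
    }
    return $report;
}

// "\xC2\xA3" is the UTF-8 encoding of the pound sign, written as raw
// bytes so the example does not depend on how this file is saved.
foreach (ctype_print_by_byte("a \xC2\xA3 b") as $row) {
    printf(
        "%2d  0x%s  %s\n",
        $row['offset'],
        $row['hex'],
        $row['printable'] ? 'printable' : 'NOT printable'
    );
}
```

Running this after each setlocale() call should show whether the
individual bytes 0xC2 and 0xA3 are the ones being refused under a given
locale.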

Re: ctype_print, the British Pound and other non-ASCII characters

on 26.02.2010 22:46:54 by Bob

An interesting discussion about this problem has appeared on the
php.i18n list.

It looks like the problem lies with Ubuntu and not PHP: a short C program
that calls the native isprint function (the C equivalent of ctype_print)
also returns false for the British Pound symbol.
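
If the aim is simply to accept UTF-8 strings such as the test string used
earlier, one possible workaround (not something proposed in this thread)
is to skip the C library's byte-wise classification and use PCRE's
Unicode properties instead. This is a rough sketch that assumes PHP's
PCRE was built with UTF-8 and Unicode property support; the function name
utf8_is_printable is made up, and its idea of "printable" is only
approximately the same as ctype_print's.

```php
<?php
// Hypothetical helper: true if $s is non-empty, valid UTF-8, and contains
// no code points from Unicode general category C (control, format,
// unassigned, private use, surrogate).
function utf8_is_printable($s)
{
    // \P{C} matches any code point that is NOT in category C.
    // The u modifier makes PCRE treat pattern and subject as UTF-8;
    // preg_match() returns false (not 0) when the subject is not valid
    // UTF-8, so comparing with 1 rejects that case as well.
    return preg_match('/\A\P{C}+\z/u', $s) === 1;
}

// The test string from earlier in the thread, with the pound sign
// written as its UTF-8 bytes.
var_dump(utf8_is_printable("abcd ef \xC2\xA3 ghs als kl ,!"));
var_dump(utf8_is_printable("tab\there"));
```

Because the check does not consult the locale at all, it should behave
the same on both machines, provided the input really is UTF-8.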