Usage of strlen(utf8_decode()) and "/u" regex modifier

on 21.09.2009 16:10:53 by GoForThisWorld

Hello,

As shown below, strlen(utf8_decode()) and the "/u" regex
modifier do not behave the way I expected.

1) What is my misunderstanding?

<?php
$the_string = '&#1052;&#1072;&#1088;&#1080;&#1085;&#1072; &#1054;&#1088;&#1083;&#1086;&#1074;&#1072;';
echo "\nauthor (85 bytes): $the_string, " . strlen($the_string) . ', '
    . strlen(utf8_decode($the_string)) . ', '
    . strlen(utf8_decode(utf8_encode($the_string))) . "\n";
// all the numbers echoed are 85; I expected at least one to be 13


$max_length = 20;
$is_short = preg_match("/^.{1,$max_length}$/u", utf8_encode($the_string));
// expect the above to return 1

$max_length = 10;
$is_short = preg_match("/^.{1,$max_length}$/u", utf8_encode($the_string));
// expect the above to return 0

?>
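A minimal sketch of the length check the two regexes above appear to be aiming at, assuming the input is already real UTF-8 text rather than HTML entities. The pattern is built in double quotes so $max_length is interpolated, the string is passed to preg_match() directly instead of through utf8_encode() (which treats every byte as ISO-8859-1, so each two-byte Cyrillic letter would become two characters), and the /u modifier then makes "." match one UTF-8 character rather than one byte. The helper name is_short_utf8() is hypothetical, and the example assumes the PHP source file is saved as UTF-8.

```php
<?php
// Hypothetical helper: 1 if $text is valid UTF-8 and consists of 1 to
// $max_length characters, 0 otherwise, false if $text is not valid UTF-8.
function is_short_utf8(string $text, int $max_length)
{
    // Double quotes interpolate $max_length into the quantifier; with /u,
    // "." matches one UTF-8 encoded character instead of one byte.
    $pattern = "/^.{1,$max_length}$/u";
    return preg_match($pattern, $text);
}

// Real UTF-8 input (source file saved as UTF-8), passed without utf8_encode().
$the_string = 'Марина Орлова';
var_dump(is_short_utf8($the_string, 20)); // int(1)
var_dump(is_short_utf8($the_string, 10)); // int(0)
```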

More generally, given a string $the_string:

2) How do I determine which encoding is being used?

3) How do I determine the number of visible characters?

4) If it has more than N visible characters, how do I
truncate it after N visible characters?
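A minimal sketch for questions 2 to 4, assuming the text is supposed to be UTF-8. A PHP string is just bytes with no encoding attached, so the dependable test is to check whether the bytes are valid in the encoding you expect; mb_detect_encoding() exists but can only guess. The sketch reuses the /u modifier for validating, counting, and splitting; mb_check_encoding(), mb_strlen() and mb_substr() with 'UTF-8' from the mbstring extension are the usual equivalents. It counts code points, which equal visible characters for text like the Cyrillic name above but not for text with combining marks (the intl extension's grapheme_strlen() and grapheme_substr() cover that case). The function names are hypothetical.

```php
<?php
// Hypothetical helpers for text that is expected to be UTF-8.

// Question 2: check that the bytes are well-formed UTF-8. The empty
// pattern matches any valid input; with /u, preg_match() returns false
// when the subject is not valid UTF-8.
function is_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Question 3: number of code points, or null if $s is not valid UTF-8.
// The s modifier lets "." match newlines as well.
function utf8_length(string $s): ?int
{
    $count = preg_match_all('/./su', $s);
    return $count === false ? null : $count;
}

// Question 4: keep at most $n code points. Splitting on the empty pattern
// with /u yields whole characters, so a multi-byte sequence is never cut
// in half the way substr() (which counts bytes) can cut it.
function utf8_truncate(string $s, int $n): ?string
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return null; // not valid UTF-8
    }
    return implode('', array_slice($chars, 0, max(0, $n)));
}

$name = 'Марина Орлова';
var_dump(is_utf8($name));          // bool(true)
var_dump(strlen($name));           // int(25), bytes
var_dump(utf8_length($name));      // int(13), characters
var_dump(utf8_truncate($name, 6)); // string(12) "Марина"
```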

Thanks!


--
PHP General Mailing List (http://www.php.net/)

RE: Usage of strlen(utf8_decode()) and "/u" regex modifier

on 21.09.2009 16:19:58 by Andrea Giammarchi

> $the_string = '&#1052;&#1072;&#1088;&#1080;&#1085;&#1072; &#1054;&#1088;&#1083;&#1086;&#1074;&#1072;';

Did you actually write this, or is it the PHP mailing list that converted the UTF-8 characters?
I can only see an ASCII string of length 85... please tell me you are
not confusing HTML entities with UTF-8 encoded characters...
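To make the distinction concrete: the literal in the question is numeric HTML entities, 85 ASCII bytes in total (twelve 7-byte "&#NNNN;" entities plus one space). utf8_decode() and utf8_encode() leave pure ASCII untouched, which is why every strlen() call printed 85. A minimal sketch, assuming the goal is real UTF-8 text, decodes the entities with html_entity_decode() and then compares byte and character counts:

```php
<?php
// The literal from the question: twelve 7-byte numeric entities and a space.
$entities = '&#1052;&#1072;&#1088;&#1080;&#1085;&#1072; '
          . '&#1054;&#1088;&#1083;&#1086;&#1074;&#1072;';

// Turn the entities into real characters, encoded as UTF-8.
$utf8 = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo strlen($entities), "\n";              // 85 (ASCII bytes)
echo strlen($utf8), "\n";                  // 25 (UTF-8 bytes)
echo preg_match_all('/./su', $utf8), "\n"; // 13 (characters)
echo $utf8, "\n";                          // Марина Орлова
```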
