regex - filtering out chinese utf8 characters

on 30.07.2009 20:13:49 by Merlin Morgenstern

Hi there,

I am trying to filter out content that is not ASCII. Can I do this with
a regex? For example:

$regex = '[AZ][09]';
if (preg_match($regex, $text)) {
    return TRUE;
}
else {
    return FALSE;
}

The reason I need to do this is that I am running a MySQL query with the
text and I need to make sure it is not UTF-8. Otherwise I get the
following error:

Error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and
(utf8_general_ci,COERCIBLE) for operation '='

I am new to regex and would be happy for a jump start to get this fixed.

Best regards, Merlin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: regex - filtering out chinese utf8 characters

on 30.07.2009 20:28:19 by Stuart Connolly

Hi Merlin,

I think the pattern you're looking for is '/[a-zA-Z0-9]/', which
matches any single ASCII alphanumeric character.
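
As a rough sketch of how such a pattern could be applied to the original question: a character class on its own only tests one character, so checking a whole string needs anchors, and "ASCII" covers a byte range rather than just letters and digits. The function names below (is_ascii_only, strip_non_ascii) are hypothetical and do not come from the thread.

```php
<?php
// Hypothetical helper: true when $text contains only 7-bit ASCII bytes.
function is_ascii_only(string $text): bool
{
    // \A and \z anchor the match to the whole string. Without the /u
    // modifier PCRE works byte by byte, and every byte of a multi-byte
    // UTF-8 sequence is >= 0x80, so such text fails the match.
    return preg_match('/\A[\x00-\x7F]*\z/', $text) === 1;
}

// Hypothetical helper: drop every byte outside printable ASCII (0x20-0x7E).
// Note that this also removes tabs, newlines and other control characters.
function strip_non_ascii(string $text): string
{
    return (string) preg_replace('/[^\x20-\x7E]/', '', $text);
}
```

Under these assumptions, is_ascii_only() could gate the query, while strip_non_ascii() would silently discard the Chinese characters instead of rejecting the input. Which of the two is appropriate depends on whether losing that text is acceptable.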

Cheers

Stuart

Re: regex - filtering out chinese utf8 characters

on 30.07.2009 23:25:46 by Daniel Kolbo

You have probably already been here:
http://www.regular-expressions.info/

But if not, that site is certainly useful for all things regex.

Sorry I can't be of more help for your specific question.

dK

Re: regex - filtering out chinese utf8 characters

on 30.07.2009 23:37:05 by List Manager

You might want to read up on iconv. I think it will do what you are
wanting to do.

http://us2.php.net/manual/en/book.iconv.php

specifically...

http://us2.php.net/manual/en/function.iconv.php
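
A minimal sketch of the iconv approach, assuming the incoming text is UTF-8 and the target is ISO-8859-1 (used here as a stand-in for MySQL's latin1). The function name utf8_to_latin1 is hypothetical. How characters with no Latin-1 equivalent are handled depends on the iconv implementation behind PHP, so the sketch only checks for a reported failure and does not rely on the //TRANSLIT or //IGNORE suffixes.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1.
// Returns null when iconv() reports that the conversion failed.
function utf8_to_latin1(string $text): ?string
{
    // @ suppresses the notice iconv() raises on failure;
    // the return value is checked instead.
    $converted = @iconv('UTF-8', 'ISO-8859-1', $text);

    return $converted === false ? null : $converted;
}
```

Text such as "café" converts cleanly because every character exists in ISO-8859-1. For Chinese text the result is platform-dependent, so a null return (or unexpected output) should be treated as "do not send this to the latin1 column".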

