GeoIP Character Encoding
on 29.07.2009 00:10:33 by APseudoUtopia
Hey,
I'm using the PECL GeoIP module on php 5.2.10. When I look up an IP
address, the geoip_record_by_name() function is giving me a string
that contains "special" characters, such as the following:
'Portugal, 09, Vila Real De Santo António'
'Norway, 08, Ålesund'
'Portugal, 04, Vila Nova De Famalicão'
(Note the ó, Å, and ã).
I'm using PostgreSQL as my database. The database's encoding is UTF8,
and the locale is C.
When I try to insert the above strings into a VARCHAR column, I get
errors similar to the following:
ERROR: invalid byte sequence for encoding "UTF8": 0xf36e696f
ERROR: invalid byte sequence for encoding "UTF8": 0xc56c
ERROR: invalid byte sequence for encoding "UTF8": 0xe36f2c
Now, I believe I can solve the problem by changing the client_encoding
of my postgresql client (Right now, it is set to UTF8). However, I'm
trying to figure out what encoding the GeoIP function is returning to
me so that I can set the client_encoding appropriately. Is it LATIN1?
How can I figure it out? And can I change it to UTF8?
Thank you for your time.
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: GeoIP Character Encoding
on 29.07.2009 10:11:18 by news.NOSPAM.0ixbtqKe
On Tue, 28 Jul 2009 18:10:33 -0400, APseudoUtopia wrote:
> 'Portugal, 09, Vila Real De Santo António'
> 'Norway, 08, Ålesund'
> 'Portugal, 04, Vila Nova De Famalicão'
>
> (Note the ó, Å, and ã).
>
> I'm using PostgreSQL as my database. The database's encoding is UTF8,
> and the locale is C.
>
> When I try to insert the above strings into a VARCHAR column, I get
> errors similar to the following:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xf36e696f
> ERROR: invalid byte sequence for encoding "UTF8": 0xc56c
> ERROR: invalid byte sequence for encoding "UTF8": 0xe36f2c
>
> Now, I believe I can solve the problem by changing the client_encoding
> of my postgresql client (Right now, it is set to UTF8). However, I'm
> trying to figure out what encoding the GeoIP function is returning to
> me so that I can set the client_encoding appropriately. Is it LATIN1?
> How can I figure it out? And can I change it to UTF8?
The hex sequences are consistent with ISO-8859-1 (aka latin1).
You may want to check out mbstring, iconv or recode:
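
A minimal sketch of how one might check this with mbstring, assuming the extension is loaded. The helper name inspect_hex_bytes() is made up for illustration; the hex strings are the ones from the PostgreSQL errors quoted above. Keep in mind that any byte sequence can be decoded as ISO-8859-1, so this only shows that the result is plausible, not that the guess is proven.

```php
<?php
// Hypothetical helper: takes a hex dump of raw bytes (as printed in the
// PostgreSQL errors) and reports how those bytes behave under each encoding.
function inspect_hex_bytes($hex)
{
    $bytes = pack('H*', $hex); // hex string -> raw bytes
    return array(
        // false here means PostgreSQL is right to reject the bytes as UTF8
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        // the same bytes, interpreted as ISO-8859-1 and re-encoded as UTF-8
        'as_utf8'    => mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1'),
    );
}

foreach (array('f36e696f', 'c56c', 'e36f2c') as $hex) {
    $info = inspect_hex_bytes($hex);
    printf(
        "0x%s  valid UTF-8: %s  read as ISO-8859-1: %s\n",
        $hex,
        $info['valid_utf8'] ? 'yes' : 'no',
        $info['as_utf8']
    );
}
```

On a UTF-8 terminal the three sequences should come out as "ónio", "Ål" and "ão,", which matches the tails of António, Ålesund and Famalicão.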
/Nisse
Re: GeoIP Character Encoding
on 29.07.2009 10:34:53 by news.NOSPAM.0ixbtqKe
On Tue, 28 Jul 2009 18:10:33 -0400, APseudoUtopia wrote:
> I'm using the PECL GeoIP module on php 5.2.10. When I look up an IP
> address, the geoip_record_by_name() function is giving me a string
> that contains "special" characters, such as the following:
The PECL GeoIP page links to , and a
search for "charset" reveals the following:
the binary database return the cityname in iso-8859-1 by default.
I guess your output is in a different charset.
The CAPI based wrappers have a set_charset method.
If you use any other API, use the language charset encoding to
transform iso-8859-1 into the desired output format.
for php you might use:
Code:
$city = mb_convert_encoding( $old_city, 'UTF-8', 'ISO-8859-1');
or
Code:
$city = utf8_encode($old_city);
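
Applied to a whole record rather than a single field, the conversion might look roughly like the sketch below. It assumes mbstring is available and that every string field in the record is ISO-8859-1. The function name record_latin1_to_utf8() and the sample array are hypothetical; the sample only stands in for what geoip_record_by_name() returns, so the snippet runs without the GeoIP extension.

```php
<?php
// Hypothetical helper: convert every string field of a GeoIP-style record
// from ISO-8859-1 to UTF-8. Non-string fields are left untouched.
function record_latin1_to_utf8(array $record)
{
    foreach ($record as $key => $value) {
        if (is_string($value)) {
            $record[$key] = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
        }
    }
    return $record;
}

// Sample data standing in for geoip_record_by_name() output (assumption).
$sample = array(
    'country_name' => 'Portugal',
    'region'       => '09',
    'city'         => "Vila Real De Santo Ant\xF3nio", // 0xF3 = ISO-8859-1 "ó"
    'latitude'     => 0.0, // placeholder value, not real coordinates
);

$utf8 = record_latin1_to_utf8($sample);

// Should now pass the UTF-8 check that the raw bytes failed.
var_dump(mb_check_encoding($utf8['city'], 'UTF-8'));
echo $utf8['city'], "\n";
```

Run the conversion only once per string: converting text that is already UTF-8 a second time yields double-encoded output such as "Ã³" in place of "ó". Also note that utf8_encode() is deprecated as of PHP 8.2, so mb_convert_encoding() is probably the safer choice of the two on current PHP versions.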
/Nisse