GeoIP Character Encoding

on 29.07.2009 00:10:33 by APseudoUtopia

Hey,

I'm using the PECL GeoIP module on PHP 5.2.10. When I look up an IP
address, the geoip_record_by_name() function is giving me a string
that contains "special" characters, such as the following:

'Portugal, 09, Vila Real De Santo António'
'Norway, 08, Ålesund'
'Portugal, 04, Vila Nova De Famalicão'

(Note the ó, Å, and ã).

I'm using PostgreSQL as my database. The database's encoding is UTF8,
and the locale is C.

When I try to insert the above strings into a VARCHAR column, I get
errors similar to the following:

ERROR: invalid byte sequence for encoding "UTF8": 0xf36e696f
ERROR: invalid byte sequence for encoding "UTF8": 0xc56c
ERROR: invalid byte sequence for encoding "UTF8": 0xe36f2c

Now, I believe I can solve the problem by changing the client_encoding
of my PostgreSQL client (right now, it is set to UTF8). However, I'm
trying to figure out what encoding the GeoIP function is returning to
me so that I can set the client_encoding appropriately. Is it LATIN1?
How can I figure it out? And can I change it to UTF8?

Thank you for your time.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: GeoIP Character Encoding

on 29.07.2009 10:11:18 by news.NOSPAM.0ixbtqKe

On Tue, 28 Jul 2009 18:10:33 -0400, APseudoUtopia wrote:

> 'Portugal, 09, Vila Real De Santo António'
> 'Norway, 08, Ålesund'
> 'Portugal, 04, Vila Nova De Famalicão'
>
> (Note the ó, Å, and ã).
>
> I'm using PostgreSQL as my database. The database's encoding is UTF8,
> and the locale is C.
>
> When I try to insert the above strings into a VARCHAR column, I get
> errors similar to the following:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xf36e696f
> ERROR: invalid byte sequence for encoding "UTF8": 0xc56c
> ERROR: invalid byte sequence for encoding "UTF8": 0xe36f2c
>
> Now, I believe I can solve the problem by changing the client_encoding
> of my PostgreSQL client (right now, it is set to UTF8). However, I'm
> trying to figure out what encoding the GeoIP function is returning to
> me so that I can set the client_encoding appropriately. Is it LATIN1?
> How can I figure it out? And can I change it to UTF8?

The hex sequences are consistent with ISO-8859-1 (aka latin1).
You may want to check out mbstring, iconv or recode.
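
As a rough check of that reading, the byte sequences from the error messages can be tested directly. This is only a sketch and assumes the mbstring extension is loaded. The `$samples` array is an illustrative name, and its contents are simply the hex values from the errors quoted above.

```php
<?php
// Byte sequences copied from the PostgreSQL error messages above.
$samples = array(
    "\xF3\x6E\x69\x6F", // 0xf36e696f
    "\xC5\x6C",         // 0xc56c
    "\xE3\x6F\x2C",     // 0xe36f2c
);

foreach ($samples as $bytes) {
    // None of these sequences is valid UTF-8 as it stands ...
    $isUtf8 = mb_check_encoding($bytes, 'UTF-8');
    // ... but interpreted as ISO-8859-1 they convert cleanly to UTF-8.
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
    printf("%s  valid UTF-8: %s  as ISO-8859-1: %s\n",
        bin2hex($bytes), $isUtf8 ? 'yes' : 'no', $utf8);
}
```

On a UTF-8 terminal the last column should show the fragments "ónio", "Ål" and "ão,", which line up with António, Ålesund and Famalicão.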

/Nisse


Re: GeoIP Character Encoding

on 29.07.2009 10:34:53 by news.NOSPAM.0ixbtqKe

On Tue, 28 Jul 2009 18:10:33 -0400, APseudoUtopia wrote:

> I'm using the PECL GeoIP module on PHP 5.2.10. When I look up an IP
> address, the geoip_record_by_name() function is giving me a string
> that contains "special" characters, such as the following:

A search for "charset" on the site linked from the PECL GeoIP page
reveals the following:

The binary database returns the city name in ISO-8859-1 by default.
I guess your output is in a different charset.

The C API-based wrappers have a set_charset method.

If you use any other API, use the language's charset conversion
functions to transform ISO-8859-1 into the desired output format.

For PHP you might use:

Code:
$city = mb_convert_encoding( $old_city, 'UTF-8', 'ISO-8859-1');

or

Code:
$city = utf8_encode($old_city);
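
Either conversion could be wrapped in a small helper so that values which are already valid UTF-8 (plain ASCII city names, for instance) pass through untouched. The following is a minimal sketch, assuming the mbstring extension is available. The function name `latin1_to_utf8_if_needed` is hypothetical, and `$old_city` is a stand-in for the city value returned by the lookup. It uses mb_convert_encoding rather than utf8_encode because the latter is deprecated as of PHP 8.2.

```php
<?php
/**
 * Hypothetical helper: return $value as UTF-8, assuming that any input
 * which is not already valid UTF-8 is encoded as ISO-8859-1.
 */
function latin1_to_utf8_if_needed($value)
{
    if ($value === null || $value === '') {
        return $value;
    }
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // plain ASCII or already UTF-8: leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Stand-in for the city name from the GeoIP lookup, as raw ISO-8859-1 bytes.
$old_city = "Vila Nova De Famalic\xE3o";
$city = latin1_to_utf8_if_needed($old_city);

// Show the last three bytes in hex: the single byte e3 has become c3 a3.
echo bin2hex(substr($city, -3)), "\n"; // prints "c3a36f"
```

The validity check is a heuristic: an ISO-8859-1 string whose bytes happen to form valid UTF-8 would be left unconverted, so if the source is known to always be ISO-8859-1, converting unconditionally is the safer choice.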

/Nisse
