How to import dump file with some different charset
am 02.07.2006 18:19:45 von paff
Hi,
sorry about writing here in english, but I couldn't find any other
newsgroup which was good to ask this question. I guess that almost all
of you can speak english anyway...
I asked this question on polish (I'm from Poland) newsgroup and nobody
helped me, so I'm trying to find help here :)
Of course I've checked everything in Google but I have not find anything
that works.
I have just been employed to modify, change all the protocols etc. of
some application that uses databases.
Everything would be ok, but the first step is to convert database from
ISO to Unicode and I can not do that :-/
I have some database - it's about 4 GB with all the index files.
Database is in latin2 charset and I must to have it in UTF-8.
I made a dump file in this way:
mysqldump -u USER -pPASSWORD DATABASENAME >dump.sql
Ok, it seems that dump.sql is in UTF-8, that's really great.
But in this file there is specified CHARACTER SET as latin2 in each
table declaration. So, I changed all the "latin2" to "utf8", thinking
that it can solve all my problems.
Then I made:
mysql -u USER -p DATABASENAME < dump.sql
And the result database is again in ISO :-/
How can I make mysql to create the new database using utf8? Is it really
so difficult ? :)
How can I do it?
By the way - I'm waiting for thuesday's match... :)
Best greetings
Pawel Marek
Re: How to import dump file with some different charset
am 02.07.2006 21:39:58 von Axel Schwenke
paff wrote:
> sorry about writing here in english, but I couldn't find any other
> newsgroup which was good to ask this question.
There is comp.databases.mysql for questions in English.
> Of course I've checked everything in Google but I have not find anything
> that works.
You should have checked the manual first:
http://dev.mysql.com/doc/refman/5.0/en/charset.html
BTW, what MySQL version do you have? Character set support in MySQL
was extended with 4.1. You definitely need 4.1 or later for UTF8.
> Everything would be ok, but the first step is to convert database from
> ISO to Unicode and I can not do that :-/
> Database is in latin2 charset and I must to have it in UTF-8.
You can do that online by
ALTER TABLE CONVERT TO CHARACTER SET UTF8
> I made a dump file in this way:
> mysqldump -u USER -pPASSWORD DATABASENAME >dump.sql
>
> Ok, it seems that dump.sql is in UTF-8, that's really great.
So it seems you used mysqldump from at least MySQL 4.1. Good.
> But in this file there is specified CHARACTER SET as latin2 in each
> table declaration. So, I changed all the "latin2" to "utf8", thinking
> that it can solve all my problems.
What *exactly* did you change? The dump contains CHARSET/COLLATE
definitions at database, table and column level. You *must* have
changed the column definitions and you probably should change the
database/table definitions - to use UTF8 as default for new columns.
Please see the manual how and when defaults are used.
> Then I made:
> mysql -u USER -p DATABASENAME < dump.sql
>
> And the result database is again in ISO :-/
Then you either:
- changed the wrong things in the dumpfile, or
- you didn'n theck the new state correctly, what do you get for
SHOW FULL FIELDS FROM ?
I guess your database actually *stores* all data in UTF8. but your
client still gets latin, because it's not configured correctly.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.ht ml
HTH, XL
Re: How to import dump file with some different charset
am 02.07.2006 23:43:39 von paff
Axel Schwenke napisa?(a):
> There is comp.databases.mysql for questions in English.
My news serwer didn't know about it ;-)
I see
comp.databases.ms-access
comp.databases.ms-sqlserver
> You should have checked the manual first:
> http://dev.mysql.com/doc/refman/5.0/en/charset.html
I have seen it, but I didn't found what I need.
Thanks for your reply, yesterday at work I will try your suggestions.
MySQL is in version 5.
> What *exactly* did you change? The dump contains CHARSET/COLLATE
> definitions at database, table and column level.
So I have changed all the "latin2" strings to "utf8" id dump file (don't
ask me what in case when "latin2" is a value of some field - now it's
just a test phase ;-)
> Then you either:
> - changed the wrong things in the dumpfile, or
> - you didn'n theck the new state correctly, what do you get for
> SHOW FULL FIELDS FROM
?
I have just checked your suggestion to use ALTER TABLE with CONVERT on
my example table which (I think so) is still in ISO. After this
operation PHPAdmin show, that data size is still the same, but indexes
are 4 times bigger... (?)
How can I check if data are really stored in UTF8? When I connect to
database with my application, I get an ISO string.
When I use SHOW FULL FIELDS FROM TABLE, I get such a table with
information about table structure:
Field, Type, Collation, Null, Key, Default, Extra, Privileges, Comment
Collation is everywhere set to NULL.
Pawel Marek
Re: How to import dump file with some different charset
am 03.07.2006 09:22:17 von Axel Schwenke
paff wrote:
> Axel Schwenke napisa?(a):
>
> I have seen it, but I didn't found what I need.
> Thanks for your reply, yesterday at work I will try your suggestions.
^^^^^^^^^
Ohh. Do you own a time machine? :-))
>> What *exactly* did you change? The dump contains CHARSET/COLLATE
>> definitions at database, table and column level.
>
> So I have changed all the "latin2" strings to "utf8" id dump file (don't
> ask me what in case when "latin2" is a value of some field - now it's
> just a test phase ;-)
OK.
>> Then you either:
>> - changed the wrong things in the dumpfile, or
>> - you didn'n theck the new state correctly, what do you get for
>> SHOW FULL FIELDS FROM ?
>
> I have just checked your suggestion to use ALTER TABLE with CONVERT on
> my example table which (I think so) is still in ISO. After this
> operation PHPAdmin show, that data size is still the same, but indexes
> are 4 times bigger... (?)
Uhh. Don't trust phpMyAdmin. Also index size does not prove anything.
Except the size of the index, of course.
> How can I check if data are really stored in UTF8?
As I said. SHOW FULL FIELDS. Hava a look at an example:
mysql> create table foo (x char(10), y varchar(20));
Query OK, 0 rows affected (0.00 sec)
mysql> show full fields from foo\G
*************************** 1. row ***************************
Field: x
Type: char(10)
Collation: latin1_swedish_ci
Null: YES
Default: NULL
*************************** 2. row ***************************
Field: y
Type: varchar(20)
Collation: latin1_swedish_ci
Null: YES
Default: NULL
2 rows in set (0.01 sec)
mysql> alter table foo convert to charset utf8;
Query OK, 0 rows affected (0.08 sec)
mysql> show full fields from foo\G
*************************** 1. row ***************************
Field: x
Type: char(10)
Collation: utf8_general_ci
Null: YES
Default: NULL
*************************** 2. row ***************************
Field: y
Type: varchar(20)
Collation: utf8_general_ci
Null: YES
Default: NULL
2 rows in set (0.00 sec)
> When I connect to database with my application, I get an ISO string.
Did you set character_set_results? I'm quite sure it defaults to latin.
> When I use SHOW FULL FIELDS FROM TABLE, I get such a table with
> information about table structure:
>
> Field, Type, Collation, Null, Key, Default, Extra, Privileges, Comment
>
> Collation is everywhere set to NULL.
This is not possible.
XL
Re: How to import dump file with some different charset
am 04.07.2006 12:00:23 von paff1
Axel Schwenke napisal(a):
> > I have seen it, but I didn't found what I need.
> > Thanks for your reply, yesterday at work I will try your suggestions.
> ^^^^^^^^^
> Ohh. Do you own a time machine? :-))
Do you need one? ;-)
> > When I connect to database with my application, I get an ISO string.
>
> Did you set character_set_results? I'm quite sure it defaults to latin.
That's right, I have just changed it - now everything is ok, thank you
very much.
The whole problem was more complicated than I thought. Database was
defined using latin1 and it was somebody mistake, because clients was
sending strings in latin2. Database was able just to store it and send
back when needed, then client interprete it again as latin2 and
everything was ok. The binary data was nowhere changed so it could work
correctly.
Then database was converted to utf from latin1 and client gets string
in utf. He just wrote it without any interpretation, so he saw signs
from latin1, although there should be latin2 signs...
To have correct strings he could only convert it back to latin1 and
then interprete as latin2 :)
I made a dump file, changed "latin1" substrings to "latin2" and
imported it. After this operation MySQL had still the same binary data,
but it already knows that it is latin2 charset so it could convert it
properly using ALTER TABLE.
Once again - thanks a lot, I wouldn't do it without your help.
Best regards
Pawel Marek