3 broken tables at once

on 21.11.2002 13:34:25 by mmokrejs

Hi,
just now, 3 tables broke at virtually the very same moment
on one of our MySQL servers:

-rw-rw---- 1 mysql mysql 462344 Nov 20 14:00 Ncrassa_work_int/orf.MYD
-rw-rw---- 1 mysql mysql 338944 Nov 21 13:13 Ncrassa_work_int/orf.MYI
-rw-rw---- 1 mysql mysql 8836 Oct 7 13:05 Ncrassa_work_int/orf.frm
-rw-rw---- 1 mysql mysql 17956600 Nov 20 14:11 Ncrassa_work_int/rep.MYD
-rw-rw---- 1 mysql mysql 712704 Nov 21 13:13 Ncrassa_work_int/rep.MYI
-rw-rw---- 1 mysql mysql 13701 Oct 7 13:07 Ncrassa_work_int/rep.frm
-rw-rw---- 1 mysql mysql 610852 Nov 20 14:12 Ncrassa_work_int/sel_funcat.MYD
-rw-rw---- 1 mysql mysql 297984 Nov 21 13:13 Ncrassa_work_int/sel_funcat.MYI
-rw-rw---- 1 mysql mysql 8734 Oct 7 13:07 Ncrassa_work_int/sel_funcat.frm

The clients use Perl-based CGI scripts to access and modify the tables.
We are running mysql-BK20021104-debug on a Linux i686 SMP system with 1 GB
of RAM and kernel 2.4.20-pre5-ac1.

The repair went fine. I've decided to share this information with you
anyway, as you have recently investigated the broken Brucella_abortus
table (broken in a different fashion, I know). As these tables all broke
at once and there's no message in the error log, I hope this gives you
a clue.

mysql> repair table rep;
+----------------------+--------+----------+--------------------------------------------------------------------+
| Table                | Op     | Msg_type | Msg_text                                                           |
+----------------------+--------+----------+--------------------------------------------------------------------+
| Ncrassa_work_int.rep | repair | info     | Found link that points at 17956600 (outside data file) at 14263756 |
| Ncrassa_work_int.rep | repair | info     | Found link that points at 17956716 (outside data file) at 14265896 |
| Ncrassa_work_int.rep | repair | warning  | Number of rows changed from 5682 to 5680                           |
| Ncrassa_work_int.rep | repair | status   | OK                                                                 |
+----------------------+--------+----------+--------------------------------------------------------------------+
4 rows in set (3.35 sec)

mysql> repair table orf;
+----------------------+--------+----------+----------------------------------------------------------------+
| Table                | Op     | Msg_type | Msg_text                                                       |
+----------------------+--------+----------+----------------------------------------------------------------+
| Ncrassa_work_int.orf | repair | info     | Found link that points at 462344 (outside data file) at 370132 |
| Ncrassa_work_int.orf | repair | info     | Found link that points at 462368 (outside data file) at 370204 |
| Ncrassa_work_int.orf | repair | warning  | Number of rows changed from 5682 to 5680                       |
| Ncrassa_work_int.orf | repair | status   | OK                                                             |
+----------------------+--------+----------+----------------------------------------------------------------+
4 rows in set (0.34 sec)

mysql> repair table sel_funcat;
+-----------------------------+--------+----------+------------------------------------------+
| Table                       | Op     | Msg_type | Msg_text                                 |
+-----------------------------+--------+----------+------------------------------------------+
| Ncrassa_work_int.sel_funcat | repair | warning  | Number of rows changed from 9477 to 9474 |
| Ncrassa_work_int.sel_funcat | repair | status   | OK                                       |
+-----------------------------+--------+----------+------------------------------------------+
2 rows in set (0.33 sec)
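
The numbers in the rep and orf outputs can be cross-checked against the ls listing. The following is a minimal sketch in Python (the dictionary layout and the function name are made up for illustration): in both tables the first dangling link points exactly at the current end of the .MYD file. That would be consistent with the most recently appended data never having reached the file, though it does not prove it.

```python
# Sizes of the .MYD files in bytes, taken from the ls listing.
myd_size = {"rep": 17956600, "orf": 462344}

# (link target, position of the link) pairs reported by REPAIR TABLE.
links = {
    "rep": [(17956600, 14263756), (17956716, 14265896)],
    "orf": [(462344, 370132), (462368, 370204)],
}

def overshoot(table):
    """Return how far past the end of the data file each link target lies.

    Valid offsets run from 0 to size - 1, so a result of 0 means the link
    points exactly at the end of the file.
    """
    size = myd_size[table]
    return [target - size for target, _position in links[table]]

for table in links:
    print(table, overshoot(table))
```

sel_funcat is left out because its REPAIR output reports no dangling links, only the changed row count.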

How-To-Repeat:

I've uploaded the file Ncrassa_work_int.tgz, containing the 3 tables before repair, into your secret ftp directory.
I should note that after we found the problem, I ran the flush-tables command and only then
backed up the data files. I should have inspected the timestamps beforehand, as I believe the flush command
updated some of the files (the index files?).
--
Martin Mokrejs ,
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585


---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail bugs-thread13060@lists.mysql.com
To unsubscribe, e-mail

Re: 3 broken tables at once

on 21.11.2002 14:00:42 by mmokrejs

On Thu, 21 Nov 2002, Martin MOKREJŠ wrote:

Hi,
there are 2 more tables which were broken at that time, so 5 tables simultaneously!

mysql> check table orf_data;
+---------------------------+-------+----------+--------------------------------------------------------+
| Table                     | Op    | Msg_type | Msg_text                                               |
+---------------------------+-------+----------+--------------------------------------------------------+
| Ncrassa_work_int.orf_data | check | warning  | Table is marked as crashed                             |
| Ncrassa_work_int.orf_data | check | warning  | 3 clients is using or hasn't closed the table properly |
| Ncrassa_work_int.orf_data | check | error    | Size of datafile is: 10587400 Should be: 10587468      |
| Ncrassa_work_int.orf_data | check | error    | Corrupt                                                |
+---------------------------+-------+----------+--------------------------------------------------------+
4 rows in set (0.05 sec)

Below is the equivalent output for `check table prot_data':
[@localhost.:3306 Ncrassa_work_int.prot_data]: check table prot_data
[@localhost.:3306 Ncrassa_work_int.prot_data]: Ncrassa_work_int.prot_data check warning Table is marked as crashed
[@localhost.:3306 Ncrassa_work_int.prot_data]: Ncrassa_work_int.prot_data check warning 3 clients is using or hasn't closed the table properly
[@localhost.:3306 Ncrassa_work_int.prot_data]: Ncrassa_work_int.prot_data check error Size of datafile is: 3490436 Should be: 3490500
[@localhost.:3306 Ncrassa_work_int.prot_data]: Ncrassa_work_int.prot_data check error Corrupt
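
The size mismatch reported by the two CHECK TABLE runs is small. A minimal sketch of the arithmetic in Python (the names are illustrative only): in both cases the data file is a few dozen bytes shorter than the "Should be" value.

```python
# ("Size of datafile is", "Should be") values from the CHECK TABLE output.
reported = {
    "orf_data": (10587400, 10587468),
    "prot_data": (3490436, 3490500),
}

def missing_bytes(actual, expected):
    """Bytes by which the data file falls short of the expected size."""
    return expected - actual

for table, (actual, expected) in reported.items():
    print(table, missing_bytes(actual, expected))
```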


-rw-rw---- 1 mysql mysql 10587400 Nov 20 14:00 Ncrassa_work_int/orf_data.MYD
-rw-rw---- 1 mysql mysql 253952 Nov 21 13:45 Ncrassa_work_int/orf_data.MYI
-rw-rw---- 1 mysql mysql 8744 Oct 7 13:05 Ncrassa_work_int/orf_data.frm

How-To-Repeat:

Please look also into Ncrassa_work_int_2.tgz and Ncrassa_work_int_3.tgz.

On these tables, mysql complains:

mysql> use Ncrassa_work_int
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Didn't find any fields in table 'orf_data'
Didn't find any fields in table 'prot_data'
Database changed
mysql>

Is this message expected for crashed tables?

mysql> repair table orf_data;
+---------------------------+--------+----------+-------------------------------------------------------------------+
| Table                     | Op     | Msg_type | Msg_text                                                          |
+---------------------------+--------+----------+-------------------------------------------------------------------+
| Ncrassa_work_int.orf_data | repair | info     | Found link that points at 10587400 (outside data file) at 8423332 |
| Ncrassa_work_int.orf_data | repair | info     | Found link that points at 10587424 (outside data file) at 8424752 |
| Ncrassa_work_int.orf_data | repair | warning  | Number of rows changed from 5682 to 5680                          |
| Ncrassa_work_int.orf_data | repair | status   | OK                                                                |
+---------------------------+--------+----------+-------------------------------------------------------------------+
4 rows in set (1.32 sec)

mysql> repair table prot_data;
+----------------------------+--------+----------+------------------------------------------------------------------+
| Table                      | Op     | Msg_type | Msg_text                                                         |
+----------------------------+--------+----------+------------------------------------------------------------------+
| Ncrassa_work_int.prot_data | repair | info     | Found link that points at 3490436 (outside data file) at 2772116 |
| Ncrassa_work_int.prot_data | repair | info     | Found link that points at 3490460 (outside data file) at 2772520 |
| Ncrassa_work_int.prot_data | repair | warning  | Number of rows changed from 5682 to 5680                         |
| Ncrassa_work_int.prot_data | repair | status   | OK                                                               |
+----------------------------+--------+----------+------------------------------------------------------------------+
4 rows in set (0.77 sec)

--
Martin Mokrejs ,
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585



Re: 3 broken tables at once

on 22.11.2002 00:10:55 by Sergei Golubchik

Hi!

On Nov 21, Martin MOKREJŠ wrote:
> On Thu, 21 Nov 2002, Martin MOKREJŠ wrote:
>
> On these tables, mysql complains:
>
> mysql> use Ncrassa_work_int
> Reading table information for completion of table and column names
> You can turn off this feature to get a quicker startup with -A
>
> Didn't find any fields in table 'orf_data'
> Didn't find any fields in table 'prot_data'
> Database changed
> mysql>
>
> Is this message expected for crashed tables?

Yes.

Regards,
Sergei

--
MySQL Development Team
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Sergei Golubchik
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, http://www.mysql.com/
/_/ /_/\_, /___/\___\_\___/ Osnabrueck, Germany
<___/


Re: 3 broken tables at once

on 22.11.2002 10:59:29 by mmokrejs

On Fri, 22 Nov 2002, Sergei Golubchik wrote:

Hi,

> > Didn't find any fields in table 'orf_data'
> > Didn't find any fields in table 'prot_data'
> > Database changed
> > mysql>
> >
> > Is this message expected for crashed tables?
>
> Yes.

Thanks. It is interesting that REPAIR deleted the last two entries changed:

code 4nc400_250 and 4nc400_250. We had to insert those 2 lost rows
back into all the tables manually. :(

--
Martin Mokrejs ,
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585




Re: 3 broken tables at once

on 22.11.2002 11:29:53 by Peter Zaitsev

On Friday 22 November 2002 12:59, Martin MOKREJŠ wrote:
> On Fri, 22 Nov 2002, Sergei Golubchik wrote:
>
> Hi,
>
> > > Didn't find any fields in table 'orf_data'
> > > Didn't find any fields in table 'prot_data'
> > > Database changed
> > > mysql>
> > >
> > > Is this message expected for crashed tables?
> >
> > Yes.
>
> Thanks. It is interesting that REPAIR deleted the last two entries changed:
>
> code 4nc400_250 and 4nc400_250. We had to insert those 2 lost rows
> back into all the tables manually. :(

Do you have dynamic row format in your tables?

If so, this could be quite well explained, as MySQL might have had to fragment
the row to store the new version of it.

By the way, have you tried running a stable kernel on your system to make sure
this is not the issue? 2.4.20-pre5-ac1 does not seem to be the kernel most
widely used in production.

I do not know which version of Linux you are using, but you might wish to try
a stable kernel from your vendor or a stock stable kernel.

P.S. Could you still write a stress test application which would lead to
table corruption after some uptime on your system? This would help us
debug the problem.


--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Peter Zaitsev
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Moscow, Russia
<___/ www.mysql.com M: +7 095 725 4955



Re: 3 broken tables at once

on 22.11.2002 11:46:28 by mmokrejs

On Fri, 22 Nov 2002, Peter Zaitsev wrote:

> On Friday 22 November 2002 12:59, Martin MOKREJŠ wrote:
> > On Fri, 22 Nov 2002, Sergei Golubchik wrote:

> > Thanks. It is interesting that REPAIR deleted the last two entries changed:
> >
> > code 4nc400_250 and 4nc400_250. We had to insert those 2 lost rows
> > back into all the tables manually. :(
>
> Do you have dynamic row format in your tables?

There are many varchar columns, so yes.

> If so, this could be quite well explained, as MySQL might have had to fragment
> the row to store the new version of it.

Hmm, but why did the whole row have to be deleted, and not only the most
recently appended fragments?

> By the way, have you tried running a stable kernel on your system to make sure
> this is not the issue? 2.4.20-pre5-ac1 does not seem to be the kernel most
> widely used in production.

Well, I think we will stick with 2.4.20 once it's out. We have never had such
a problem (5 crashes at once). Yes, we used to use stable kernel releases
and MySQL binary releases, but for some time now we have been using the
BitKeeper tree, and I run gdb on mysqld more often ... ;)

> I do not know which version of Linux you are using, but you might wish to try
> a stable kernel from your vendor or a stock stable kernel.

Debian.

> P.S. Could you still write a stress test application which would lead to
> table corruption after some uptime on your system? This would help us
> debug the problem.

Actually I'm pretty satisfied with this machine and kernel, as the -ac version
is sometimes much better than Linus's version. ;) And we mainly use a
testing kernel because of memory allocation problems, which possibly
cause mysql to have trouble opening tables from time to time:
"read_const: Got error 127 when reading table" ..., although the kernel
hasn't complained about allocation problems for a couple of months.

Any suggestions for what the stress test could look like in this case?
Thanks!
--
Martin Mokrejs ,
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585



Re: 3 broken tables at once

on 22.11.2002 13:06:54 by Peter Zaitsev

On Friday 22 November 2002 13:46, Martin MOKREJŠ wrote:

>
> > If so, this could be quite well explained, as MySQL might have had to fragment
> > the row to store the new version of it.
>
> Hmm, but why did the whole row have to be deleted, and not only the most
> recently appended fragments?

Actually it is not just the "last" appended fragment. The field which grows
in size is not always the last one in the row, so what you are advising
might just turn the row into a piece of scrap.

In such a case it is better to remove the record than to have it trashed.
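
The reasoning above can be illustrated with a toy model. The following Python sketch is not MyISAM's on-disk format: the dictionary-based "data file", the field layout and the function name are all assumptions made for illustration. A row is a chain of fragments, and if any link in the chain points at data that is not there, the whole row is dropped rather than returned in truncated form.

```python
# Toy model, NOT MyISAM's real format: the "data file" maps an offset to a
# (payload, next_offset) fragment; next_offset of None ends the row.
def read_row(datafile, start):
    """Follow fragment links from start; return None if any link dangles."""
    parts = []
    offset = start
    while offset is not None:
        if offset not in datafile:
            return None  # link points outside the data: drop the whole row
        payload, offset = datafile[offset]
        parts.append(payload)
    return "".join(parts)

datafile = {
    0: ("id=1;name=al", 100),    # first fragment, continues at offset 100
    100: ("ice;city=Oslo", None),
    40: ("id=2;name=bo", 200),   # continuation at offset 200 was never written
}

print(read_row(datafile, 0))
print(read_row(datafile, 40))
```

Keeping only the surviving fragment of the second row would yield "id=2;name=bo", which looks like a plausible record but silently carries a cut-off name and no city at all.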

>
> > By the way, have you tried running a stable kernel on your system to make sure
> > this is not the issue? 2.4.20-pre5-ac1 does not seem to be the kernel most
> > widely used in production.
>
> Well, I think we will stick with 2.4.20 once it's out. We have never had such
> a problem (5 crashes at once).

Sorry, what problem are you speaking about? Did you have it with your current
kernel or with the previous stable kernel?

This is a rather strange issue, as recent 2.4.x kernels seem to be rather
stable.

> Yes, we used to use stable kernel releases
> and MySQL binary releases, but for some time now we have been using the
> BitKeeper tree, and I run gdb on mysqld more often ... ;)

Well, basically this is not what you usually want to do, at least once you have
run into trouble. We do extensive testing of MySQL at release time, so it
should be relatively free from various mistakes and compiler issues, while
self-compiled BK releases may have both untested code and compilation
issues.

The situation with the kernel is exactly the same. It might be OK to run
a development kernel if it works for you, but in case you run into problems with
the system it is better to check whether you still have them with a configuration
which is extensively tested by the community.

Currently, in your case, we have all of these:
- Untested release from BK
- Self-compiled (unfortunately there are plenty of known issues MySQL has
with various versions of glibc and compilers)
- Running on a non-release kernel version.

The bug you're reporting can't be easily checked, or I would not ask you
to try another configuration or repeat the bug. It is really tricky to hunt
a corruption bug without a repeatable case.


>
> Actually I'm pretty satisfied with this machine and kernel, as the -ac version
> is sometimes much better than Linus's version. ;)

Why don't you try a Red Hat kernel release in this case? It contains most of
Alan's code, but it is extensively tested on many systems, load types and
configurations.


> And we mainly use a
> testing kernel because of memory allocation problems, which possibly
> cause mysql to have trouble opening tables from time to time:
> "read_const: Got error 127 when reading table" ..., although the kernel
> hasn't complained about allocation problems for a couple of months.

This is a very important note. Error 127 means "Record file is crashed":
MySQL detects an error in the data and refuses to continue.

If you had these problems following the allocation error messages, this could
indicate that corruption happens in case of kernel allocation failures. This
might be the same issue as you have now. The new kernel might just no longer
print warnings for subsequent allocation failures (in many cases allocation is
attempted several times in a loop at the kernel level, as it is quite important
to satisfy it).

Could you please try configuring your system not to experience the memory pressure
which can lead to such problems: add a good amount of swap space, and configure
the kernel to try to keep a large number of pages free. Also, you might compile
the kernel with debugging prints, so that in a "warning" situation you will have
messages printed.

>
> Any suggestions for what the stress test could look like in this case?

Basically it should emulate your application well enough to
reproduce the error reliably on your system. You might use the forkN.pl scripts
as a base.

It is also possible that you could just record the queries you run using MySQL
logging and replay them from one or several sessions...
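
The replay idea can be sketched roughly as follows. This is only an outline in Python, not the forkN.pl approach from the MySQL test scripts, and everything in it is an assumption for illustration: the function names, the stand-in executor and the sample statements (including the id column) are hypothetical. Against a real server, make_execute would open one client connection per session and return a function that runs a statement on it.

```python
import threading

def replay(queries, sessions, make_execute):
    """Run the same recorded query list concurrently from several sessions.

    make_execute(session_id) must return a callable that runs one SQL
    statement; with a real server it would wrap a client connection.
    """
    errors = []
    lock = threading.Lock()

    def worker(session_id):
        execute = make_execute(session_id)
        for sql in queries:
            try:
                execute(sql)
            except Exception as exc:  # record the failure and keep going
                with lock:
                    errors.append((session_id, sql, repr(exc)))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors

# Stand-in executor so the sketch runs without a server (hypothetical).
def fake_session(session_id):
    def execute(sql):
        if "boom" in sql:
            raise RuntimeError("simulated failure")
    return execute

# Hypothetical statements; real ones would come from the recorded query log.
recorded = ["UPDATE orf SET code = 'x' WHERE id = 1", "SELECT boom"]
errors = replay(recorded, sessions=3, make_execute=fake_session)
print(len(errors))
```

Collecting errors instead of stopping at the first one lets a long run continue, so that a later CHECK TABLE can be compared against the point at which statements first started to fail.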


--
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Peter Zaitsev
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Moscow, Russia
<___/ www.mysql.com M: +7 095 725 4955



Re: 3 broken tables at once

on 22.11.2002 13:53:15 by mmokrejs

On Fri, 22 Nov 2002, Peter Zaitsev wrote:

> On Friday 22 November 2002 13:46, Martin MOKREJŠ wrote:

> > Well, I think we will stick with 2.4.20 once it's out. We have never had such
> > a problem (5 crashes at once).
>
> Sorry, what problem are you speaking about? Did you have it with your current
> kernel or with the previous stable kernel?

I just wanted to say that I've never observed anything like this (5
crashed tables at the very same moment). Not even under a devel kernel
version and/or a devel mysql version.

> This is a rather strange issue, as recent 2.4.x kernels seem to be rather
> stable.

That's why I prefer them.

> > Yes, we used to use stable kernel releases
> > and MySQL binary releases, but for some time now we have been using the
> > BitKeeper tree, and I run gdb on mysqld more often ... ;)
>
> Well, basically this is not what you usually want to do, at least once you have
> run into trouble. We do extensive testing of MySQL at release time so it

But I've reported a couple of bugs and want to use the BK code, which has those
problems fixed. I'm still hunting at least one bug known to me, and that's
why I run "-g3" binaries.

> The bug you're reporting can't be easily checked, or I would not ask you
> to try another configuration or repeat the bug. It is really tricky to hunt
> a corruption bug without a repeatable case.

I believe you. ;)

> > Actually I'm pretty satisfied with this machine and kernel, as the -ac version
> > is sometimes much better than Linus's version. ;)
>
> Why don't you try a Red Hat kernel release in this case? It contains most of
> Alan's code, but it is extensively tested on many systems, load types and
> configurations.

And there are some Red Hat specific modifications, like the syscall table.

> > And we mainly use a
> > testing kernel because of memory allocation problems, which possibly
> > cause mysql to have trouble opening tables from time to time:
> > "read_const: Got error 127 when reading table" ..., although the kernel
> > hasn't complained about allocation problems for a couple of months.
>
> This is a very important note. Error 127 means "Record file is crashed":
> MySQL detects an error in the data and refuses to continue.

Yes, but the kernel doesn't report any problem, and the table, when
checked later on, is actually completely fine. But at some point mysqld
thought it had crashed. I have no idea if I'll ever manage to debug this.
It just happens.

> If you had these problems following the allocation error messages, this could
> indicate that corruption happens in case of kernel allocation failures. This
> might be the same issue as you have now. The new kernel might just no longer
> print warnings for subsequent allocation failures (in many cases allocation is
> attempted several times in a loop at the kernel level, as it is quite important
> to satisfy it).

The kernel used to report such problems, but doesn't anymore. Still, I keep
looking for changes in the kernel regarding VM management.

> Could you please try configuring your system not to experience the memory pressure
> which can lead to such problems: add a good amount of swap space, and configure
> the kernel to try to keep a large number of pages free. Also, you might compile
> the kernel with debugging prints, so that in a "warning" situation you will have
> messages printed.

I know what helps: reducing the table cache.

> > Any suggestions for what the stress test could look like in this case?
>
> Basically it should emulate your application well enough to
> reproduce the error reliably on your system. You might use the forkN.pl scripts
> as a base.
>
> It is also possible that you could just record the queries you run using MySQL
> logging and replay them from one or several sessions...

Thanks for the suggestions.

--
Martin Mokrejs ,
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax: +49-89-3187 3585


