Re: Replication race bug in sequential file cache (mf_iocache.cc)

on 28.06.2002 19:56:28 by Nils Ulltveit-Moe

Hi Michael

Michael Widenius writes:
> Nils> To try to hunt down the bug, I can modify the patch to use
> Nils> pthread_mutex_trylock() instead of pthread_mutex_lock() and then
> Nils> deliberately dump core if two processes are inside my_b_write() working
> Nils> on the same buffer at the same time. Hopefully the stack trace will
> Nils> then be useful to find the bug.
>
> Please do that; It would help us a lot if you can prove that we have a
> race condition in the logging code. To get a back trace from where
> this happens would of course be of great help.
>
> By the way, when this happened, where in the binary log file was the
> error ? (about which position in how big file).
>
> I just want to be sure that this didn't happen around the time when
> MySQL started to use a new binary log file.

I checked using csplit, and the problem occurred approximately 1/3 of
the way into the log file, so it does not seem to happen when a new
log file is started:

File size:
-rw-rw---- 1 mysql mysql 2160469 Jun 12 06:25 /var/lib/mysql/snort-bin.118

Fault position:
snort:~# csplit /var/lib/mysql/snort-bin.118 '/767\,\ /'
605129
1555340

As I stated in my initial mail, the log showed a mangled INSERT
statement: an insert into the signature table was overwritten by an
insert into the event table in Snort, as the following excerpt from
the log shows:

	INSERT INTO signature (sid, sig_id, sig_name,sig_rev,sig_sid)
VALUES (1,767, 'spp_portscan: End of portscan
=C2"=À)͊snortINSERT INTO event (sid,cid,signatu;

I have experienced corrupt event tables in Snort 3 times before, but
did not connect this with the log corruption problem until now. If
there is a bug when inserting into the database, it may affect either
the signature or the event table in the above scenario.

You have already shown that the logging classes properly lock access to
the IO-cache, so the problem is probably not there.

I was able to confirm that the INSERT INTO signature with sig_id 767
was properly inserted, but I was not able to verify the cut-off INSERT
INTO event, as I cannot see any keys in the log excerpt.

However, I found the event table to be corrupt some days later, and had
to recover it using:

myisamchk -r -q /var/lib/mysql/snort/event.MYI

This may indicate that table corruption occurred in the event table
as a result of the race bug above. On the other hand, the corrupt event
table may not be connected to the race bug. I do not know.

I am running a patched version that should give a stack trace if the
bug happens again, given that the bug occurs in the IO-cache. If it
happens elsewhere, I will not find it. So far, the system has not
crashed, so we just have to wait and see.

Have a nice summer!

Best regards,
Nils Ulltveit-Moe

