Thread freezes on SLAVE START/STOP

on 14.03.2003 08:38:37 by Pavel Merdine

>Description:
A thread in which SLAVE START/STOP was issued freezes. E.g.
the 'mysql' client can only be exited by pressing Ctrl-C. Actually,
only the first command is executed; subsequent ones are ignored.
Side effects tend to be corruption of table files and erasure of
'master.info'.
Now (4.0.11) they freeze even after a full server restart.
We tried different MySQL versions: 4.0.7, 4.0.8, 4.0.11.
>How-To-Repeat:
The exact conditions are unknown. Sometimes the commands work. It
looks like the fault happens after any replication error (e.g. a
table update error), but I'm not sure.
>Fix:
-

>Submitter-Id:
>Originator: Pavel Merdine
>Organization: Fotki Inc.
>MySQL support: none
>Synopsis: SLAVE START/STOP freezes
>Severity: serious
>Priority: medium
>Category: mysql
>Class: sw-bug
>Release: mysql-4.0.11-gamma-standard (Official MySQL-standard binary)

>C compiler: 2.95.4
>C++ compiler: 2.95.4
>Environment:
CPU: Dual Pentium III/Pentium III Xeon (451.03-MHz)
avail memory = 777900032 (759668K bytes)

System: FreeBSD dev.fotki.com 4.7-STABLE FreeBSD 4.7-STABLE
#1: Thu Jan 9 17:40:50 GMT 2003
Some paths: /usr/bin/perl /usr/bin/make /usr/local/bin/gmake
/usr/bin/gcc /usr/bin/cc
GCC: Using builtin specs.
gcc version 2.95.4 20020320 [FreeBSD]
Compilation info: CC='gcc' CFLAGS='-DHAVE_BROKEN_REALPATH'
CXX='g++' CXXFLAGS='' LDFLAGS='' ASFLAGS=''
LIBC:
-r--r--r-- 1 root wheel 1218496 Oct 9 08:43 /usr/lib/libc.a
lrwxrwxrwx 1 root wheel 9 Jan 8 05:16 /usr/lib/libc.so ->
libc.so.4
-r--r--r-- 1 root wheel 574916 Oct 9 08:43 /usr/lib/libc.so.4
Configure command: ./configure '--prefix=/usr/local/mysql'
'--with-comment=Official MySQL-standard binary'
'--with-extra-charsets=complex'
'--with-server-suffix=-standard' '--enable-thread-safe-client'
'--enable-local-infile' '--enable-assembler'
'--with-named-z-libs=not-used' '--disable-shared'
'--with-innodb' 'CFLAGS=-DHAVE_BROKEN_REALPATH'


------------------------------------------------------------ ---------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail bugs-thread13957@lists.mysql.com
To unsubscribe, e-mail

Re: Thread freezes on SLAVE START/STOP

on 14.03.2003 14:43:28 by Guilhem Bichot

> A thread in which SLAVE START/STOP was issued freezes. E.g.
> 'mysql' can be only exited by pressing Ctrl-C. Actually,
> only first command is executed. Next are ignored.
> Sideeffects tends to be table files corruption, 'master.info'
> erase.
> Now (4.0.11) they freeze even after server full restart.
> We tried different mysql versions: 4.0.7, 4.0.8, 4.0.11
>
> >How-To-Repeat:
>
> The exact conditions are unknown. Sometimes they work. It
> looks like fault happens after any replication error. (E.g.
> table update error). But I'm not sure.
>
> >Release: mysql-4.0.11-gamma-standard (Official MySQL-standard
> > binary)
> System: FreeBSD dev.fotki.com 4.7-STABLE FreeBSD 4.7-STABLE

Hi,

I believe you. But to investigate this, I need a proper How-to-Repeat:
please try to find a repeatable sequence of commands that always causes
the hang, then send it to bugs@lists.mysql.com together with the .err
files from your master and slave (these contain the error messages, which
are crucial to understanding what is happening).
Please also tell me which OS your master runs on.
Please also expand on this sentence:

> Now (4.0.11) they freeze even after server full restart.

What happened before you restarted the server, etc.?

If you can provide all this, I will forward it to our FreeBSD developers,
as it seems to be platform-dependent.

Regards,
Guilhem
--
MySQL 2003 Users Conference -> http://www.mysql.com/events/uc2003/
For technical support contracts, visit https://order.mysql.com/?ref=mgbi
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Guilhem Bichot
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Software Developer
/_/ /_/\_, /___/\___\_\___/ Bordeaux, France
<___/ www.mysql.com

Re: Re[2]: Thread freezes on SLAVE START/STOP

on 14.03.2003 17:14:52 by Guilhem Bichot

The first line below is the most worrying part (more so than the hang): it looks like the
slave only has the first characters of a longer query (see how SET has become S).
It is then normal that the slave stops, since it is a bad query.

030130 5:15:48 Slave: error 'unexpected success or fatal error' on query 'UPDATE hits_albums S', error_code=0
030130 5:15:48 Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.007' position 303184804

The fact that the slave now fails immediately at startup is normal:
you don't start the slave with --skip-slave-start, so the slave automatically
does START SLAVE and always hits this bad query.
For the moment, save your master's possibly strange binlog somewhere, then restart replication
from scratch: take a snapshot of your master, do RESET MASTER on the master,
start the slave with --skip-slave-start, do RESET SLAVE and START SLAVE on it, and see
if the problem still happens later.
(See the manual for details before doing this!)

Before this, to give me a bit of info, could you run
mysqlbinlog -j 303184804 binlog.007 | head
on your MASTER
and send me the output, please?
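
For readers who want to pull the binlog name and position out of the slave's .err file rather than copy them by hand, here is a minimal sketch. It assumes the message has exactly the shape of the line quoted above; the function name parse_stop_position is made up for this illustration and is not part of any MySQL tool.

```python
import re

# Matches the tail of the slave's "We stopped at log ..." message.
# Assumption: the wording is exactly as in the .err excerpt quoted above.
STOP_RE = re.compile(r"We stopped at log '([^']+)' position (\d+)")


def parse_stop_position(err_line):
    """Return (binlog_name, position) from a slave .err line, or None."""
    match = STOP_RE.search(err_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


line = (
    "030130 5:15:48 Error running query, slave SQL thread aborted. "
    'Fix the problem, and restart the slave SQL thread with "SLAVE START". '
    "We stopped at log 'binlog.007' position 303184804"
)
print(parse_stop_position(line))  # ('binlog.007', 303184804)
```

The two values returned are the ones passed to mysqlbinlog in the request above.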

Re: Re[4]: Thread freezes on SLAVE START/STOP

on 14.03.2003 17:53:19 by Guilhem Bichot

> I investigated the trouble you are saying about.
> If I start replication again from that position using
> CHANGE MASTER TO
> then it starts without error. I made the same thing today. I took
> a position number from 'slave thread exited' line and issued
> CHANGE MASTER TO MASTER_HOST=...
> SLAVE START;
> And now it works again.

Yes, this is a clean solution, like the RESET I proposed.
CHANGE MASTER clears all relay logs, so it's a clean start.
But I'm worried about the health of your slave tables (see below).
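
As an illustration of the restart described in the quoted text, the sketch below builds the two statements from a known binlog name and position. The host name master.example.com and the helper change_master_sql are hypothetical; the quoted message elides the real options after MASTER_HOST, so only the standard MASTER_HOST, MASTER_LOG_FILE and MASTER_LOG_POS options are assumed here.

```python
def change_master_sql(host, log_file, log_pos):
    """Build a CHANGE MASTER TO statement for the given master coordinates."""
    # Naive safety check for this sketch: refuse quotes and backslashes
    # instead of trying to escape them.
    for value in (host, log_file):
        if "'" in value or "\\" in value:
            raise ValueError("unsupported character in %r" % (value,))
    return (
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', "
        "MASTER_LOG_POS=%d" % (host, log_file, int(log_pos))
    )


# Hypothetical host; the log name and position are taken from the .err excerpt.
statements = [
    change_master_sql("master.example.com", "binlog.007", 303184804),
    "SLAVE START",
]
for statement in statements:
    print(statement + ";")
```

The statements are only printed here; running them against a slave is left to the mysql client.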

> I think the cause is the same as the cause of 'master.info' erase.
> It was erased once during a system shutdown.
> I mean relay logs are corrupted. Maybe it's because freezed thread
> does not close that files?

Possibly...

> One more thing. Maybe it's relevant.
> During January, the whole month, I was receiving messages like
> ERROR: 1062 Duplicate entry '2003-01-13-162546' for key 1
> But I'm 100% sure the table referred in this message could not have
> those keys.

>(because it has the same content that master has
Are you sure about these last words? Most often there's a bug in
the replication code which makes the table on the slave diverge
from the one on the master, but nobody notices until later, when an INSERT which
had succeeded on the master fails on the slave with "Duplicate key".
In your case it is improbable that the tables were 100% identical if you got this
error.

> I even
> tried to copy tables from master and restart replication
> And I was receiving it on each day of January and stopped
> receiving them on 1st of February. Now I dont worry, because I do not
> receive them anymore. But I think maybe it's relevant to the freeze...

This is the January 2003 bug ;)

There's this in your .err :
"Could not add ignore table rule 'hits_albums'!"
Might be important.

It looks like your tables later got corrupted on the slave
("Incorrect key file for table:").
If you repair them on the slave, you have to check that after REPAIR they are
identical to the master's, because REPAIR can delete rows.
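
One way to do that check, sketched under the assumption that the rows of the table can be fetched from both servers (for example with SELECT * through any client library), is to compare the two row sets directly. The helper diff_rows is hypothetical and the sample rows are invented for the example.

```python
from collections import Counter


def diff_rows(master_rows, slave_rows):
    """Return (missing_on_slave, extra_on_slave) as lists of row tuples."""
    # Counters keep duplicate rows distinct, in case the table has no unique key.
    master = Counter(tuple(row) for row in master_rows)
    slave = Counter(tuple(row) for row in slave_rows)
    missing = list((master - slave).elements())
    extra = list((slave - master).elements())
    return missing, extra


# Invented sample data: the slave lost one row, as could happen after REPAIR.
master_rows = [(1, "a"), (2, "b"), (3, "c")]
slave_rows = [(1, "a"), (3, "c")]
missing, extra = diff_rows(master_rows, slave_rows)
print(missing)  # [(2, 'b')]
print(extra)    # []
```

For large tables this holds every row in memory, so it is only practical as a spot check.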

If possible, the safest course is to clear everything on the slave and restart replication
from scratch. I recommend it, because there seem to be so many strange errors
(which I have never seen elsewhere) in the current database.

You may also consider a support contract, to have a developer log in
to your system, go through your binlogs and tables, etc.

Re[6]: Thread freezes on SLAVE START/STOP

on 14.03.2003 18:12:21 by Pavel Merdine

Hello,

Friday, March 14, 2003, 7:53:19 PM, you wrote:

[skipped a little bit]
>> One more thing. Maybe it's relevant.
>> During January, the whole month, I was receiving messages like
>> ERROR: 1062 Duplicate entry '2003-01-13-162546' for key 1
>> But I'm 100% sure the table referred in this message could not have
>> those keys.
>>(because it has the same content that master has
> Are you sure about these last words ? Most often there's a bug in
> the replication code which makes the table on the slave become different
> than the one on the master, but nobody notices it, until later an INSERT which
> had succeeded on the master fails on the slave because of "Duplicate key".
> In your case it is improbable that the tables were 100% identical if you got this
> error.

>> I even
>> tried to copy tables from master and restart replication
>> And I was receiving it on each day of January and stopped
>> receiving them on 1st of February. Now I dont worry, because I do not
>> receive them anymore. But I think maybe it's relevant to the freeze...

> This is the January 2003 bug ;)

> There's this in your .err :
> "Could not add ignore table rule 'hits_albums'!"
> Might be important.

> Looks like later your tables got corrupted on the slave
> ("Incorrect key file for table:").
> If you repair them on the slave, you have to check that after REPAIR they are
> identical to the master. Because REPAIR can delete rows.

> If possible the safest is clear everything on the slave and restart replication
> from scratch. I recommend it, because there seems to be so many strange errors
> (which I have never seen elsewhere) in the current database.

Believe me, I tried it several times. Each time an error occurs I try
to find time to copy all tables from the master. It does not help. And
I suppose it's not relevant to the freeze problem.
About the 'keys' problem, I can say that it's definitely not a problem with
the tables. First, because I tried copying the tables from the master and running
replication again during January. Second, because it happened only during
January 2003. The code on the main server was not changed. The principle
was: if the record exists then UPDATE, and INSERT if not. I know that it's
not the best solution, but it's for statistics purposes, so it's not
significant for us if there was an error.
"Could not add ignore table rule 'hits_albums'" means I tried to avoid
these errors by adding hits_albums to the ignored table list. And I did add it.
Anyway, I think that the current replication is not stable, because it
causes a lot of errors that are treated as fatal. So I have to spend
a lot of my time on maintenance.
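
The "UPDATE if the record exists, INSERT if not" principle mentioned above can be sketched as follows. This is only an illustration of the general pattern, not the code actually running on the master: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the column names day, album_id and hits are invented (only the table name hits_albums appears in the thread).

```python
import sqlite3


def record_hit(conn, day, album_id):
    """Increment the counter row if it exists, otherwise create it."""
    cur = conn.execute(
        "UPDATE hits_albums SET hits = hits + 1 WHERE day = ? AND album_id = ?",
        (day, album_id),
    )
    if cur.rowcount == 0:
        # No row was updated, so this is the first hit for this key.
        conn.execute(
            "INSERT INTO hits_albums (day, album_id, hits) VALUES (?, ?, 1)",
            (day, album_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits_albums ("
    "day TEXT, album_id INTEGER, hits INTEGER, PRIMARY KEY (day, album_id))"
)
record_hit(conn, "2003-01-13", 7)  # inserts the row
record_hit(conn, "2003-01-13", 7)  # updates the existing row
hits = conn.execute("SELECT hits FROM hits_albums").fetchone()[0]
print(hits)  # 2
```

With this pattern the INSERT only succeeds if the row is really absent, so any difference between master and slave contents at that moment shows up as a duplicate-key error on the slave.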

> You may also consider a support contract, to have a developer login
> in your system, go through your binlogs and tables etc.

It's not as simple as that. We have some private info in our tables, so I
don't think my company is going to agree to this.

--
/ Pavel Merdine

