4.0.4 replication slave "hangs" after master breakdown+reboot

Posted on 15.10.2002 02:49:02 by Michael Zimmermann

Hi,

I haven't included a script for the following
problem because reproducing it needs only the
setup plus a few commands run as root or with
process privileges.

Please tell me if I should test something
or provide more specific information.
A good open source product deserves my support
as well as yours.


Environment:

Circular replication with 3 machines.
The systems are x86 SuSE 8.0/7.3 Linux; MySQL-max
version 4.0.4 is used on all machines
(installed from the mysql.org RPMs).
Updates are few but regular (every 3 minutes)
and are done on all 3 machines (conflict-free,
since they touch different rows of the same table).
The Linux distribution and kernel version don't
seem to play a role: several combinations of
SuSE versions (8.0 - 8.0, 7.3 - 8.0, 8.0 - 7.3)
and kernels (2.4.10, 2.4.19, 2.4.20-pre10)
reproduce the same problem.
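In a circular setup each server replicates from its predecessor, and the first server replicates from the last one. A minimal sketch of that topology, assuming three hypothetical host names (the report does not name its machines), shows which slave is affected when one server crashes: the one on the next machine in the ring.

```python
from __future__ import annotations

# Hypothetical host names; the report does not name its three machines.
RING = ["alpha", "beta", "gamma"]


def master_of(host: str, ring: list[str]) -> str:
    """Server that `host` replicates from: its predecessor in the ring."""
    return ring[ring.index(host) - 1]  # index -1 wraps to the last server


def downstream_slave(crashed: str, ring: list[str]) -> str:
    """Server whose slave thread reads from `crashed`: its successor."""
    return ring[(ring.index(crashed) + 1) % len(ring)]


for host in RING:
    print(f"{host} replicates from {master_of(host, RING)}")
print("if beta crashes, the slave on", downstream_slave("beta", RING), "is affected")
```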


Problem and How-To-Repeat:

After one server goes down the "hard" way
(which can be simulated with "rcnetwork stop")
and comes back up through a reboot, its master
process naturally opens a new binary log.

But the slave process on the next machine in the
replication chain keeps "hanging" at a position
in the previous binary log (Slave is running,
and Slave IO is also 'Yes'), probably the position
at which its master jumped off the cliff without
prior notice.
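The symptom can be stated as a comparison: the master already writes to a newer binary log than the one the slave is reading. A minimal sketch, assuming the two file names are available (later MySQL manuals list them as the File column of SHOW MASTER STATUS and the Master_Log_File column of SHOW SLAVE STATUS; treat that mapping as an assumption for 4.0.4). The file names below are hypothetical. On its own the check can also be briefly true while a healthy slave is still catching up, so a real monitor would also check whether the slave's position is advancing.

```python
def binlog_index(name: str) -> int:
    """Numeric extension of a binary log name, e.g. 'host-bin.002' -> 2."""
    return int(name.rsplit(".", 1)[1])


def slave_reads_old_binlog(master_file: str, slave_file: str) -> bool:
    """True if the master has moved past the log the slave is reading."""
    return binlog_index(master_file) > binlog_index(slave_file)


# Hypothetical file names: the master rebooted and opened .002,
# while the slave is still positioned in .001.
print(slave_reads_old_binlog("host-bin.002", "host-bin.001"))  # True
```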

A 'slave stop;' followed by 'slave start;' solves
this problem, but the restart does not happen
automatically. It looks as if the slave process
is still listening on the socket of the
dead connection and has not noticed that
nobody is on the other side anymore.
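That guess matches how TCP behaves: a peer that vanishes without closing the connection (as with "rcnetwork stop" followed by a reboot) sends nothing at all, so a side that only reads sees neither data nor end-of-file, and a blocking read can wait indefinitely unless it has a timeout (or TCP keepalive probes, if enabled, eventually fail). A minimal sketch using a Python socket pair as a stand-in for the slave/master link; the helper name `wait_for_master` is hypothetical. MySQL documents a `slave_net_timeout` server variable for this kind of read timeout; whether it applies to 4.0.4 and what its default is there should be checked in the manual for that version.

```python
import socket
from typing import Optional


def wait_for_master(sock: socket.socket, timeout_s: float) -> Optional[bytes]:
    """Return the bytes received, or None if nothing arrived in timeout_s.

    Without settimeout() the recv() below would block indefinitely on a
    connection whose peer vanished without closing it.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


# A connected socket pair stands in for the slave <-> master connection.
slave_end, master_end = socket.socketpair()
# The "master" end stays open but silent: that is all the slave can
# observe when the real master disappeared without a FIN or RST.
result = wait_for_master(slave_end, 0.2)
print(result)  # None
slave_end.close()
master_end.close()
```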

There is no corrupted data on any machine, no
conflicting updates or the like,
just this inability to resume slave
operations without that manual "push".
Without the slave stop+start, the hang
lasts "forever" (much longer than the
master-connect-retry interval, which is kept
at the default of 60 seconds).
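Until the slave notices the dead connection by itself, the manual "push" could be automated with a watchdog. A minimal sketch, assuming the slave's read position (log file, offset) can be polled periodically and that `execute` is a hypothetical caller-supplied callback that runs one SQL statement on the slave (for example through a MySQL client library). It relies on the regular 3-minute update traffic described above: if the position has not moved for clearly longer than that, the slave is assumed to be stuck and is restarted with the same SLAVE STOP / SLAVE START the report uses. The 300-second threshold is an arbitrary illustrative choice; on an idle master the watchdog would restart the slave needlessly.

```python
import time

STUCK_AFTER_S = 300.0  # hypothetical threshold, above the 3-minute update interval


class SlaveWatchdog:
    """Restart a slave whose read position has not moved for too long."""

    def __init__(self, execute, stuck_after_s=STUCK_AFTER_S, clock=time.monotonic):
        self.execute = execute          # hypothetical: runs one statement on the slave
        self.stuck_after_s = stuck_after_s
        self.clock = clock
        self.last_pos = None
        self.last_change = clock()

    def check(self, log_file: str, log_pos: int) -> bool:
        """Feed the current (log file, position); return True if restarted."""
        now = self.clock()
        pos = (log_file, log_pos)
        if pos != self.last_pos:
            # Progress was made: remember it and reset the stuck timer.
            self.last_pos = pos
            self.last_change = now
            return False
        if now - self.last_change >= self.stuck_after_s:
            # The manual workaround from the report, automated.
            self.execute("SLAVE STOP")
            self.execute("SLAVE START")
            self.last_change = now
            return True
        return False
```

Injecting the clock keeps the policy testable without waiting; in production the default `time.monotonic` is used and `check` would be called from a polling loop.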

If the reboot on the master is done normally
(without stopping the network first),
then everything works fine.
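This contrast fits the socket explanation: presumably, a normal shutdown lets the master's server (or at least its operating system) close the replication connection in an orderly way, so the slave's pending read returns end-of-file at once and the slave can start reconnecting. A minimal sketch with a Python socket pair standing in for the connection:

```python
import socket

slave_end, master_end = socket.socketpair()
slave_end.settimeout(5.0)  # safety net only; a clean close needs no timeout

# Orderly close of the "master" side (over TCP this is signalled with a FIN).
master_end.close()

data = slave_end.recv(4096)
print(data == b"")  # True: end-of-file is reported immediately
slave_end.close()
```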


Greetings
Michael
--
Michael Zimmermann (http://vegaa.de)

---------------- my.cnf -----------------
# (identical on all machines, except for
# server-id and master-host of course)


[client]
port =
socket = /var/lib/mysql/mysql.sock

[mysqld]
port =
socket = /var/lib/mysql/mysql.sock
skip-locking
skip-innodb
set-variable = key_buffer=16M
set-variable = max_allowed_packet=16M
set-variable = table_cache=64
set-variable = sort_buffer=512K
set-variable = net_buffer_length=8K
set-variable = myisam_sort_buffer_size=8M
log-bin
log-slave-updates
server-id =
master-host=
master-user=
master-password=
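Only server-id and master-host differ between the machines. server-id must be unique, because a server skips replicated events carrying its own server-id; together with log-slave-updates (which passes received updates on to the next server) this is what keeps updates from circling the ring forever. A minimal sketch that derives the differing lines, assuming hypothetical host names and ids (the real values are blank in the posted file):

```python
from __future__ import annotations

# Hypothetical host names; the real values are blank in the posted my.cnf.
RING = ["alpha", "beta", "gamma"]


def per_host_settings(ring: list[str]) -> dict[str, list[str]]:
    """my.cnf lines that differ per machine in a circular setup."""
    settings = {}
    for i, host in enumerate(ring):
        settings[host] = [
            f"server-id = {i + 1}",          # unique per server
            f"master-host = {ring[i - 1]}",  # predecessor; -1 wraps to the last
        ]
    return settings


for host, lines in per_host_settings(RING).items():
    print(f"# {host}")
    print("\n".join(lines))
```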

