Replication breakage when Heartbeat Failover occurs

on 12.08.2009 09:08:58 by Imran Chaudhry

I want to fix a replication issue with a 2-node cluster (one active,
one passive) that is using Heartbeat for failover. The nodes are in
Master-Master configuration (that is, each is the slave and master of
the other).

I have several other hosts that are replication slaves from the active
node. They connect to MySQL via TCP over SSH tunnels.

When failover occurs, the passive node becomes the active node.
However the replication slaves stop replicating. The error from a log
on one of the slaves is:

Jul 15 07:43:32 mysqld[1339]: 090715 7:43:32 [Note] Slave I/O thread: connected to master '@127.0.0.1:3307', replication started in log 'mysql-bin.000978' at position 23923243
Jul 15 07:43:32 mysqld[1339]: 090715 7:43:32 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file ( server_errno=1236)
Jul 15 07:43:32 mysqld[1339]: 090715 7:43:32 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log
Jul 15 07:43:32 mysqld[1339]: 090715 7:43:32 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000978', position 23923243

I do not think this is an SSH tunnel issue. I believe it is caused by
inconsistent binary log file names and positions between the two
nodes, probably because one of the nodes had been in operation much
longer than the other.
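To make the suspected failure concrete, here is a minimal Python model of it, assuming that each node keeps its own, independent binary log index. The slave's saved coordinates come from the log above; the two index lists and the names `open_binlog` and `BinlogLookupError` are hypothetical, and this is not MySQL server code.

```python
class BinlogLookupError(Exception):
    """Hypothetical stand-in for MySQL error 1236."""


def open_binlog(index_files, requested_file, position):
    """Model a master resolving a slave's requested binlog coordinates.

    index_files lists the file names in this master's own binary log
    index; a file name this master never wrote cannot be found.
    """
    if requested_file not in index_files:
        raise BinlogLookupError(
            "Could not find first log file name in binary log index file"
        )
    return requested_file, position


# Coordinates the slave saved while replicating from the old active node.
slave_file, slave_pos = "mysql-bin.000978", 23923243

# Hypothetical indexes: the long-running node is far ahead of the other.
old_active_index = [f"mysql-bin.{n:06d}" for n in range(970, 979)]
new_active_index = [f"mysql-bin.{n:06d}" for n in range(1, 43)]

print(open_binlog(old_active_index, slave_file, slave_pos))
try:
    open_binlog(new_active_index, slave_file, slave_pos)
except BinlogLookupError as exc:
    print("error 1236:", exc)
```

Even if both nodes happened to have a file with the same name, the position would still be wrong: binary log positions are byte offsets into each server's own log, so the same (file, position) pair generally points at unrelated events on the two masters.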

At the moment I have to get replication going again by dumping the
master databases, re-importing them on the slave hosts and
bootstrapping the slaves.
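The last step of that bootstrap, re-pointing each slave, can be sketched in Python as below. This is a minimal sketch under stated assumptions. The host and port are the SSH tunnel endpoint seen in the log above. The coordinates `mysql-bin.000042` / `107` are hypothetical placeholders that must be the *new* master's own coordinates, for example as recorded by `mysqldump --master-data` or read from `SHOW MASTER STATUS` while writes are stopped. The helper `repoint_slave_sql` is illustrative only and does no SQL escaping.

```python
def repoint_slave_sql(host, port, log_file, log_pos):
    """Build the statements that point a slave at new master coordinates.

    Values are interpolated without escaping, so they must be trusted.
    """
    # The first event in a binary log starts after a 4-byte magic header.
    if log_pos < 4:
        raise ValueError("binlog event positions start at 4")
    return [
        "STOP SLAVE;",
        (
            f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
            f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos};"
        ),
        "START SLAVE;",
    ]


# Hypothetical coordinates on the newly active node.
for stmt in repoint_slave_sql("127.0.0.1", 3307, "mysql-bin.000042", 107):
    print(stmt)
```

Generating the statement is the easy part. The hard part is choosing coordinates on the new master that correspond to exactly where the slave stopped on the old one, without skipping or replaying events.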

What is the best way to make this consistent and ensure that
replication continues smoothly after a failover (and failback) event?

Thank you,
Imran Chaudhry

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Replication breakage when Heartbeat Failover occurs

on 12.08.2009 10:18:12 by Walter Heck


Hi Imran,

Have a look at MySQL MMM for Multi-Master Replication failover. The project
is currently moving to a new home, but you can start by looking at
http://mysql-mmm.org for information.

This project is built for exactly what you want to achieve: multiple
masters and multiple slaves with automatic failover.

Hope this helps!

Walter

--
Walter Heck, Engineer @ Open Query (http://openquery.com)
Affordable Training and ProActive Support for MySQL & related technologies

Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org
