Possible replication bug.

Possible replication bug.

am 07.05.2003 17:45:49 von Cory Patterson

Hello,

I am using mysql 4.0.12 on linux systems. I am running 1 master and 4
slave servers. I have run into a problem when mysql is shutdown
unexpectedly on the slave. When the slave comes back up, I get this in
the error log.

-- snip --
030506 16:00:53 mysqld started
030506 16:00:54 InnoDB: Started
/usr/libexec/mysqld: ready for connections.
Version: '4.0.12' socket: '/tmp/mysql.sock' port: 3306
030506 16:00:54 Slave I/O thread: connected to master
'replican@data:3306', replication started in log 'data-bin.009' at
position 856965203
030506 16:06:14 Error in Log_event::read_log_event(): 'read error',
data_len=5072,event_type=2
030506 16:06:14 Error reading relay log event: slave SQL thread
aborted because of I/O error
030506 16:06:14 Could not parse log event entry, check the master for
binlog corruption
This may also be a network problem, or just a bug in the master or
slave code.
030506 16:06:14 Error running query, slave SQL thread aborted. Fix the
problem, and restart the slave SQL thread with "SLAVE START". We
stopped at log 'data-bin.009' position 856958900
-- snip --

And show slave status on gives me:

-- snip --
*************************** 1. row ***************************
Master_Host: data
Master_User: replican
Master_Port: 3306
Connect_retry: 60
Master_Log_File: data-bin.009
Read_Master_Log_Pos: 912250296
Relay_Log_File: blade12-relay-bin.016
Relay_Log_Pos: 128865934
Relay_Master_Log_File: data-bin.009
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_do_db:
Replicate_ignore_db: ApacheLogs,PUFFY_DEV
Last_errno: 0
Last_error:
Skip_counter: 0
Exec_master_log_pos: 856958900
Relay_log_space: 184152312
-- snip --

When i check the binlog on the master at those 3 positions (because i
don't know which one is the relevant one) I get.

-- snip --
[root@data root]# mysqlbinlog -j 856958900 /raid/mysqldata/data-bin.009
| head -4
# at 856958900
#030506 15:59:56 server id 1 log_pos 856958900 Query
thread_id=2527333 exec_time=0 error_code=0
use PUFFY;
SET TIMESTAMP=1052251196;
[root@data root]# mysqlbinlog -j 856965203
/raid/mysqldata/data-bin.009 | head -4
# at 856965203
#030506 16:00:01 server id 1 log_pos 856965203 Intvar
SET INSERT_ID=15233299;
# at 856965231
[root@data root]# mysqlbinlog -j 912250296 /raid/mysqldata/data-bin.009
| head -4
# at 912250296
#030507 11:24:33 server id 1 log_pos 912250296 Query
thread_id=2682649 exec_time=0 error_code=0
use PUFFY;
SET TIMESTAMP=1052321073;
[root@data root]#
-- snip --

This happened to me in the past and i reloaded the slave from the
latest snapshot and let it catch up and everything was fine. I looked
through the docs and mailing list archives and couldn't find much on
how i can move forward or further diagnose the problem.

Thanks in advance for any help, and if any further information is
needed, please let me know.

Cory Patterson



--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Possible replication bug.

am 08.05.2003 05:47:01 von Jeremy Zawodny

On Wed, May 07, 2003 at 11:45:49AM -0400, Cory Patterson wrote:
> Hello,
>
> I am using mysql 4.0.12 on linux systems. I am running 1 master and 4
> slave servers. I have run into a problem when mysql is shutdown
> unexpectedly on the slave. When the slave comes back up, I get this in
> the error log.
>
> -- snip --
> 030506 16:00:53 mysqld started
> 030506 16:00:54 InnoDB: Started
> /usr/libexec/mysqld: ready for connections.
> Version: '4.0.12' socket: '/tmp/mysql.sock' port: 3306
> 030506 16:00:54 Slave I/O thread: connected to master
> 'replican@data:3306', replication started in log 'data-bin.009' at
> position 856965203
> 030506 16:06:14 Error in Log_event::read_log_event(): 'read error',
> data_len=5072,event_type=2
> 030506 16:06:14 Error reading relay log event: slave SQL thread
> aborted because of I/O error
> 030506 16:06:14 Could not parse log event entry, check the master for
> binlog corruption
> This may also be a network problem, or just a bug in the master or
> slave code.
> 030506 16:06:14 Error running query, slave SQL thread aborted. Fix the
> problem, and restart the slave SQL thread with "SLAVE START". We
> stopped at log 'data-bin.009' position 856958900
> -- snip --

Yup. The last event in your relay log is corrupt.

I talked with Monty and others about this at the conference. I had an
idea about modifying mysqlbinlog to help fix the problem. In fact,
they could integrate the necessary changes into the server itself.

I don't know what the plan is for solving this problem. Really, the
SQL Thread needs to realize that the entry is corrupt, truncate the
log, and "rewind" the master.info counters so that it can pull the
event from the master again.

Jeremy
--
Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo!
| http://jeremy.zawodny.com/

--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org

Re: Possible replication bug.

am 08.05.2003 20:55:44 von Andrew Houghton

We've seen this ourselves on a couple of our slaves, and I hadn't seen
any discussion or information about the problem until this thread. Last
time this occurred, I had no option but bringing two slaves out of
service (including the slave showing the problem) so I could
reconstitute the slave from a known-good replica.

I'd much, much rather know how to fix this on the offending slave by
simply removing all relay-bin and relay-bin.info files, and mucking with
the master.info counter, but I've never been brave enough to play with
it. What would the correct process be, here? And how dangerous a
process is it? If things don't work I can always fall back to the
method I used last time, but what I'm really scared of is a happily
replicating slave that missed a few inserts or updates.

- a.

Jeremy Zawodny wrote:

>On Wed, May 07, 2003 at 11:45:49AM -0400, Cory Patterson wrote:
>
>
>>Hello,
>>
>>I am using mysql 4.0.12 on linux systems. I am running 1 master and 4
>>slave servers. I have run into a problem when mysql is shutdown
>>unexpectedly on the slave. When the slave comes back up, I get this in
>>the error log.
>>
>>-- snip --
>>030506 16:00:53 mysqld started
>>030506 16:00:54 InnoDB: Started
>>/usr/libexec/mysqld: ready for connections.
>>Version: '4.0.12' socket: '/tmp/mysql.sock' port: 3306
>>030506 16:00:54 Slave I/O thread: connected to master
>>'replican@data:3306', replication started in log 'data-bin.009' at
>>position 856965203
>>030506 16:06:14 Error in Log_event::read_log_event(): 'read error',
>>data_len=5072,event_type=2
>>030506 16:06:14 Error reading relay log event: slave SQL thread
>>aborted because of I/O error
>>030506 16:06:14 Could not parse log event entry, check the master for
>>binlog corruption
>>This may also be a network problem, or just a bug in the master or
>>slave code.
>>030506 16:06:14 Error running query, slave SQL thread aborted. Fix the
>>problem, and restart the slave SQL thread with "SLAVE START". We
>>stopped at log 'data-bin.009' position 856958900
>>-- snip --
>>
>>
>
>Yup. The last event in your relay log is corrupt.
>
>I talked with Monty and others about this at the conference. I had an
>idea about modifying mysqlbinlog to help fix the problem. In fact,
>they could integrate the necessary changes into the server itself.
>
>I don't know what the plan is for solving this problem. Really, the
>SQL Thread needs to realize that the entry is corrupt, truncate the
>log, and "rewind" the master.info counters so that it can pull the
>event from the master again.
>
>Jeremy
>
>


--
MySQL Bugs Mailing List
For list archives: http://lists.mysql.com/bugs
To unsubscribe: http://lists.mysql.com/bugs?unsub=gcdmb-bugs@m.gmane.org