Mismatches
on 03.01.2011 02:10:38 by Leslie Rhorer
OK, I asked this question here before, and I got no answer
whatsoever. I wasn't too concerned previously, but now that I lost the
entire array the last time I tried to do a growth, I am truly concerned.
Would someone please answer my question this time, and perhaps point me
toward a resolution? The monthly array check just finished on my main
machine. For many months, this happened at the first of the month and
completed without issue and with zero mismatches. As of a couple of months
ago, it started to report large numbers of mismatches. It just completed
this afternoon with the following:
RebuildFinished /dev/md0 mismatches found: 96614968
Now, 96,000,000 mismatches would seem to be a matter of great
concern, if you ask me. How can there be any, really, when the entire array
- all 11T - was re-written just a few weeks ago? How can I find out what
the nature of these mismatches is, and how can I correct them without
destroying the data on the array? How can I look to prevent them in the
future? I take it the monthly checkarray routine (which basically
implements ` echo check > /sys/block/md0/md/sync_action`) does not attempt
to fix any errors it finds?
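(As far as I can tell, what the checkarray cron job boils down to is roughly
this, assuming the array is /dev/md0:

    echo check > /sys/block/md0/md/sync_action   # read-only scan; rewrites nothing
    cat /proc/mdstat                              # progress shows up here
    cat /sys/block/md0/md/mismatch_cnt            # the count, once it finishes

so there is a mismatch count to look at afterwards, but nothing gets
corrected.)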
I just recently found out md uses simple parity to try to maintain
the validity of the data. I had always thought it was ECC. With simple
parity it can be difficult or even impossible to tell which data member is
in error, given two conflicting members. Where should I go from here? Can
I use `echo repair > /sys/block/md0/md/sync_action` with impunity? What,
exactly, will this do when it comes across a mismatch between one or more
members?
RAID6 array
mdadm - v2.6.7.2
kernel 2.6.26-2-amd64
Re: Mismatches
on 03.01.2011 02:22:30 by Mark Knecht
On Sun, Jan 2, 2011 at 5:10 PM, Leslie Rhorer wrote:
>
>         OK, I asked this question here before, and I got no answer
> whatsoever. I wasn't too concerned previously, but now that I lost the
> entire array the last time I tried to do a growth, I am truly concerned.
> Would someone please answer my question this time, and perhaps point me
> toward a resolution? The monthly array check just finished on my main
> machine. For many months, this happened at the first of the month and
> completed without issue and with zero mismatches. As of a couple of months
> ago, it started to report large numbers of mismatches. It just completed
> this afternoon with the following:
>
> RebuildFinished /dev/md0 mismatches found: 96614968
>
>         Now, 96,000,000 mismatches would seem to be a matter of great
> concern, if you ask me. How can there be any, really, when the entire array
> - all 11T - was re-written just a few weeks ago? How can I find out what
> the nature of these mismatches is, and how can I correct them without
> destroying the data on the array? How can I look to prevent them in the
> future? I take it the monthly checkarray routine (which basically
> implements `echo check > /sys/block/md0/md/sync_action`) does not attempt
> to fix any errors it finds?
>
>         I just recently found out md uses simple parity to try to maintain
> the validity of the data. I had always thought it was ECC. With simple
> parity it can be difficult or even impossible to tell which data member is
> in error, given two conflicting members. Where should I go from here? Can
> I use `echo repair > /sys/block/md0/md/sync_action` with impunity? What,
> exactly, will this do when it comes across a mismatch between one or more
> members?
>
> RAID6 array
> mdadm - v2.6.7.2
> kernel 2.6.26-2-amd64
What commands are you running? Is it just the sync_action or other
things in addition?
It was my understanding when researching RAID6 last week that the two
parity calculations are actually different. Only one of them is simple
parity. The other is part of an advanced math degree. ;-)
(According to Wikipedia, not a definitive source for sure...)
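(If Wikipedia has it right, md's RAID6 keeps two syndromes per stripe:

    P = D0 xor D1 xor ... xor D(n-1)                       (plain parity)
    Q = g^0*D0 xor g^1*D1 xor ... xor g^(n-1)*D(n-1)        (over GF(2^8))

where g is a generator of the field. In principle the two together can
pinpoint a single bad data block in a stripe, though as I understand it md
does not currently use that for repair.)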
- Mark
Re: Mismatches
on 03.01.2011 02:35:35 by NeilBrown
On Sun, 2 Jan 2011 19:10:38 -0600 "Leslie Rhorer" wrote:
>
> OK, I asked this question here before, and I got no answer
> whatsoever. I wasn't too concerned previously, but now that I lost the
> entire array the last time I tried to do a growth, I am truly concerned.
> Would someone please answer my question this time, and perhaps point me
> toward a resolution? The monthly array check just finished on my main
> machine. For many months, this happened at the first of the month and
> completed without issue and with zero mismatches. As of a couple of months
> ago, it started to report large numbers of mismatches. It just completed
> this afternoon with the following:
>
> RebuildFinished /dev/md0 mismatches found: 96614968
>
> Now, 96,000,000 mismatches would seem to be a matter of great
> concern, if you ask me. How can there be any, really, when the entire array
> - all 11T - was re-written just a few weeks ago? How can I find out what
> the nature of these mismatches is, and how can I correct them without
> destroying the data on the array? How can I look to prevent them in the
> future? I take it the monthly checkarray routine (which basically
> implements ` echo check > /sys/block/md0/md/sync_action`) does not attempt
> to fix any errors it finds?
>
> I just recently found out md uses simple parity to try to maintain
> the validity of the data. I had always thought it was ECC. With simple
> parity it can be difficult or even impossible to tell which data member is
> in error, given two conflicting members. Where should I go from here? Can
> I use `echo repair > /sys/block/md0/md/sync_action` with impunity? What,
> exactly, will this do when it comes across a mismatch between one or more
> members?
>
> RAID6 array
> mdadm - v2.6.7.2
> kernel 2.6.26-2-amd64
>
96,000,000 is certainly a big number. It seems to suggest that one of your
devices is returning a lot of bad data to reads.
If this is true, you would expect to get corrupt data when you read from the
array. Do you? Does 'fsck' find any problems?
The problem could be in a drive, or in a cable or in a controller. It is
hard to know which.
I would recommend not writing to the array until you have isolated the
problem as writing can just propagate errors.
Possibly:
shut down array
compute the sha1sum of each device
compute the sha1sum again
If there is any difference, you are closer to the error
If every device reports the same sha1sum, both times, then it is presumably
just one device which has consistent errors.
I would then try assembling the array with all-but-one-drive (use a bitmap so
you can add/remove devices without triggering a recovery) and do a 'check'
for each config and hope that one config (i.e. with one particular device
missing) reports no mismatches. That would point to the missing device being
the problem.
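Concretely, something like this (device names purely illustrative):

    # add a write-intent bitmap so members can be removed and re-added
    # without forcing a full resync
    mdadm --grow /dev/md0 --bitmap=internal

    # then, for each member in turn:
    mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
    echo check > /sys/block/md0/md/sync_action
    # ...wait for the check to finish, then...
    cat /sys/block/md0/md/mismatch_cnt
    mdadm /dev/md0 --re-add /dev/sdb
    # let the bitmap-based recovery complete before moving to the next member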
'check' does not correct any mismatches it finds, though if it hits a read
error it will try to correct that.
RAID6 can sometimes determine which device is in error, but that has not been
implemented in md/raid6 yet.
I wouldn't use 'repair' as that could hide the errors rather than fixing
them, and there would be no way back. When it comes across a mismatch it
generates the Parity and the Q block from the data and writes them out. If
the P or Q block were wrong, this is a good fix. If one data block was
wrong, this is bad.
NeilBrown
RE: Mismatches
on 03.01.2011 02:53:00 by Leslie Rhorer
> -----Original Message-----
> From: Mark Knecht [mailto:markknecht@gmail.com]
> Sent: Sunday, January 02, 2011 7:23 PM
> To: lrhorer@satx.rr.com
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Mismatches
>
> On Sun, Jan 2, 2011 at 5:10 PM, Leslie Rhorer wrote:
> >
> >         OK, I asked this question here before, and I got no answer
> > whatsoever. I wasn't too concerned previously, but now that I lost the
> > entire array the last time I tried to do a growth, I am truly concerned.
> > Would someone please answer my question this time, and perhaps point me
> > toward a resolution? The monthly array check just finished on my main
> > machine. For many months, this happened at the first of the month and
> > completed without issue and with zero mismatches. As of a couple of
> > months ago, it started to report large numbers of mismatches. It just
> > completed this afternoon with the following:
> >
> > RebuildFinished /dev/md0 mismatches found: 96614968
> >
> >         Now, 96,000,000 mismatches would seem to be a matter of great
> > concern, if you ask me. How can there be any, really, when the entire
> > array - all 11T - was re-written just a few weeks ago? How can I find
> > out what the nature of these mismatches is, and how can I correct them
> > without destroying the data on the array? How can I look to prevent them
> > in the future? I take it the monthly checkarray routine (which basically
> > implements `echo check > /sys/block/md0/md/sync_action`) does not
> > attempt to fix any errors it finds?
> >
> >         I just recently found out md uses simple parity to try to
> > maintain the validity of the data. I had always thought it was ECC.
> > With simple parity it can be difficult or even impossible to tell which
> > data member is in error, given two conflicting members. Where should I
> > go from here? Can I use `echo repair > /sys/block/md0/md/sync_action`
> > with impunity? What, exactly, will this do when it comes across a
> > mismatch between one or more members?
> >
> > RAID6 array
> > mdadm - v2.6.7.2
> > kernel 2.6.26-2-amd64
>
> What commands are you running? Is it just the sync_action or other
> things in addition?
	Well, I haven't run anything at all, yet. The checkarray script
runs once a month as a cron job on all the arrays and reports their health.
Until just a few months ago, all 8 arrays on the servers always reported
complete via e-mail with no reported mismatches. Then a few months ago,
some of the arrays started reporting mismatches. I was of the impression
the checkarray routine would not only report, but try to fix mismatches.
This seems to have been incorrect, perhaps. In any case, while trying to
grow one of the arrays a couple of weeks or so ago, nearly every large file
on the main array of the main server was corrupted. A few small files were
also corrupted. I copied everything back over from the backup array, and
all seemed well, except that checkarray (during its normal cron run) is
still reporting massive numbers of mismatches on the array.
> It was my understanding when researching RAID6 last week that the two
> parity calculations are actually different. Only one of them is simple
> parity. The other is part of an advanced math degree. ;-)
> (According to Wikipedia, not a definitive source for sure...)
Well, that's encouraging, if true. It doesn't explain how big
chunks of data in almost every large file got corrupted, though.
Re: Mismatches
on 03.01.2011 02:58:00 by Mark Knecht
On Sun, Jan 2, 2011 at 5:53 PM, Leslie Rhorer wrote:
>
>> It was my understanding when researching RAID6 last week that the two
>> parity calculations are actually different. Only one of them is simple
>> parity. The other is part of an advanced math degree. ;-)
>> (According to Wikipedia, not a definitive source for sure...)
>
>         Well, that's encouraging, if true. It doesn't explain how big
> chunks of data in almost every large file got corrupted, though.
>
>
If it's real and not some strange bug then it certainly doesn't.
Neil had some very good responses.
Also, what does
smartctl -a /dev/sdX say for each drive? Any indication at all of
_any_ drive having problems?
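Something like this would pull the interesting counters for all of them
(adjust the device list to match your setup):

    for d in /dev/sd?; do
        echo "== $d =="
        smartctl -a "$d" | egrep -i \
            'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC_Error'
    done

A climbing UDMA_CRC_Error_Count tends to point at a cable or backplane
rather than the platters.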
- Mark
RE: Mismatches
on 03.01.2011 04:12:47 by Leslie Rhorer
> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Sunday, January 02, 2011 7:36 PM
> To: lrhorer@satx.rr.com
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Mismatches
>
> On Sun, 2 Jan 2011 19:10:38 -0600 "Leslie Rhorer"
> wrote:
>
> >
> > OK, I asked this question here before, and I got no answer
> > whatsoever. I wasn't too concerned previously, but now that I lost the
> > entire array the last time I tried to do a growth, I am truly concerned.
> > Would someone please answer my question this time, and perhaps point me
> > toward a resolution? The monthly array check just finished on my main
> > machine. For many months, this happened at the first of the month and
> > completed without issue and with zero mismatches. As of a couple of
> months
> > ago, it started to report large numbers of mismatches. It just
> completed
> > this afternoon with the following:
> >
> > RebuildFinished /dev/md0 mismatches found: 96614968
> >
> > Now, 96,000,000 mismatches would seem to be a matter of great
> > concern, if you ask me. How can there be any, really, when the entire
> array
> > - all 11T - was re-written just a few weeks ago? How can I find out
> what
> > the nature of these mismatches is, and how can I correct them without
> > destroying the data on the array? How can I look to prevent them in the
> > future? I take it the monthly checkarray routine (which basically
> > implements ` echo check > /sys/block/md0/md/sync_action`) does not
> attempt
> > to fix any errors it finds?
> >
> > I just recently found out md uses simple parity to try to maintain
> > the validity of the data. I had always thought it was ECC. With simple
> > parity it can be difficult or even impossible to tell which data member
> is
> > in error, given two conflicting members. Where should I go from here?
> Can
> > I use `echo repair > /sys/block/md0/md/sync_action` with impunity?
> What,
> > exactly, will this do when it comes across a mismatch between one or
> more
> > members?
> >
> > RAID6 array
> > mdadm - v2.6.7.2
> > kernel 2.6.26-2-amd64
> >
>
> 96,000,000 is certainly a big number.  It seems to suggest that one of
> your devices is returning a lot of bad data to reads.
No kidding, especially since the data was very recently re-written
to the array. I'm getting reports of errors from more than one array,
however, with no drives in common. What's more, I am not getting errors on
other arrays with every disk in common. Specifically, the main array is an
11T RAID6 array with 14 raw SATA members, all in a single PM enclosure, on 3
different channels, assembled as md0. It's the one with the 96 million
mismatches. OTOH, I have a pair of drives in the main CPU enclosure, one
SATA and one PATA. Each is divided into 3 partitions, and each partition
pair forms a RAID1 array. Thus md1 = sda1 + hda1, md2 = sda2 + hda2, and
md3 = sda3 + hda3.  Md1 has no mismatches, although it is also quite small
(411M). Md2 has 128 mismatches, and is 328G. Md3 had 37,632 mismatches,
and is only 171G. What's more, md3 gets very limited use. It is allocated
as swap space, and this server almost never swaps anything of which to
speak. The used swap space is usually under 200KB.
Before I received your reply, I went ahead and turned off swap and
then started the repair on md3 just to see what would happen. Obviously,
the data in the swap area is of no concern once swap is disabled.
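(Roughly:

    swapoff /dev/md3
    echo repair > /sys/block/md3/md/sync_action
    # once that finishes, a fresh check should show the count back at zero
    echo check > /sys/block/md3/md/sync_action
    cat /sys/block/md3/md/mismatch_cnt

with a swapon again afterwards, of course.)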
> If this is true, you would expect to get corrupt data when you read from
> the array.  Do you?
	Not that I know of, or at least it doesn't seem so, but the growth
from 13 drives to 14 really hacked the data to pieces. The file system
croaked, and after recovery, quite a few files were lost. Most, however, were
still there, but almost every large file was corrupt. The bulk of the data
(in size, not number of files) is video, and when I attempted to play almost
any video, it jumped, stuttered, and splashed confetti across the screen. A
handful of small files were also trashed. I looked at one of them - a flat
text file - and it was complete garbage. I suspect that something like
1/11th of the data was corrupt - being that the array was formerly 11 data +
2 parity. I suppose it could have been 1/12th of the data, since the new
shape is 12 + 2.
> Does 'fsck' find any problems?
It doesn't look like it:
RAID-Server:/etc/default# xfs_repair -n /dev/md0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
- agno = 38
- agno = 39
- agno = 40
- agno = 41
- agno = 42
- agno = 43
- agno = 44
- agno = 45
- agno = 46
- agno = 47
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
- agno = 38
- agno = 39
- agno = 40
- agno = 41
- agno = 42
- agno = 43
- agno = 44
- agno = 45
- agno = 46
- agno = 47
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
I don't know of a way to check the integrity of the swap area, and
md2 is root, so I would have to take the server down to check it.
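(One thought: for a two-member RAID1 I suppose the members can simply be
compared against each other while the array is idle - assuming the old 0.90
superblock, the data starts at offset 0 on both partitions and only the
superblock region near the end should differ - something like:

    cmp /dev/sda3 /dev/hda3
    cmp -l /dev/sda3 /dev/hda3 | wc -l   # count differing bytes rather than
                                         # stopping at the first one

though I haven't actually tried it.)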
>
> The problem could be in a drive, or in a cable or in a controller. It is
> hard to know which.
Not if the problem has a single source. Md1, md2, and md3 do not
share drives, cables, or controllers. Other than the CPU, the memory, and
the southbridge, they don't have anything in common. Of course it is within
the realm of possibility the errors are unrelated, but the fact all of the
arrays which began reporting mismatches did so the very same month is very
suggestive of a single source.
> I would recommend not writing to the array until you have isolated the
> problem as writing can just propagate errors.
	I'll limit the writing, but I don't know that I can stop entirely.  I
wouldn't even be able to write this message without it, as the IMAP server
uses the array.
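(I suppose the most I could do is remount the file system read-only for the
duration, something like:

    mount -o remount,ro /mnt/array    # mount point illustrative
    mdadm --readonly /dev/md0         # alternative, with the file system unmounted

but that takes the IMAP spool with it.)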
> Possibly:
> shut down array
> compute the sha1sum of each device
> compute the sha1sum again
Um. I presume you mean `sha1sum /dev/sdX`? Check me if I'm wrong,
but even at 200 MBps, that's going to take 2.5 hours per drive, isn't it?
That's 35 solid hours, and I'll have to restart the process every hour and a
quarter, as it finishes each drive. I'm not entirely sure the drives could
sustain 200 MBps, either. I suppose I could write a script.
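Something along these lines, I suppose (the device list is illustrative,
and the array would have to be stopped first):

    #!/bin/bash
    # Hash every member device twice; a drive whose two hashes differ is
    # returning inconsistent reads.
    for dev in /dev/sd?; do
        h1=$(sha1sum < "$dev" | awk '{print $1}')
        h2=$(sha1sum < "$dev" | awk '{print $1}')
        if [ "$h1" = "$h2" ]; then
            echo "$dev: stable  $h1"
        else
            echo "$dev: INCONSISTENT  $h1 vs $h2"
        fi
    done

If every drive comes back stable on both passes, then per your note the
suspect would be a drive that is consistently wrong, and the one-drive-
at-a-time check would be the next step.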
> If there is any difference, you are closer to the error
> If every device reports the same sha1sum, both times, then it is
> presumably
> just one device which has consistent errors.
>
> I would then try assembling the array with all-but-one-drive (use a bitmap
> so
> you can add/remove devices without triggering a recovery) and do a 'check'
> for each config and hope that one config (i.e. with one particular device
> missing) reports no mismatches. That would point to the missing device
> being
> the problem.
That's not exactly going to be fast, either.
> 'check' does not correct any mismatches it finds, though if it hits a read
> error it will try to correct that.
>
> RAID6 can sometimes determine which device is in error, but that has not
> been
> implemented in md/raid6 yet.
>
> I wouldn't use 'repair' as that could hide the errors rather than fixing
> them, and there would be no way back. When it comes across a mismatch it
> generates the Parity and the Q block from the data and writes them out.
> If
> the P or Q block were wrong, this is a good fix. If one data block was
> wrong, this is bad.
I see.
RE: Mismatches
on 03.01.2011 05:03:06 by Leslie Rhorer
> -----Original Message-----
> From: Mark Knecht [mailto:markknecht@gmail.com]
> Sent: Sunday, January 02, 2011 7:58 PM
> To: lrhorer@satx.rr.com
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Mismatches
>
> On Sun, Jan 2, 2011 at 5:53 PM, Leslie Rhorer wrote:
> >
> >> It was my understanding when researching RAID6 last week that the two
> >> parity calculations are actually different. Only one of them is simple
> >> parity. The other is part of an advanced math degree. ;-)
> >> (According to Wikipedia, not a definitive source for sure...)
> >
> >         Well, that's encouraging, if true. It doesn't explain how big
> > chunks of data in almost every large file got corrupted, though.
> >
> >
>
> If it's real and not some strange bug then it certainly doesn't.
>
> Neil had some very good responses.
>
> Also, what does
>
> smartctl -a /dev/sdX say for each drive? Any indication at all of
> _any_ drive having problems?
	Well, not exactly, no. Some of the drives do have errors, but none
have any huge amount. Most of the errors were accumulated when I had a bad
RAID chassis a couple of years ago that caused all sorts of grief. Most of
the drives have been quiet since then, with the exception of the Hitachi
drives. All six Hitachi drives show an event 675 hours (28 days, 3 hours)
ago. None of the other drives show anything in that time frame, and none of
the drives at all show anything since. That must have been when the system
croaked after starting the RAID reshape. I don't see how that, in and of
itself, would cause all those mismatches.