Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatch_cn
on 15.03.2011 12:30:59 by Bas van Schaik
All,
I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
array consisting of 8 devices on kernel 2.6.38. After replacing some
hardware, I decided to trigger a MD repair by issuing:
echo repair > /sys/devices/virtual/block/md5/md/sync_action
Directly after issuing this command, the mismatch_cnt is reset to 0 and
MD starts checking the array. However, the mismatch_cnt increases during
this check - resulting in exactly the same count as seen before.
Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
'repair' work on other RAID-6 arrays?
Furthermore, theoretically it should be possible to indicate which
device in the RAID-6 array contains the inconsistent data, or am I
mistaken? If so, that would certainly be a nice feature to see
implemented, as it would help in diagnosing problems.
Please let me know your thoughts, as I'm quite keen to get my
mismatch_cnt back to 0 in order to see whether the new hardware works
properly!
Thanks,
Bas
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatch
on 15.03.2011 13:13:51 by Robin Hill
On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
> All,
>
> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
> array consisting of 8 devices on kernel 2.6.38. After replacing some
> hardware, I decided to trigger a MD repair by issuing:
> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>
> Directly after issuing this command, the mismatch_cnt is reset to 0 and
> MD starts checking the array. However, the mismatch_cnt increases during
> this check - resulting in exactly the same count as seen before.
> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
> 'repair' work on other RAID-6 arrays?
>
The mismatch_cnt is incremented during repair to indicate how many
errors were repaired. If you want to be certain though, you'd need to
re-run 'check' afterwards.
Cheers,
Robin
--
___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
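
Robin's advice above (a 'repair' pass counts what it repairs, so a following 'check' is what confirms the array is clean) can be sketched as a small script. This is only a sketch under assumptions: the helper names `run_sync_action` and `repair_then_verify` are made up for illustration, the script must run as root, and it assumes that sync_action reads back as "idle" once a pass has finished.

```python
import time
from pathlib import Path


def run_sync_action(md_dir, action, poll_seconds=30.0, sleep=time.sleep):
    """Write `action` to sync_action, wait for 'idle', return mismatch_cnt.

    `md_dir` is the array's md sysfs directory, for example
    /sys/devices/virtual/block/md5/md (the path used in this thread).
    """
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported action: {action!r}")
    md = Path(md_dir)
    # Same effect as: echo repair > .../md/sync_action
    (md / "sync_action").write_text(action + "\n")
    # Assumption: the file reads "idle" again when the pass is finished.
    while (md / "sync_action").read_text().strip() != "idle":
        sleep(poll_seconds)
    return int((md / "mismatch_cnt").read_text().strip())


def repair_then_verify(md_dir, **kwargs):
    """Run 'repair', then 'check'; return (count_after_repair, count_after_check)."""
    repaired = run_sync_action(md_dir, "repair", **kwargs)
    remaining = run_sync_action(md_dir, "check", **kwargs)
    return repaired, remaining
```

Calling `repair_then_verify("/sys/devices/virtual/block/md5/md")` would return two numbers: the first is the count reported by the repair pass, the second is the count from the follow-up check, which is the one expected to be 0 if the repair worked.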
Re: Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatc
on 15.03.2011 14:43:01 by Bas van Schaik
On 15/03/11 12:13, Robin Hill wrote:
> On Tue Mar 15, 2011 at 11:30:59AM +0000, Bas van Schaik wrote:
>> All,
>>
>> I'm seeing a non-zero mismatch_cnt (in fact, it's 1704) on my RAID-6
>> array consisting of 8 devices on kernel 2.6.38. After replacing some
>> hardware, I decided to trigger a MD repair by issuing:
>> echo repair > /sys/devices/virtual/block/md5/md/sync_action
>>
>> Directly after issuing this command, the mismatch_cnt is reset to 0 and
>> MD starts checking the array. However, the mismatch_cnt increases during
>> this check - resulting in exactly the same count as seen before.
>> Shouldn't 'repair' yield a zero mismatch_cnt? I think I have seen
>> 'repair' work on other RAID-6 arrays?
>>
> The mismatch_cnt is incremented during repair to indicate how many
> errors were repaired. If you want to be certain though, you'd need to
> re-run 'check' afterwards.
Sorry about that - I was sure the mismatch_cnt was reset after a repair
on a different machine, but apparently I was wrong. The 'check' is
running right now, I hope you are right! If not, of course I'll let you
know.
My other question is still standing:
> Furthermore, theoretically it should be possible to indicate which
> device in the RAID-6 array contains the inconsistent data, or am I
> mistaken? If so, that would certainly be a nice feature to see
> implemented, as it would help in diagnosing problems.
Am I indeed correct in thinking this?
Thanks,
Bas
Re: Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatch
on 15.03.2011 15:13:42 by Robin Hill
On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
> My other question is still standing:
> > Furthermore, theoretically it should be possible to indicate which
> > device in the RAID-6 array contains the inconsistent data, or am I
> > mistaken? If so, that would certainly be a nice feature to see
> > implemented, as it would help in diagnosing problems.
> Am I indeed correct in thinking this?
>
I'm not sure. If it's a single data block that's failed then you should
be able to, for each disk, re-generate the data using the other disks
and the P parity, then validate against the Q parity (if it matches then
that disk is the incorrect one). You should also be able to detect
errors in either the P or Q parity (if one is valid for the data and the
other isn't). If there's multiple disks which are incorrect then I
don't think there's any way you can tell which (or even avoid having one
of the correct disks flagged as incorrect).
Cheers,
Robin
--
___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
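
The procedure Robin describes above (rebuild each data block in turn from the other blocks and P, then see whether Q agrees) can be tried out in user space. The following is a minimal sketch, not the md implementation: it assumes the usual RAID-6 arithmetic (P is the XOR of the data blocks, Q is the sum of 2^i times block i in GF(2^8) with polynomial 0x11d), and the function names are invented for illustration. As Robin notes, the answer is only meaningful when at most one block in the stripe is wrong.

```python
def gf_mul2(x):
    """Multiply one byte by 2 in GF(2^8), reducing by polynomial 0x11d."""
    x <<= 1
    return (x ^ 0x11D) if x & 0x100 else x


def compute_pq(data):
    """Return (P, Q) for a list of equal-length data blocks (bytes)."""
    size = len(data[0])
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule: after the loop, Q = D0 + 2*D1 + 4*D2 + ... in GF(2^8).
    for block in reversed(data):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)


def locate_single_error(data, p, q):
    """Return "ok", "P", "Q", a data block index, or "unknown"."""
    calc_p, calc_q = compute_pq(data)
    if calc_p == p and calc_q == q:
        return "ok"
    if calc_p == p:
        return "Q"  # data agrees with P, so Q is the odd one out
    if calc_q == q:
        return "P"  # data agrees with Q, so P is the odd one out
    matches = []
    for z in range(len(data)):
        # Rebuild block z from the other data blocks and the stored P.
        rebuilt = bytearray(p)
        for j, block in enumerate(data):
            if j != z:
                for i, byte in enumerate(block):
                    rebuilt[i] ^= byte
        trial = list(data)
        trial[z] = bytes(rebuilt)
        # If the stored Q now matches, block z was the inconsistent one.
        if compute_pq(trial)[1] == q:
            matches.append(z)
    return matches[0] if len(matches) == 1 else "unknown"
```

For an 8-device array like the one in this thread, `data` would hold the 6 data blocks of one stripe in their stripe order, with `p` and `q` read from the two parity devices; mapping the returned index back to a physical device depends on the array layout and is left out here.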
Re: Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatc
on 02.04.2011 00:44:37 by Bas van Schaik
On 03/15/2011 02:13 PM, Robin Hill wrote:
> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
>> My other question is still standing:
>>> Furthermore, theoretically it should be possible to indicate which
>>> device in the RAID-6 array contains the inconsistent data, or am I
>>> mistaken? If so, that would certainly be a nice feature to see
>>> implemented, as it would help in diagnosing problems.
>> Am I indeed correct in thinking this?
> I'm not sure. If it's a single data block that's failed then you should
> be able to, for each disk, re-generate the data using the other disks
> and the P parity, then validate against the Q parity (if it matches then
> that disk is the incorrect one). You should also be able to detect
> errors in either the P or Q parity (if one is valid for the data and the
> other isn't). If there's multiple disks which are incorrect then I
> don't think there's any way you can tell which (or even avoid having one
> of the correct disks flagged as incorrect).
Indeed, that is what I was thinking. As I've just discovered some new
block mismatches (that's 2 weeks after the last repair!) on my 8x2TB
RAID6 array, it would be really nice to see this feature implemented...
I would be happy to contribute, but I am not very experienced in hacking
kernel C.
Any tips, tricks and/or suggestions anyone?
Cheers,
Bas
Re: Doing "echo repair > /sys/devices/virtual/block/md?/md/sync_action" does not result in mismatc
on 02.04.2011 01:48:07 by Rory Jaffe
I had the same question and ended up looking at the source. The kernel
documentation was maddeningly vague about this.
/drivers/md/raid5.c (which handles both RAID-5 and RAID-6) has similar
comments in the functions handle_parity_checks5 and handle_parity_checks6:
/* handle a successful check operation, if parity is correct
* we are done. Otherwise update the mismatch count and repair
* parity if !MD_RECOVERY_CHECK
*/
and the program logic does just that--update the count, then check for
the flag, and repair if the flag isn't set.
And in /drivers/md/md.c the section that parses the command has the following:
    if (cmd_match(page, "check"))
        set_bit(MD_RECOVERY_CHECK, &mddev->recovery);
    else if (!cmd_match(page, "repair"))
        return -EINVAL;
    set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
    set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
So it looks like the only difference between check and repair is the
MD_RECOVERY_CHECK flag, which is set for check only.
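
To make the consequence of that logic concrete, here is a toy model in Python. It is not kernel code: stripes are plain dictionaries, parity is a simple XOR of small integers, and the name `scrub` is made up. It only mirrors the order of operations described above: count the mismatch first, then rewrite parity unless the pass is check-only. The toy counts stripes; the units of the real mismatch_cnt may differ.

```python
from functools import reduce
from operator import xor


def scrub(stripes, check_only):
    """Toy scrub pass: return the number of mismatching stripes found.

    check_only=True models 'check' (MD_RECOVERY_CHECK set);
    check_only=False models 'repair' (flag not set).
    """
    mismatches = 0
    for stripe in stripes:
        expected = reduce(xor, stripe["data"], 0)
        if stripe["parity"] != expected:
            mismatches += 1                  # counted in both modes
            if not check_only:
                stripe["parity"] = expected  # only 'repair' rewrites parity
    return mismatches


stripes = [
    {"data": [1, 2, 3], "parity": 0},  # consistent: 1 ^ 2 ^ 3 == 0
    {"data": [4, 5], "parity": 9},     # inconsistent: 4 ^ 5 == 1
]
first_check = scrub(stripes, check_only=True)    # finds the bad stripe
repair_pass = scrub(stripes, check_only=False)   # finds it again, then fixes it
second_check = scrub(stripes, check_only=True)   # nothing left to find
```

In this model the repair pass reports the same count as the check before it, and only the check that follows reports zero, which matches what Bas observed.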
On Fri, Apr 1, 2011 at 3:44 PM, Bas van Schaik wrote:
> On 03/15/2011 02:13 PM, Robin Hill wrote:
>> On Tue Mar 15, 2011 at 01:43:01PM +0000, Bas van Schaik wrote:
>>> My other question is still standing:
>>>> Furthermore, theoretically it should be possible to indicate which
>>>> device in the RAID-6 array contains the inconsistent data, or am I
>>>> mistaken? If so, that would certainly be a nice feature to see
>>>> implemented, as it would help in diagnosing problems.
>>> Am I indeed correct in thinking this?
>> I'm not sure. If it's a single data block that's failed then you should
>> be able to, for each disk, re-generate the data using the other disks
>> and the P parity, then validate against the Q parity (if it matches then
>> that disk is the incorrect one). You should also be able to detect
>> errors in either the P or Q parity (if one is valid for the data and the
>> other isn't). If there's multiple disks which are incorrect then I
>> don't think there's any way you can tell which (or even avoid having one
>> of the correct disks flagged as incorrect).
> Indeed, that is what I was thinking. As I've just discovered some new
> block mismatches (that's 2 weeks after the last repair!) on my 8x2TB
> RAID6 array, it would be really nice to see this feature implemented...
> I would be happy to contribute, but I am not very experienced in hacking
> kernel C.
>
> Any tips, tricks and/or suggestions anyone?
>
> Cheers,
>
> Bas