[PATCH 0/3] count rdev->corrected_errors after final re-read

am 28.06.2011 18:47:23 von Namhyung Kim

Hello,

This patchset tries fix rdev->corrected_errors counting. Read errors
are considered to corrected if write-back and re-read cycle is finished
without further problems. Thus moving the rdev->corrected_errors
counting after the re-reading looks more reasonable IMHO.

For resync case in RAID10, rdev->corrected_errors are increased right
after the first read failure. This seems need an improvement though
it's not handled in this series.

The patches are against 'for-next' branch on git://neil.brown.name/md.

Thanks.

Namhyung Kim (3):
md/raid1: move rdev->corrected_errors counting
md/raid5: move rdev->corrected_errors counting
md/raid10: move rdev->corrected_errors counting

drivers/md/raid1.c | 17 ++++++-----------
drivers/md/raid10.c | 2 +-
drivers/md/raid5.c | 5 +----
3 files changed, 8 insertions(+), 16 deletions(-)

--
1.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] md/raid10: move rdev->corrected_errors counting

am 14.07.2011 07:54:57 von NeilBrown

On Wed, 29 Jun 2011 01:47:26 +0900 Namhyung Kim wrote:

> Read errors are considered to corrected if write-back and re-read
> cycle is finished without further problems. Thus moving the rdev->
> corrected_errors counting after the re-reading looks more reasonable
> IMHO.
>
> Signed-off-by: Namhyung Kim
> ---
> For end_sync_read(), I'm not sure what should I do. But I think, at
> least, the counting should be moved to end_sync_write() and that
> might require additional field(s) in r10_bio. So I leave it as is
> for simplicity now. How do you think?

I don't think it really matters.
If the read error doesn't get corrected, then the device will be failed
and the 'corrected_errors' count won't mean anything any more...

Maybe we should just change the field to 'read_errors' and adjust it whenever
we get a read error. If the device is still in the array, then the error
must have been corrected, if not, it count will be ignored.

There is a corner case where we don't evict the last mirror from an array due
to a read error - we would need to handle that some how.

But otherwise I don't think this is an issue.

I have applied all three patches because they do make the code a bit cleaner
and clear, but I don't think we really need to worry further about RAID10.

Thanks,

NeilBrown

>
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 8628a62a02f0..0660bc9597d8 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1524,7 +1524,6 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
> test_bit(In_sync, &rdev->flags)) {
> atomic_inc(&rdev->nr_pending);
> rcu_read_unlock();
> - atomic_add(s, &rdev->corrected_errors);
> if (sync_page_io(rdev,
> r10_bio->devs[sl].addr +
> sect,
> @@ -1589,6 +1588,7 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
> (unsigned long long)(
> sect + rdev->data_offset),
> bdevname(rdev->bdev, b));
> + atomic_add(s, &rdev->corrected_errors);
> }
>
> rdev_dec_pending(rdev, mddev);

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html