Re: Maximizing failed disk replacement on a RAID5 array

on 08.06.2011 09:47:11 by Durval Menezes

Hello Brad,

On Wed, Jun 8, 2011 at 4:32 AM, Brad Campbell wrote:
> On 08/06/11 14:58, Durval Menezes wrote:
>
>> 1) can I simply skip over these sectors (using dd_rescue or multiple
>> dd invocations) when off-line copying the old disk to the new one,
>> trusting the RAID5 to reconstruct the data correctly from the other 2
>
> Noooooooooooo. As we stated earlier, if you do that md will have no idea
> that the missing data is actually missing, as the drive won't return a read
> error.

Even if a "repair" (echo "repair" >/sys/block/md1/md/sync_action,
checking progress with "cat /proc/mdstat" and completion with "tail -f
/var/log/messages | grep md") finishes with no errors?
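
For reference, the repair run described above can be sketched as a small
shell helper. This is only a sketch under assumptions: the array is md1, the
MD_SYSFS and POLL_SECONDS variables and the three function names are made up
for illustration, and it relies on the md sysfs files sync_action and
mismatch_cnt. It needs root to touch the real sysfs files.

```shell
#!/bin/sh
# Sketch: start an md "repair" pass, wait for it to finish, report mismatches.
# MD_SYSFS is a hypothetical variable; it defaults to the md1 array from the thread.
MD_SYSFS="${MD_SYSFS:-/sys/block/md1/md}"

start_repair() {
    # Writing "repair" to sync_action asks md to read every stripe and
    # rewrite whatever it finds inconsistent or unreadable.
    echo repair > "$MD_SYSFS/sync_action"
}

wait_for_idle() {
    # sync_action reads back "idle" once no check/repair/resync is running.
    while [ "$(cat "$MD_SYSFS/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-30}"
    done
}

report_mismatches() {
    # mismatch_cnt holds the count of mismatches found by the last pass.
    echo "mismatch_cnt: $(cat "$MD_SYSFS/mismatch_cnt")"
}
```

Typical use would be "start_repair && wait_for_idle && report_mismatches".
Note that a clean result here says nothing about sectors the drive itself
still reports as pending; those need checking with SMART separately.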

> Does a repair take long on your machine? I find that a few repair runs
> generally get me enough re-writes to clear the dud sectors and allow an
> offline clone.

I'm sorry if I did not make myself clear; I've already run both a
"repair" on the RAID (see above) and a "smartctl -t long" on the
particular disk... I had about 40 bad sectors before, and now have
just 4, but these 4 sectors persist as being marked in error... I
think the "RAID repair" didn't touch them.

Cheers,
--
Durval.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Maximizing failed disk replacement on a RAID5 array

on 08.06.2011 09:57:13 by Brad Campbell

On 08/06/11 15:47, Durval Menezes wrote:

> I'm sorry if I did not make myself clear; I've already run both a
> "repair" on the RAID (see above) and a "smartctl -t long" on the
> particular disk... I had about 40 bad sectors before, and now have
> just 4, but these 4 sectors persist as being marked in error... I
> think the "RAID repair" didn't touch them.

Apologies, I obviously missed that fact.

I think your best course of action in this case is to test both the
other drives with SMART long checks and fail/replace the faulty one.

I've never had md not report a repaired sector when performing a repair
operation.

I'll just reiterate: if you take the bad sectors away without a good
copy of the data on them, md won't know it is supposed to reconstruct
those missing sectors.

Hrm.. *or*, and this is a big *or*, you could use hdparm to create
correctable bad sectors on the copy at the appropriate LBAs, and md
should do the right thing, as it will get read errors from those, which
will go away when they are re-written.

I'd not thought of that before, but it should do the trick.
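
A rough sketch of the hdparm idea, assuming hdparm's --make-bad-sector
option (which also requires --yes-i-know-what-i-am-doing). The function name,
the device /dev/sdX and the LBA values are placeholders, not values from this
thread. The helper only prints the commands so they can be reviewed first; it
runs nothing itself, since the option deliberately destroys the sector
contents on the target device.

```shell
#!/bin/sh
# Sketch: print (do NOT run) hdparm commands that would mark the given LBAs
# on the clone as unreadable, so md gets a read error there and rebuilds the
# data from the other members. print_bad_sector_cmds is a hypothetical helper.
print_bad_sector_cmds() {
    dev="$1"
    shift
    for lba in "$@"; do
        # --make-bad-sector corrupts the sector so reads fail until it is
        # rewritten; hdparm refuses without the extra confirmation flag.
        echo "hdparm --yes-i-know-what-i-am-doing --make-bad-sector $lba $dev"
    done
}
```

For example, "print_bad_sector_cmds /dev/sdX 100 200" prints one command per
LBA. The LBAs have to be those of the unreadable sectors on the old disk, and
the device has to be the new copy, never a live array member holding good
data. Untested on my side, so try it on a scratch drive first.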