Fwd: Maximizing failed disk replacement on a RAID5 array
am 05.06.2011 16:22:22 von Durval Menezes
Hello folks,
A few days ago, the smartd daemon running on my Lucid system at home
(kernel 2.6.32-32-generic, mdadm 2.6.7.1) started warning me about
a few (fewer than 50 so far) offline uncorrectable and other errors on
one of the 1.5TB HDs in my three-disk RAID5 array. The failing HD is
still online (ie, hasn't been kicked out of the array), at least for now.
I have another disk ready for replacement, and I'm trying to determine
the safest (not necessarily the simplest) way of proceeding.
I understand that, if I do it the "standard" way (ie, power down the
system, remove the failing disk, add the replacement disk, then boot
up and use "mdadm --add" to add the new disk to the array) I run the
risk of running into unreadable sectors on one of the other two disks,
and then my RAID5 is kaput.
What I would really like to do is to be able to add the new HD to the
array WITHOUT removing the failing HD, somehow sync it with the rest,
and THEN remove the failing HD: that way, an eventual failed read from
one of the two other HDs could possibly be satisfied from the failing
HD (unless EXACTLY that same sector is also unreadable on it, which I
find unlikely), and so avoid losing the whole array in the above case.
So far, the only way I've been able to figure to do that would be to
convert the array from RAID5 to RAID6, add the new disk, wait for the
array to sync, remove the failing disk, and then convert the array
back from RAID6 to RAID5 (and I'm not really sure that this is a good
idea, or even doable).
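To make that concrete, here is a sketch of the grow/shrink sequence I have in mind. All device names (/dev/md0, /dev/sdb1 as the failing disk, /dev/sdd1 as the new one) are made up, and with DRY_RUN=1 (the default) the script only prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch of the RAID5 -> RAID6 -> RAID5 idea above. Device names are
# hypothetical, and DRY_RUN=1 (the default) only echoes each command.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

reshape_and_back() {
    run mdadm /dev/md0 --add /dev/sdd1            # new disk joins as a spare
    run mdadm --grow /dev/md0 --level=6 --raid-devices=4 \
        --backup-file=/root/md0-grow.bak          # reshape to RAID6
    # ...wait for the reshape to finish (watch /proc/mdstat)...
    run mdadm /dev/md0 --fail /dev/sdb1           # drop the failing disk
    run mdadm /dev/md0 --remove /dev/sdb1
    run mdadm --grow /dev/md0 --level=5 --raid-devices=3 \
        --backup-file=/root/md0-shrink.bak        # and back to RAID5
}

reshape_and_back
```

The --backup-file is there because a reshape that changes the disk count has no spare room for the critical section early on.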
So, folks, what do you say? Is there a better way? Any gotchas in the
RAID5->RAID6->RAID5 approach?
Thanks,
--
Durval Menezes.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Maximizing failed disk replacement on a RAID5 array
am 06.06.2011 17:02:48 von Drew
> I understand that, if I do it the "standard" way (ie, power down the
> system, remove the failing disk, add the replacement disk, then boot
> up and use "mdadm --add" to add the new disk to the array) I run the
> risk of running into unreadable sectors on one of the other two disks,
> and then my RAID5 is kaput.
>
> What I would really like to do is to be able to add the new HD to the
> array WITHOUT removing the failing HD, somehow sync it with the rest,
> and THEN remove the failing HD: that way, an eventual failed read from
> one of the two other HDs could possibly be satisfied from the failing
> HD (unless EXACTLY that same sector is also unreadable on it, which I
> find unlikely), and so avoid losing the whole array in the above case.
A reshape from RAID5 -> RAID6 -> RAID5 will hammer your disks so if
either of the other two are ready to die, this will most likely tip
them over the edge.
A far simpler way would be to take the array offline, dd (or
dd_rescue) the old drive's contents onto the new disk, pull the old
disk, and restart the array with the new drive in its place. With
luck you won't need a resync *and* you're not hammering the other two
drives in the process.
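As a script, that would look roughly like this. The device names are hypothetical (/dev/md0 array; /dev/sda and /dev/sdc healthy; /dev/sdb failing; /dev/sdd replacement), and with DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/bin/sh
# The clone-and-swap above as a sketch. Hypothetical device names;
# DRY_RUN=1 (the default) echoes each command instead of running it.
DRY_RUN=${DRY_RUN:-1}
run() { echo "+ $*"; [ "$DRY_RUN" = 1 ] || "$@"; }

clone_and_swap() {
    run mdadm --stop /dev/md0                # take the array offline
    run dd if=/dev/sdb of=/dev/sdd bs=1M     # straight block copy, so the md
                                             # superblock comes across too
    # ...physically pull /dev/sdb here, then reassemble with the clone:
    run mdadm --assemble /dev/md0 /dev/sda /dev/sdc /dev/sdd
}

clone_and_swap
```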
--
Drew
"Nothing in life is to be feared. It is only to be understood."
--Marie Curie
"This started out as a hobby and spun horribly out of control."
-Unknown
Re: Maximizing failed disk replacement on a RAID5 array
am 06.06.2011 17:20:05 von Brad Campbell
On 06/06/11 23:02, Drew wrote:
>> I understand that, if I do it the "standard" way (ie, power down the
>> system, remove the failing disk, add the replacement disk, then boot
>> up and use "mdadm --add" to add the new disk to the array) I run the
>> risk of running into unreadable sectors on one of the other two disks,
>> and then my RAID5 is kaput.
>>
>> What I would really like to do is to be able to add the new HD to the
>> array WITHOUT removing the failing HD, somehow sync it with the rest,
>> and THEN remove the failing HD: that way, an eventual failed read from
>> one of the two other HDs could possibly be satisfied from the failing
>> HD (unless EXACTLY that same sector is also unreadable on it, which I
>> find unlikely), and so avoid losing the whole array in the above case.
> A reshape from RAID5 -> RAID6 -> RAID5 will hammer your disks so if
> either of the other two are ready to die, this will most likely tip
> them over the edge.
>
> A far simpler way would be to take the array offline, dd (or
> dd_rescue) the old drive's contents onto the new disk, pull the old
> disk, and restart the array with the new drive in its place. With
> luck you won't need a resync *and* you're not hammering the other two
> drives in the process.
Bear with me, I've had a few scotches and this might not be as coherent as it might be, but I think
I spot a very, very fatal flaw in your plan.
I thought this initially also, except it blows up in the scenario where the dud sectors are data and
not parity.
If you do it the way you suggest and choose dd_rescue in place of dd, dodgy data from the dud
sectors will be replicated as kosher sectors on the replacement disk (or zero, or random, or whatever).
If you execute a "repair" first, it will strike the dud sectors, see they are toast, re-calculate
them from parity and write them back forcing a reallocation.
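The "repair" is triggered through sysfs. In this sketch SYSFS defaults to a scratch directory so it is harmless to run as-is; on a real system you'd point it at /sys/block/md0/md (for whichever mdX holds the array):

```shell
#!/bin/sh
# Kicking off md's "repair" pass. SYSFS defaults to a scratch directory
# here so the sketch is safe; on a real box: SYSFS=/sys/block/md0/md
SYSFS=${SYSFS:-$(mktemp -d)}

echo repair > "$SYSFS/sync_action"   # read every stripe, rewrite unreadable
                                     # sectors from parity (forces the drive
                                     # to reallocate them)
cat "$SYSFS/sync_action"             # on real md this reads back "repair"
                                     # while the pass is running
# progress shows up in /proc/mdstat; once it goes idle again, check:
# cat "$SYSFS/mismatch_cnt"
```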
You can then replicate the failing disk using "dd", *not* dd_rescue. If dd fails due to a read error
then you know that part of your data is likely to be toast on the replaced disk, and you can go
about making provisions for a backup/restore operation using the original disk (which will likely
succeed as the data read from the array will be re-built from parity where required).
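The exit-status point can be seen with file stand-ins (good.img and missing.img are made-up names playing the readable and unreadable disks): plain dd stops and exits non-zero at a failed read, where dd_rescue would press on regardless.

```shell
#!/bin/sh
# Plain dd exits non-zero on a read error, so its exit status doubles as
# the verification step. File stand-ins replace the real devices here.
dd if=/dev/urandom of=good.img bs=64k count=4 2>/dev/null

clone() {
    if dd if="$1" of="$2" bs=64k 2>/dev/null; then
        echo "clean copy"
    else
        echo "read failed - plan a backup/restore from the live array instead"
    fi
}

clone good.img copy.img       # prints "clean copy"
clone missing.img copy2.img   # prints "read failed - ..."
```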
dd_rescue is a blessing and a curse. It's _very_ good at getting you access to data that you have no
backup of, and you have no other way of getting back. On the other hand, it will happily go and
replicate whatever trash it happens to get back from the source disk, or skip those sectors and
leave you with an incomplete copy that will leave no trace of it being incomplete until you find
chunks missing (like superblocks or your formula for a zero cost petroleum replacement).
If your array works but has a badly failing drive, you are far better off buying some cheap 2TB disks
and backing it up, then restoring onto a re-created array, than chancing losing chunks of data by
using a dd_rescue'd clone disk.
Now, if I'm off the wall and missing something blindingly obvious feel free to thump me with a clue
bat (it would not be the first time).
I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB to complete idiocy on
my part, so I know the sting of lost or corrupted data.
Brad
Re: Maximizing failed disk replacement on a RAID5 array
am 06.06.2011 17:37:03 von Drew
> Now, if I'm off the wall and missing something blindingly obvious feel free
> to thump me with a clue bat (it would not be the first time).
>
> I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB
> to complete idiocy on my part, so I know the sting of lost or corrupted
> data.
I think you've covered the process in more detail, including pitfalls,
than I have. :-) Only catch is where would you find a cheap 2-3TB
drive right now?
I also know the sting of mixing stupidity and dd. ;-) A friend was
helping me do some complex rework with dd on one of my disks. Being
the n00b I followed his instructions exactly, and him being the expert
(and assuming I wasn't the n00b I was back then) didn't double check
my work. Net result was I backed the MBR/Partition Table up using dd,
but did so to a partition on the drive we were working on. There may
have been some alcohol involved (I was in University), the revised
data we inserted failed, and next thing you know I'm running Partition
Magic (the gnu tools circa 2005 failed to detect anything) to try and
recover the partition table. No backups obviously. ;-)
--
Drew
"Nothing in life is to be feared. It is only to be understood."
--Marie Curie
Re: Maximizing failed disk replacement on a RAID5 array
am 06.06.2011 17:54:41 von Brad Campbell
On 06/06/11 23:37, Drew wrote:
>> Now, if I'm off the wall and missing something blindingly obvious feel free
>> to thump me with a clue bat (it would not be the first time).
>>
>> I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL), and 2TB
>> to complete idiocy on my part, so I know the sting of lost or corrupted
>> data.
> I think you've covered the process in more detail, including pitfalls,
> than I have. :-) Only catch is where would you find a cheap 2-3TB
> drive right now?
I bought 10 recently for about $90 each. It's all relative, but I consider ~$45 / TB cheap.
> I also know the sting of mixing stupidity and dd. ;-) A friend was
> helping me do some complex rework with dd on one of my disks. Being
> the n00b I followed his instructions exactly, and him being the expert
> (and assuming I wasn't the n00b I was back then) didn't double check
> my work. Net result was I backed the MBR/Partition Table up using dd,
> but did so to a partition on the drive we were working on. There may
> have been some alcohol involved (I was in University), the revised
> data we inserted failed, and next thing you know I'm running Partition
> Magic (the gnu tools circa 2005 failed to detect anything) to try and
> recover the partition table. No backups obviously. ;-)
Similar to my
dd if=/dev/zero of=/dev/sdb bs=1M count=100
except instead of the target disk, it was to a raid array member that was currently active. To its
credit, ext3 and fsck managed to give me most of my data back, even if I had to spend months
intermittently sorting/renaming inode numbers from lost+found into files and directories.
I'd like to claim Alcohol as a mitigating factor (hell, it gets people off charges in our court
system all the time) but unfortunately I was just stupid.