Re: Maximizing failed disk replacement on a RAID5 array

on 06.06.2011 20:06:23 by Durval Menezes

Hello Brad, Drew,

Thanks for reminding me of the hammering a RAID level conversion would cause.
This is certainly a major reason to avoid the RAID5->RAID6->RAID5 route.

The "repair" has been running here for a few days already, with the
server online, and ought to finish in 24 more hours. So far (thanks to
the automatic rewrite relocation) the number of=A0 uncorrectable sector=
s
being reported by SMART has dropped from 40 to 20 , so it seems the
repair is=A0 doing its job. Lets just hope the disk has enough=A0 spare
sectors=A0 to remap all the bad sectors; if it does, a simple "dd "from
the bad disk to=A0 its replacement ought to=A0 do the job=A0 (as you ha=
ve
indicated).
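
(For reference, a sketch of the commands involved; the md device and
disk names below are examples, adjust to your setup:)

  echo repair > /sys/block/md0/md/sync_action   # kick off the md scrub/repair
  cat /proc/mdstat                              # watch its progress
  smartctl -A /dev/sdd | grep -Ei 'Reallocated|Pending|Uncorrectable'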

On the other hand, as this "dd" has to be done with the array offline,
it will entail some downtime (although not as much as having to
restore the whole array from backups)... not ideal, but not too bad
either.
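
(A sketch of that offline copy, with example device names; GNU
ddrescue, where available, handles any remaining bad sectors more
gracefully than plain dd, and keeps a log so the copy can be resumed:)

  mdadm --stop /dev/md0                               # take the array offline first
  dd if=/dev/sdd of=/dev/sde bs=1M conv=noerror,sync  # pad unreadable blocks, keep going
  # or, preferably:
  ddrescue /dev/sdd /dev/sde /root/rescue.log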

If worst comes to worst, I have an up-to-date offline backup of
the contents of the whole array, so if something really bad happens, I
have something to restore from.

It would be great to have a
"duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
as it would enable an almost-no-downtime disk replacement with
minimum risk, but it seems we can't have everything... :-0 Maybe it's
something for the wishlist?

About mishaps with "dd", I think everyone who has ever dealt with a
system (not just Linux) at the level we do has at some point gone through
something similar... the last time I remember doing this was many
years ago, before Linux existed, when a few friends and I spent a
wonderful night installing William Jolitz's then-new 386BSD on an HD
(a process which *required* dd) and trashing its Windows partitions
(which contained the only copy of the graduation thesis of one of us,
due in a few days).

Thanks for all the help,
--
   Durval Menezes.

On Mon, Jun 6, 2011 at 12:54 PM, Brad Campbell wrote:
>
> On 06/06/11 23:37, Drew wrote:
>>>
>>> Now, if I'm off the wall and missing something blindingly obvious f=
eel free
>>> to thump me with a clue bat (it would not be the first time).
>>>
>>> I've lost 2 arrays recently. 8TB to a dodgy controller (thanks SIL)=
, and 2TB
>>> to complete idiocy on my part, so I know the sting of lost or corru=
pted
>>> data.
>>
>> I think you've covered the process in more detail, including pitfall=
s,
>> then I have. :-) Only catch is where would you find a cheap 2-3TB
>> drive right now?
>
> I bought 10 recently for about $90 each. It's all relative, but I con=
sider ~$45 / TB cheap.
>
>> I also know the sting of mixing stupidity and dd. ;-) A friend was
>> helping me do some complex rework with dd on one of my disks. Being
>> the n00b I followed his instructions exactly, and him being the expe=
rt
>> (and assuming I wasn't the n00b I was back then) didn't double check
>> my work. Net result was I backed the MBR/Partition Table up using dd=
,
>> but did so to a partition on the drive we were working on. There may
>> have been some alcohol involved (I was in University), the revised
>> data we inserted failed, and next thing you know I'm running Partiti=
on
>> Magic (the gnu tools circa 2005 failed to detect anything) to try an=
d
>> recover the partition table. No backups obviously. ;-)
>
> Similar to my
>
> dd if=3D/dev/zero of=3D/dev/sdb bs=3D1M count=3D100
>
> except instead of the target disk, it was to a raid array member that=
was currently active. To its credit, ext3 and fsck managed to give me =
most of my data back, even if I had to spend months intermittently sort=
ing/renaming inode numbers from lost+found into files and directories.
>
> I'd like to claim Alcohol as a mitigating factor (hell, it gets peopl=
e off charges in our court system all the time) but unfortunately I was=
just stupid.
>

Re: Maximizing failed disk replacement on a RAID5 array

on 07.06.2011 10:52:55 by John Robinson

On 06/06/2011 19:06, Durval Menezes wrote:
[...]
> It would be great to have a
> "duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
> as it would enable an almost-no-downtime disk replacement with
> minimum risk, but it seems we can't have everything... :-0 Maybe it's
> something for the wishlist?

It's already on the wishlist, described as a hot replace.
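
(For anyone reading this in the archives later: hot replace did
eventually land in md, exposed through mdadm's --replace option. A
usage sketch, with example device names; check that your mdadm is
recent enough to support it:)

  mdadm /dev/md0 --add /dev/sde1                       # new disk goes in as a spare
  mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sde1  # rebuild onto it while sdd1 stays live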

Cheers,

John.


Re: Maximizing failed disk replacement on a RAID5 array

on 10.06.2011 12:25:27 by John Robinson

On 07/06/2011 09:52, John Robinson wrote:
> On 06/06/2011 19:06, Durval Menezes wrote:
> [...]
>> It would be great to have a
>> "duplicate-this-bad-old-disk-into-this-shiny-new-disk" functionality,
>> as it would enable an almost-no-downtime disk replacement with
>> minimum risk, but it seems we can't have everything... :-0 Maybe it's
>> something for the wishlist?
>
> It's already on the wishlist, described as a hot replace.

Actually I've been thinking about this. I think I'd rather the hot
replace functionality did a normal rebuild from the still-good drives,
and only if it came across a read error from those would it refer to
the contents of the known-to-be-failing drive (and then also attempt
to repair the read error on the supposedly-still-good drive, as
already happens).

My rationale for this is as follows: if we want to hot-replace a drive
that's known to be failing, we should trust it less than the remaining
still-good drives, and treat it with kid gloves. It may be suffering
from bit-rot. We'd rather not hit all the bad sectors on the failing
drive, because each time we do that we send the drive into 7 seconds (or
more, for cheap drives without TLER) of re-reading, plus any Linux-level
re-reading there might be. Further, making the known-to-be-failing drive
work extra hard (doing the equivalent of dd'ing from it while also still
using it to serve its contents as an array member) might make it die
completely before we've finished.
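
(As an aside, on drives that support SCT ERC the recovery timeout can
be inspected and capped with smartctl; values are in tenths of a
second, the device name is an example, and many desktop drives simply
reject the command:)

  smartctl -l scterc /dev/sdd          # show current error-recovery settings
  smartctl -l scterc,70,70 /dev/sdd    # cap read/write recovery at 7.0 s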

What will this do for rebuild time? Well, I don't think it'll be any
slower. On the one hand, you'd think that copying from one drive to
another would be faster than a rebuild, because you're only reading 1
drive instead of N-1; but on the other, your array is going to run
slowly (pretty much at degraded speed) anyway because you're keeping one
drive in constant use reading from it, and you risk it becoming much,
much slower if you do run into hundreds or thousands of read errors on
the failing drive.
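
(Either way, the rebuild can be watched and throttled through the
usual knobs; the paths below are the stock md sysctls, and the figures
are just illustrative:)

  cat /proc/mdstat                                   # progress and current speed
  echo 50000  > /proc/sys/dev/raid/speed_limit_min   # per-device floor, KB/s
  echo 200000 > /proc/sys/dev/raid/speed_limit_max   # per-device ceiling, KB/s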

So overall I think hot-replace should be a normal replace with a
possible second source of data/parity.

Thoughts?

Yes, I know, -ENOPATCH

Cheers,

John.
