Re: Maximizing failed disk replacement on a RAID5 array

on 13.06.2011 07:32:47 by Durval Menezes

Hello Folks,

On Wed, Jun 8, 2011 at 10:21 AM, Brad Campbell wrote:
>
> Best of luck, and let us know how you get on.

Just finished the process here. To summarize, seems I've got my array
back in a stable state.

What I did:

1) Got a good backup of all the data in the array (using "tar") to
   removable HDs, verified it (using md5sum), and then stored these
   HDs safely offline;

2) Unmounted the filesystem in the array;

3) inserted the replacement disk on a USB dock, partitioned it,
   then added it to the array ("mdadm --add");
   -> Verified (via "mdadm --detail") that the replacement disk was
      listed on the array as a "spare";

4) failed the bad disk in the array ("mdadm --fail")
   -> At that point, the array immediately started to resync onto the
      replacement disk;

5) Monitored the resync process via "cat /proc/mdstat": it took
   roughly 11 hours (I guess because transfer speed to the replacement
   disk was limited by the USB ~40MB/s speed limit), but it signaled
   no errors;

6) Verified that the array was really synced ("mdadm --detail") and
   that there were indeed no errors during the resync (less
   /var/log/messages);

7) removed the bad disk logically from the array ("mdadm --remove");

8) shut down the machine (init 0);

9) removed the bad disk physically from the machine, ejected the
   replacement disk from the USB dock, and then installed the
   replacement disk inside the machine;

10) turned the system on: the OS booted, assembled the array and
    mounted the filesystem in it with no issues;

11) checked (using "md5sum -c" on the md5sum files generated during
    step 1 above) that all the data ON THE ARRAY was indeed correct, so
    in the end I didn't need to restore anything from backup.
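
For anyone following along, steps 3 to 7 map onto roughly the command
sequence below. This is only a sketch: the device names /dev/md0,
/dev/sdb1 (failing member) and /dev/sde1 (replacement on the USB dock)
are placeholders and not taken from this thread, and the function
(a made-up helper name) just prints the commands instead of running
them, so nothing here touches a real array.

```shell
#!/bin/sh
# Dry-run sketch of steps 3-7: prints the mdadm commands, does not run them.
# All device names passed in are placeholders (assumptions), not the ones
# actually used in the post.
print_replace_plan() {
    md=$1    # the array device, e.g. /dev/md0
    bad=$2   # partition of the failing member
    new=$3   # partition on the replacement disk (already partitioned)

    echo "mdadm $md --add $new"      # step 3: replacement joins as a spare
    echo "mdadm --detail $md"        # step 3: confirm it is listed as "spare"
    echo "mdadm $md --fail $bad"     # step 4: rebuild onto the spare starts
    echo "cat /proc/mdstat"          # step 5: watch the rebuild progress
    echo "mdadm --detail $md"        # step 6: confirm the array is in sync
    echo "mdadm $md --remove $bad"   # step 7: drop the failed member
}

print_replace_plan /dev/md0 /dev/sdb1 /dev/sde1
```

Note that between the "--fail" in step 4 and the end of the rebuild the
RAID5 array runs degraded with no redundancy, which is why the verified
backup in step 1 comes first.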
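
The checksum verification in steps 1 and 11 can be sketched as below.
The post does not show the exact commands, so the layout here is an
assumption: a checksum list is generated over the tree with relative
paths and later re-checked with "md5sum -c". The temporary directory
and file names are hypothetical stand-ins for the array's mount point.

```shell
#!/bin/sh
# Sketch: build an md5sum list for a tree, then re-check the tree against it.
# A throwaway directory stands in for the array's mount point (assumption).
src=$(mktemp -d)
sums=$(mktemp)

printf 'hello\n' > "$src/a.txt"
mkdir "$src/sub"
printf 'world\n' > "$src/sub/b.txt"

# Step 1 (assumed form): record one checksum per file, using relative paths
# so the list can be checked from whatever directory the tree is mounted at.
( cd "$src" && find . -type f -exec md5sum {} + ) > "$sums"

# Step 11: after the rebuild, verify the files on the array against the list.
# "md5sum -c" exits non-zero if any listed file is missing or differs.
if ( cd "$src" && md5sum -c "$sums" > /dev/null ); then
    verify_result=ok
else
    verify_result=mismatch
fi
echo "verify: $verify_result"
```

Checking the files on the array itself, rather than the backup copy, is
what shows that the rebuild left the data intact.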

Thanks for all the help, folks, and I pray we have the "hot-replace"
functionality implemented soon... it will make for much sounder sleep
the next time one of my disks fails... :-)

Cheers,
--
   Durval Menezes.