Re: Maximizing failed disk replacement on a RAID5 array
on 13.06.2011 07:32:47 by Durval Menezes
Hello Folks,
On Wed, Jun 8, 2011 at 10:21 AM, Brad Campbell wrote:
>
> Best of luck, and let us know how you get on.
Just finished the process here. To summarize, it seems I've got my array
back in a stable state.
What I did:
1) Got a good backup of all the data in the array (using "tar") to
removable HDs, verified it (using md5sum), and then stored these
HDs safely offline;
2) Unmounted the filesystem in the array;
3) Inserted the replacement disk on a USB dock, partitioned it,
then added it to the array ("mdadm --add");
   -> Verified (via "mdadm --detail") that the replacement disk was
      listed on the array as a "spare";
4) Failed the bad disk in the array ("mdadm --fail");
   -> At that point, the array immediately started to resync onto the
      replacement disk;
5) Monitored the resync process via "cat /proc/mdstat": it took
roughly 11 hours (I guess because transfer speed to the replacement
disk was limited by the USB ~40MB/s speed limit), but it signaled
no errors;
6) Verified that the array was really synced ("mdadm --detail") and
that there were indeed no errors during the resync (less
/var/log/messages);
7) Removed the bad disk logically from the array ("mdadm --remove");
8) Shut down the machine (init 0);
9) Removed the bad disk physically from the machine, ejected the
replacement disk from the USB dock, and then installed the
replacement disk inside the machine;
10) Turned the system on: the OS booted, assembled the array, and
mounted the filesystem in it with no issues;
11) Checked (using "md5sum -c" on the md5sum files generated during
step 1 above) that all the data ON THE ARRAY was indeed correct, so
in the end I didn't need to restore anything from backup.
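
For anyone wanting to rehearse the mdadm part of this (steps 3 to 7) before touching a real array, here is a rough Python sketch. It only builds the command lines in order and prints them; it does not execute anything. The device names (/dev/md0, /dev/sde1, /dev/sdb1) and both function names are made up for illustration, so substitute your own. The second helper pulls the rebuild percentage out of /proc/mdstat text, assuming the usual "recovery = NN.N%" progress line.

```python
import re


def replacement_plan(array, new_part, bad_part):
    """Return the commands for steps 3-7 as argv lists, in order.

    Nothing is executed here; the caller decides what to do with them.
    """
    return [
        ["mdadm", array, "--add", new_part],     # step 3: new disk joins as a spare
        ["mdadm", "--detail", array],            # step 3: confirm it shows as "spare"
        ["mdadm", array, "--fail", bad_part],    # step 4: rebuild onto the spare starts
        ["cat", "/proc/mdstat"],                 # step 5: watch the rebuild progress
        ["mdadm", "--detail", array],            # step 6: confirm the array is in sync
        ["mdadm", array, "--remove", bad_part],  # step 7: drop the failed disk
    ]


def recovery_percent(mdstat_text):
    """Return the rebuild percentage found in /proc/mdstat text.

    Returns None when no "recovery = ..." or "resync = ..." line is present,
    which is what an idle array looks like.
    """
    match = re.search(r"(?:recovery|resync)\s*=\s*([0-9.]+)%", mdstat_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for argv in replacement_plan("/dev/md0", "/dev/sde1", "/dev/sdb1"):
        print(" ".join(argv))
```

Printing the plan first and running each command by hand keeps a human in the loop, which seems prudent given that step 4 leaves the array degraded until the rebuild finishes.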
Thanks for all the help, folks, and I pray we have the "hot-replace"
functionality implemented soon... it will make for much sounder sleep
the next time one of my disks fails... :-)
Cheers,
--
   Durval Menezes.