failing a drive while RAID5 is initializing
on 23.02.2011 17:20:34 by Iordan Iordanov
Hi guys,
I just wanted to make sure that the behaviour I observed is as expected.
With kernel 2.6.35.11, under Debian Lenny, I created a RAID5 array with
5 drives, partitioned it, formatted a partition with ext3 and mounted
it. Then, I put some load onto the filesystem with:
dd if=/dev/urandom of=/mnt/testfile
The array started initializing. At that point, I needed to fail and
replace a drive for some unrelated testing, so I did that with:
mdadm /dev/md0 -f /dev/sdc
The result was a broken filesystem which was remounted read-only, and a
bunch of errors in dmesg. Theoretically, one would imagine that failing
a drive on a RAID5, even during initialization, should leave the array
without redundancy but still usable. Am I wrong? Is there something
special about the initialization stage of RAID5 that makes a drive
failure fatal at that point? If not, then I have a bug to report and
I'll try to reproduce it for you.
If initialisation is special, does that mean that when creating a
RAID5 array it is advisable to *wait* until it has fully initialized
before using it? Otherwise one risks losing any data written to the
array during the initialization phase if a drive fails at that point.
Many thanks for any input,
Iordan Iordanov
Re: failing a drive while RAID5 is initializing
on 23.02.2011 17:33:54 by Robin Hill
On Wed Feb 23, 2011 at 11:20:34AM -0500, Iordan Iordanov wrote:
> Hi guys,
>
> I just wanted to make sure that the behaviour I observed is as expected.
> With kernel 2.6.35.11, under Debian Lenny, I created a RAID5 array with
> 5 drives, partitioned it, formatted a partition with ext3 and mounted
> it. Then, I put some load onto the filesystem with:
>
> dd if=/dev/urandom of=/mnt/testfile
>
> The array started initializing. At that point, I needed to fail and
> replace a drive for some unrelated testing, so I did that with:
>
> mdadm /dev/md0 -f /dev/sdc
>
> The result was a broken filesystem which was remounted read-only, and a
> bunch of errors in dmesg. Theoretically, one would imagine that failing
> a drive on a RAID5, even during initialization, should leave the array
> without redundancy but still usable. Am I wrong? Is there something
> special about the initialization stage of RAID5 that makes a drive
> failure fatal at that point? If not, then I have a bug to report and
> I'll try to reproduce it for you.
>
The array is created in a degraded state, then recovered onto the final
disk. Until this recovery completes, pulling any disk other than the
final one will result in a broken array and lost data.
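For what it's worth, you can check whether that initial recovery has
finished before trusting the array's redundancy. A rough sketch, assuming
your /dev/md0 from above:

# Show rebuild progress; the initial sync appears here as a
# "recovery" onto the last device.
cat /proc/mdstat

# Or query the array directly; once no recovery is in progress and the
# state is clean, the array has full redundancy.
mdadm --detail /dev/md0

# Block until any resync/recovery/reshape on the array has finished.
mdadm --wait /dev/md0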
> If initialisation is special, does that mean that when creating a
> RAID5 array it is advisable to *wait* until it has fully initialized
> before using it? Otherwise one risks losing any data written to the
> array during the initialization phase if a drive fails at that point.
>
That depends. It's advisable not to use it for critical, non-backed-up
data (the same as during a recovery following a drive failure). The
alternatives are to wait until the array is fully initialised before
making it available to users (which could take a considerable amount of
time), or to manually zero all the drives and then create the array
using --assume-clean (in which case the zeroing means the parity is
already correct). A rough sketch of the latter is below.
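For a 5-drive array like yours, the zero-then-assume-clean route looks
roughly like this (the device names are just placeholders for your five
member disks, not a recommendation):

# Zero every member first; slow, and it destroys anything on the disks,
# but afterwards the all-zero parity is already consistent.
for d in /dev/sd[b-f]; do
    dd if=/dev/zero of=$d bs=1M
done

# Create the array and tell mdadm to skip the initial resync, since the
# members are known to agree.
mdadm --create /dev/md0 --level=5 --raid-devices=5 --assume-clean \
      /dev/sd[b-f]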
Some of the features on Neil's current roadmap should allow for "lazy
initialisation" where the recovery data is added only as the drive is
written to, which should mean the array is available immediately but
still retains full recoverability.
Cheers,
Robin
--
 ___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
Re: failing a drive while RAID5 is initializing
on 23.02.2011 23:08:48 by Iordan Iordanov
Thanks for the clear answer. I am glad I didn't uncover a bug with
something so basic.
> Some of the features on Neil's current roadmap should allow for "lazy
> initialisation" where the recovery data is added only as the drive is
> written to, which should mean the array is available immediately but
> still retains full recoverability.
I thought this was how things worked already, but now that you mention
it, the man page does say that the array starts out degraded and is
rebuilding onto the last device during initialization.
To make this lazy initialization possible, one would need a bitmap of
"dirtied" chunks, though, right?
Cheers,
Iordan
Re: failing a drive while RAID5 is initializing
on 23.02.2011 23:33:28 by Robin Hill
On Wed Feb 23, 2011 at 05:08:48PM -0500, Iordan Iordanov wrote:
> > Some of the features on Neil's current roadmap should allow for "lazy
> > initialisation" where the recovery data is added only as the drive is
> > written to, which should mean the array is available immediately but
> > still retains full recoverability.
>
> I thought this was how things worked already, but now that you mention
> it, the man page does say that the array starts out degraded and is
> rebuilding onto the last device during initialization.
>
> To make this lazy initialization possible, one would need a bitmap of
> "dirtied" chunks, though, right?
>
Yes, that's correct. He's also suggested using this same bitmap with
TRIM operations for SSDs - I'm not sure whether this is mainly intended
for recognising when an entire block is unsynced and can be trimmed, or
whether it's primarily to allow TRIM operations to be delayed until the
system is less busy (as they require a flush of the I/O buffer and
command queue).
Anyway, I'd suggest reading the roadmap as it goes into a lot more
detail on the planned implementation (it was posted here a week or so
ago, and is also on his blog at
http://neil.brown.name/blog/20090129234603).
Cheers,
Robin
--
 ___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |