RAID6 and crashes

on 10.06.2010 20:02:42 by Miles Fidelman

Hi Folks,

I just recently converted a server from a basic Debian Lenny
installation to a virtualized platform (Debian Lenny, Xen 3, Debian
Lenny DomUs).

I also converted my underlying disk environment from RAID1 to a mix of
RAID1 (for Dom0) and RAID6/LVM/DRBD for the domUs. All the RAID is
implemented using md. (Yes I realize there's a performance hit - but it
seemed like a good idea at the time, and with volumes mounted with
"noatime" the performance is acceptable, though I'm sort of thinking now
of moving to RAID10).

Anyway, I'm still working out some instabilities in my virtualized
environment, and I seem to have a crash/reboot event maybe once a day
(still trying to track that down).

In some, but not all cases, I find the machine comes up with the RAID6
volume marked dirty, and an automatic resync gets initiated - which
takes several hours to complete, and drags performance way down while
it's going on.

Which leads to two questions:

1. Are there any known problems with md-based RAID6 that might,
themselves, lead to a crash/reboot? (I always suspect complicated,
low-level functions that are critical to everything).

2. Are there any settings that can reduce the likelihood of a RAID
volume being dirty after a crash? (The crash/reboot isn't that much of
a problem - the several hours of degraded performance ARE a problem.)

Thanks Very Much,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes

on 10.06.2010 20:57:43 by Roman Mamedov


On Thu, 10 Jun 2010 14:02:42 -0400
Miles Fidelman wrote:

> 2. Are there any settings that can reduce the likelihood of a RAID
> volume being dirty after a crash? (The crash/reboot isn't that much of
> a problem - the several hours of degraded performance ARE a problem.)

Do you currently have a write intent bitmap in the array? I think it can
reduce the need for recovery by an order of magnitude in some cases. Check
man mdadm for --bitmap if you don't use it yet.
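
For example (the device name /dev/md3 below is only a placeholder, adjust
for your own arrays):

cat /proc/mdstat                           # arrays with a bitmap show a "bitmap: ..." line
mdadm --detail /dev/md3 | grep -i bitmap   # should also report it, depending on mdadm version
mdadm --grow /dev/md3 --bitmap=internal    # add an internal write-intent bitmap to a running array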

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 10.06.2010 23:22:19 by Miles Fidelman

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 14:02:42 -0400
> Miles Fidelman wrote:
>
>
>> 2. Are there any settings that can reduce the likelihood of a RAID
>> volume being dirty after a crash? (The crash/reboot isn't that much of
>> a problem - the several hours of degraded performance ARE a problem.)
>>
> Do you currently have a write intent bitmap in the array? I think it can
> reduce the need for recovery by an order of magnitude in some cases. Check man
> mdadm for --bitmap if you don't use it yet.
>
Just went through the process of turning it on for all my arrays.
Incredibly painless and quick. Now I get to wait and see if it helps
the next time I have a crash/reboot event.

Thanks very much!

Miles


--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes (reporting back re. --bitmap)

on 10.06.2010 23:41:08 by Roman Mamedov


On Thu, 10 Jun 2010 17:22:19 -0400
Miles Fidelman wrote:

> > Do you currently have a write intent bitmap in the array? I think it can
> > reduce the need for recovery by an order of magnitude in some cases. Check
> > man mdadm for --bitmap if you don't use it yet.
> >
> Just went through the process of turning it on for all my arrays.
> Incredibly painless and quick. Now I get to wait and see if it helps
> the next time I have a crash/reboot event.

I assume you went with "internal" bitmap, in which case if you notice that
write speed on the arrays became significantly lower, the first thing you
should look at is increasing the --bitmap-chunk size (I use 131072).
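
For reference, the chunk size currently in use can be seen in a couple of
places (the device names here are just examples):

grep bitmap /proc/mdstat                 # e.g. "bitmap: 6/226 pages [24KB], 1024KB chunk"
mdadm -X /dev/sdb4 | grep -i chunksize   # reported as "Chunksize" on a member device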

It is possible to use an external bitmap on an independent device (which has
almost zero performance impact), but in this case it could be non-trivial to
100% ensure that such a device is mounted and accessible at the moment during
boot-up when md arrays are being started, especially if one of those arrays
also hosts the root FS.

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 00:40:11 by Miles Fidelman

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 17:22:19 -0400
> Miles Fidelman wrote:
>
>
>>> Do you currently have a write intent bitmap in the array? I think it can
>>> reduce the need for recovery by an order of magnitude in some cases. Check
>>> man mdadm for --bitmap if you don't use it yet.
>>>
>>>
>> Just went through the process of turning it on for all my arrays.
>> Incredibly painless and quick. Now I get to wait and see if it helps
>> the next time I have a crash/reboot event.
>>
> I assume you went with "internal" bitmap, in which case if you notice that
> write speed on the arrays became significantly lower, the first thing you
> should look at is increasing the --bitmap-chunk size (I use 131072).
>
Now you tell me :-)

Yes... went with internal.

I'll keep an eye on write performance. Do you happen to know, off hand,
a magic incantation to change the bitmap-chunk size? (Do I need to
remove the bitmap I just set up and reinstall one with the larger chunk
size?)
> It is possible to use an external bitmap on an independent device (which has
> almost zero performance impact), but in this case it could be non-trivial to
> 100% ensure that such a device is mounted and accessible at the moment during
> boot-up when md arrays are being started, especially if one of those arrays
> also hosts the root FS.
>
I think I'll stick with internal.

Thanks again,

Miles


--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 04:51:12 by Roman Mamedov


On Thu, 10 Jun 2010 18:40:11 -0400
Miles Fidelman wrote:

> Yes... went with internal.
>
> I'll keep an eye on write performance. Do you happen to know, off hand,
> a magic incantation to change the bitmap-chunk size? (Do I need to
> remove the bitmap I just set up and reinstall one with the larger chunk
> size?)

Remove (--bitmap=none) then add again with new --bitmap-chunk.
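
Roughly (the array name and chunk size below are only examples):

mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072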

--
With respect,
Roman


RE: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:31:57 by Graham Mitchell

Can you do this on a live array, or can it only be done (as the docs seem to
suggest), with the create, build and grow options?


G

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Roman Mamedov
> Sent: Thursday, June 10, 2010 10:51 PM
> To: Miles Fidelman
> Cc: linux-raid@vger.kernel.org
> Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
>
> On Thu, 10 Jun 2010 18:40:11 -0400
> Miles Fidelman wrote:
>
> > Yes... went with internal.
> >
> > I'll keep an eye on write performance. Do you happen to know, off
> > hand, a magic incantation to change the bitmap-chunk size? (Do I need
> > to remove the bitmap I just set up and reinstall one with the larger
> > chunk
> > size?)
>
> Remove (--bitmap=none) then add again with new --bitmap-chunk.
>
> --
> With respect,
> Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:41:34 by Roman Mamedov


On Fri, 11 Jun 2010 00:31:57 -0400
"Graham Mitchell" wrote:

> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?

It is a variant of the grow operation, but it can be done on a live array,
even mounted, and completes instantly:

mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072

In my experience, removing the bitmap (setting it to none) may occasionally
fail (probably when the array has a lot of outstanding write requests), but
just try again when it's a bit quieter, and it'll work.

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:42:24 by Miles Fidelman

Graham Mitchell wrote:
> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?
>
>
>
I just did it on a live array. Some --grow options, including --bitmap,
seem to work on live arrays.

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:46:47 by Miles Fidelman

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 18:40:11 -0400
> Miles Fidelman wrote:
>
>
>> Yes... went with internal.
>>
>> I'll keep an eye on write performance. Do you happen to know, off hand,
>> a magic incantation to change the bitmap-chunk size? (Do I need to
>> remove the bitmap I just set up and reinstall one with the larger chunk
>> size?)
>>
> Remove (--bitmap=none) then add again with new --bitmap-chunk.
>
>
Looks like my original --bitmap internal creation set a very large chunk
size initially

md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 6/226 pages [24KB], 1024KB chunk

unless that --bitmap-chunk=131072 recommendation translates to
131072KB (if so, are you really running 131MB chunks?)

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:50:19 by NeilBrown

On Fri, 11 Jun 2010 00:31:57 -0400
"Graham Mitchell" wrote:

> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?
>

As 'grow' can (and must) be used on a live array, your question doesn't
exactly make sense.
Yes: it can be done on a live array.

NeilBrown

>
> G
>
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of Roman Mamedov
> > Sent: Thursday, June 10, 2010 10:51 PM
> > To: Miles Fidelman
> > Cc: linux-raid@vger.kernel.org
> > Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
> >
> > On Thu, 10 Jun 2010 18:40:11 -0400
> > Miles Fidelman wrote:
> >
> > > Yes... went with internal.
> > >
> > > I'll keep an eye on write performance. Do you happen to know, off
> > > hand, a magic incantation to change the bitmap-chunk size? (Do I need
> > > to remove the bitmap I just set up and reinstall one with the larger
> > > chunk
> > > size?)
> >
> > Remove (--bitmap=none) then add again with new --bitmap-chunk.
> >
> > --
> > With respect,
> > Roman
>


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 06:55:21 by Roman Mamedov


On Fri, 11 Jun 2010 00:46:47 -0400
Miles Fidelman wrote:

> Looks like my original --bitmap internal creation set a very large chunk
> size initially
>
> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
> 947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> bitmap: 6/226 pages [24KB], 1024KB chunk
>
> unless that --bitmap-chunk=131072 recommendation translates to
> 131072KB (if so, are you really running 131MB chunks?)

Yes, this is correct.
This will only mean that after an unclean shutdown, at least 128MB-sized
areas of the array will be invalidated for a resync, and not smaller areas
with 1MB-granularity like on yours currently. 128 megabytes is just about 1
second of read throughput on modern drives, so I am okay with that. Several
128MB-windows here and there are still faster to resync than the whole array.
And this had an extremely good effect on write performance for me (increased it
by more than 1.5x) compared to a small chunk. Test for yourself, first without
the bitmap, then with various chunk sizes of it (ensure there's no other load
on the array, and note the speeds):

dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
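
One way to run that comparison end to end (a rough sketch only - the array
name, mount point and chunk sizes are placeholders, and the array should be
otherwise idle while testing):

mdadm --grow /dev/md3 --bitmap=none
dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync   # baseline, no bitmap
for chunk in 4096 32768 131072; do
    mdadm --grow /dev/md3 --bitmap=internal --bitmap-chunk=$chunk
    dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
    mdadm --grow /dev/md3 --bitmap=none
done
rm /your-raid/zerofile
# remember to re-add the bitmap you actually want afterwards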

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 07:08:11 by NeilBrown

On Fri, 11 Jun 2010 00:46:47 -0400
Miles Fidelman wrote:

> Roman Mamedov wrote:
> > On Thu, 10 Jun 2010 18:40:11 -0400
> > Miles Fidelman wrote:
> >
> >
> >> Yes... went with internal.
> >>
> >> I'll keep an eye on write performance. Do you happen to know, off hand,
> >> a magic incantation to change the bitmap-chunk size? (Do I need to
> >> remove the bitmap I just set up and reinstall one with the larger chunk
> >> size?)
> >>
> > Remove (--bitmap=none) then add again with new --bitmap-chunk.
> >
> >
> Looks like my original --bitmap internal creation set a very large chunk
> size initially
>
> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
> 947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> bitmap: 6/226 pages [24KB], 1024KB chunk
>
> unless that --bitmap-chunk=131072 recommendation translates to
> 131072KB (if so, are you really running 131MB chunks?)

Yes, and 131MB (128MiB) is probably a little on the large side, but not
excessively so and may well be a very good number.

My current rule-of-thumb is that the bitmap chunk size should be about the
amount of data that can be written sequentially in 1 second. 131MB is maybe 2
seconds with today's technology, so it is close enough.
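
As a rough illustration (the numbers are only ballpark): a member disk that
streams around 100MB/s writes roughly 100MB in a second, so a bitmap chunk
somewhere in the 64MB-128MB range (--bitmap-chunk=65536 or 131072) fits the
rule. Something like

  hdparm -t /dev/sda

gives a quick estimate of a disk's sequential read throughput, which for
rotating disks is usually in the same ballpark as its sequential write speed.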

The idea is that normally if your filesystems provides fairly good locality,
you should not have very many bits in the bitmap set. Probably 10s, possibly
100s.

If this is the case, and each takes 1 second to resync, then resync time is
limited to a few minutes.

Smaller chunks might reduce this to less than a minute, but that probably
isn't worth it. Conversely smaller chunks will tend to mean more updates to
the bitmap, so slower writes all the time.

On a 1TB drive there are 7500 131MB chunks. So assuming a relatively small
number of bits set at a time, this will reduce resync time by a factor of
somewhere between 200 and 1000. Hours become a few minutes. This is
probably enough for most situations.

I would be really interested to find out if my assumption of small numbers of
bits set is valid. You can find out the number of bits set at any instant
with "mdadm -X" run on some component of the array.

If anyone is able to report some samples of that number along with array
size / level / layout / number of devices etc and some guide to the workload,
it might be helpful in validating my rule-of-thumb.
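
A simple way to collect such samples over time might be something like the
following (the component device and sampling interval are only examples):

while true; do
    echo -n "$(date '+%F %T')  "
    mdadm -X /dev/sda4 | grep -i 'bitmap :'   # e.g. "Bitmap : 3727 bits (chunks), 12 dirty (0.3%)"
    sleep 60
done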

Thanks,
NeilBrown

Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 13:10:23 by John Hendrikx

Neil Brown wrote:
> On Fri, 11 Jun 2010 00:46:47 -0400
> Miles Fidelman wrote:
>
>
>> Roman Mamedov wrote:
>>
>>> On Thu, 10 Jun 2010 18:40:11 -0400
>>> Miles Fidelman wrote:
>>>
>>>
>>>
>>>> Yes... went with internal.
>>>>
>>>> I'll keep an eye on write performance. Do you happen to know, off hand,
>>>> a magic incantation to change the bitmap-chunk size? (Do I need to
>>>> remove the bitmap I just set up and reinstall one with the larger chunk
>>>> size?)
>>>>
>>>>
>>> Remove (--bitmap=none) then add again with new --bitmap-chunk.
>>>
>>>
>>>
>> Looks like my original --bitmap internal creation set a very large chunk
>> size initially
>>
>> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>> 947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>> bitmap: 6/226 pages [24KB], 1024KB chunk
>>
>> unless that --bitmap-chunk=131072 recommendation translates to
>> 131072KB (if so, are you really running 131MB chunks?)
>>
>
> Yes, and 131MB (128MiB) is probably a little on the large side, but not
> excessively so and may well be a very good number.
>
I'm using --bitmap-chunk=131072 as well, with the same reasoning as you
outlined in your post. The bitmap will be small and require few updates
while still providing a huge reduction in resync times.

> On a 1TB drive there are 7500 131MB chunks. So assuming a relatively small
> number of bits set at a time, this will reduce resync time by a factor of
> somewhere between 200 and 1000. Hours become fewer minutes. This is
> probably enough for most situations.
>
> I would be really interested to find out if my assumption of small numbers of
> bits set is valid. You can find out the number of bits set at any instant
> with "mdadm -X" run on some component of the array.
>
I was interested as well, so I ran this command:

> mdadm -X /dev/md2

and this is the result(??):

Filename : /dev/md2
Magic : d747992c
mdadm: invalid bitmap magic 0xd747992c, the bitmap file appears to be
corrupted
Version : 1132474982
mdadm: unknown bitmap version 1132474982, either the bitmap file is
corrupted or you need to upgrade your tools

> cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md3 : active raid6 sdd1[7] sda1[0] sdj1[6] sdc1[3] sdg1[2] sdb1[1]
3867871232 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6]
[UUUUUU]
bitmap: 0/4 pages [0KB], 131072KB chunk

md2 : active raid6 sde1[7] sdi1[6] sdh1[3] sdg3[2] sdb2[1] sda2[0]
3867871232 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6]
[UUUUUU]
bitmap: 0/4 pages [0KB], 131072KB chunk

md0 : active raid1 sdg2[0] hda1[1]
9767424 blocks [2/2] [UU]

unused devices: <none>

I upgraded to the latest available mdadm (in debian unstable) and it has
the same results (for both arrays).

> mdadm --version
mdadm - v3.1.2 - 10th March 2010

> uname -a
Linux Ukyo 2.6.27.5 #1 SMP PREEMPT Sun Nov 9 08:32:40 CET 2008 i686
GNU/Linux

Is this normal? :) Both arrays were freshly created a few days ago,
with mdadm v3.0.3...
> If anyone is able to report some samples of that number along with array
> size / level / layout / number of devices etc and some guide to the workload,
> it might be helpful in validating my rule-of-thumb.
>
> Thanks,
> NeilBrown
>
>

--John


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 13:50:46 by Roman Mamedov


On Fri, 11 Jun 2010 13:10:23 +0200
John Hendrikx wrote:

> > I would be really interested to find out if my assumption of small numbers
> > of bits set is valid. You can find out the number of bits set at any
> > instant with "mdadm -X" run on some component of the array.
> >
> I was interested as well, so I ran this command:
>
> > mdadm -X /dev/md2
>
> and this is the result(??):
>
> Filename : /dev/md2
> Magic : d747992c
> mdadm: invalid bitmap magic 0xd747992c, the bitmap file appears to be
> corrupted
> Version : 1132474982
> mdadm: unknown bitmap version 1132474982, either the bitmap file is
> corrupted or you need to upgrade your tools

I stumbled in the same way initially, but then re-read more closely and
noticed that Neil said to run it "on some component of the array",
e.g. /dev/sdxN, not the array itself -- and that way it worked fine. However as
my array sees almost no write load at the moment, I have no useful results to
report.

--
With respect,
Roman


RE: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 14:13:48 by Graham Mitchell

Thanks to everyone for replying, it is indeed a simple operation and
completes immediately. I was just worried since I equate grow more to an
array reshape than to a management/tune option. I guess it's just my
paranoia showing thru... :)


G

> -----Original Message-----
> From: Roman Mamedov [mailto:roman@rm.pp.ru]
> Sent: Friday, June 11, 2010 12:42 AM
> To: Graham Mitchell
> Cc: linux-raid@vger.kernel.org
> Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
>
> On Fri, 11 Jun 2010 00:31:57 -0400
> "Graham Mitchell" wrote:
>
> > Can you do this on a live array, or can it only be done (as the docs
> > seem to suggest), with the create, build and grow options?
>
> It is a variant of the grow operation, but it can be done on a live array, even
> mounted, and completes instantly:
>
> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072
>
> In my experience, removing the bitmap (setting it to none) may occasionally
> fail (probably when the array has a lot of outstanding write requests), but
> just try again when it's a bit quieter, and it'll work.
>
> --
> With respect,
> Roman


RE: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 14:25:22 by Graham Mitchell

> I was interested as well, so I ran this command:
>


I also was interested, and did mdadm -X /dev/md0, and this is my output...

mdadm -X /dev/md0
Filename : /dev/md0
Magic : 00000000
mdadm: invalid bitmap magic 0x0, the bitmap file appears to be corrupted
Version : 0
mdadm: unknown bitmap version 0, either the bitmap file is corrupted or you
need to upgrade your tools



mdadm --version
mdadm - v3.0.3 - 22nd October 2009


uname -a
Linux file00bert.woodlea.org.uk 2.6.32.12-115.fc12.i686.PAE #1 SMP Fri Apr
30 20:14:08 UTC 2010 i686 i686 i386 GNU/Linux

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[0] sda1[14] sdp1[13] sdg1[12] sdo1[11] sdf1[10]
sdk1[9] sdn1[8] sde1[7] sdj1[6] sdm1[5] sdd1[4] sdi1[3] sdl1[2] sdc1[1]
6348985344 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15]
[UUUUUUUUUUUUUUU]
bitmap: 0/2 pages [0KB], 131072KB chunk

unused devices: <none>





RE: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 14:29:44 by Graham Mitchell

> I stumbled in the same way initially, but then re-read more closely and
> noticed that Neil said to run it "on some component of the array", e.g.
> /dev/sdxN, not the array itself -- and that way it worked fine. However as my
> array sees almost no write load at the moment, I have no useful results to
> report.
>
> --
> With respect,
> Roman

Duh.... You are correct, here's the output from one of my disks

mdadm -X /dev/sdn1
Filename : /dev/sdn1
Magic : 6d746962
Version : 4
UUID : 1470c671:4236b155:67287625:899db153
Events : 10584
Events Cleared : 10584
State : OK
Chunksize : 128 MB
Daemon : 5s flush period
Write Mode : Normal
Sync Size : 488383488 (465.76 GiB 500.10 GB)
Bitmap : 3727 bits (chunks), 0 dirty (0.0%)


Like you, I've no load on it at the moment, but I do have a couple of GB to
copy onto it today, so I'll see if I can get some more figures.

I think I need some tea...:)


G


Re: RAID6 and crashes (reporting back re. --bitmap)

on 11.06.2010 22:26:01 by Miles Fidelman

FYI: I just:
- removed the bitmap
- installed a new bitmap with larger chunk-size
on 4 arrays, on each of two machines (redundant high-availability
cluster setup)

Took me all of about 5 minutes, of which most of the time was waiting
for virtual machines to migrate from one machine to the other, and then
back.

All seems to be working, performance seems just a little snappier - but
who can really tell until the next time an array rebuilds.

Thanks all (and Roman in particular) for your guidance.

Miles

Roman Mamedov wrote:
> On Fri, 11 Jun 2010 00:46:47 -0400
> Miles Fidelman wrote:
>
>
>> Looks like my original --bitmap internal creation set a very large chunk
>> size initially
>>
>> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>> 947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>> bitmap: 6/226 pages [24KB], 1024KB chunk
>>
>> unless that --bitmap-chunk=131072 recommendation translates to
>> 131072KB (if so, are you really running 131MB chunks?)
>>
> Yes, this is correct.
> This will only mean that after an unclean shutdown, at least 128MB-sized
> areas of the array will be invalidated for a resync, and not smaller areas
> with 1MB-granularity like on yours currently. 128 megabytes is just about 1
> second of read throughput on modern drives, so I am okay with that. Several
> 128MB-windows here and there are still faster to resync than the whole array.
> And this had an extremely good effect on write performance for me (increased it
> by more than 1.5x) compared to a small chunk. Test for yourself, first without
> the bitmap, then with various chunk sizes of it (ensure there's no other load
> on the array, and note the speeds):
>
> dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
>
>


--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra



Re: RAID6 and crashes (reporting back re. --bitmap)

on 13.06.2010 16:28:34 by Bernd Schubert

On Friday 11 June 2010, Neil Brown wrote:
> On Fri, 11 Jun 2010 00:31:57 -0400
>
> "Graham Mitchell" wrote:
> > Can you do this on a live array, or can it only be done (as the docs seem
> > to suggest), with the create, build and grow options?
>
> As 'grow' can (and must) be used on a live array you're question doesn't
> exactly make sense.
> Yes: it can be done on a live array.

While I have done this myself a couple of times, I still do not understand
where it takes the disk space for the bitmap journal from. Is this space
reserved by mdadm for this purpose?


Thanks,
Bernd

Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 01:05:20 by NeilBrown

On Sun, 13 Jun 2010 16:28:34 +0200
Bernd Schubert wrote:

> On Friday 11 June 2010, Neil Brown wrote:
> > On Fri, 11 Jun 2010 00:31:57 -0400
> >
> > "Graham Mitchell" wrote:
> > > Can you do this on a live array, or can it only be done (as the docs seem
> > > to suggest), with the create, build and grow options?
> >
> > As 'grow' can (and must) be used on a live array you're question doesn't
> > exactly make sense.
> > Yes: it can be done on a live array.
>
> While I have done this myself a couple of times, I still do not understand
> where it takes the disk space for the bitmap journal from. Is this space
> reserved by mdadm for this purpose?

Sort-of.
It uses space that the alignment requirements of the metadata assure us is
otherwise unused.
For v0.90, that is limited to 60K. For 1.x it is 3K.
With recent kernels it is possible for mdadm to tell the kernel where to put
the bitmap (rather than the kernel *knowing*) so mdadm could use other space
that was reserved when the array was created, but I haven't implemented that
in mdadm yet.
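
Incidentally, one way to see where an internal bitmap actually ended up on a
member device is "mdadm --examine" on that member; for 1.x metadata it
normally prints a line like "Internal Bitmap : ... sectors from superblock"
(the device name below is just an example):

mdadm --examine /dev/sdb4 | grep -i bitmap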

NeilBrown

Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 11:01:09 by Bernd Schubert

On Monday 14 June 2010, Neil Brown wrote:
> On Sun, 13 Jun 2010 16:28:34 +0200
>
> Bernd Schubert wrote:
> > On Friday 11 June 2010, Neil Brown wrote:
> > > On Fri, 11 Jun 2010 00:31:57 -0400
> > >
> > > "Graham Mitchell" wrote:
> > > > Can you do this on a live array, or can it only be done (as the docs
> > > > seem to suggest), with the create, build and grow options?
> > >
> > > As 'grow' can (and must) be used on a live array you're question
> > > doesn't exactly make sense.
> > > Yes: it can be done on a live array.
> >
> > While I have done this myself a couple of times, I still do not
> > understand where it takes the disk space for the bitmap journal from. Is
> > this space reserved by mdadm for this purpose?
>
> Sort-of.
> It uses space that the alignment requirements of the metadata assure us is
> otherwise unused.
> For v0.90, that is limited to 60K. For 1.x it is 3K.
> With recent kernels it is possible for mdadm to tell the kernel where to
> put the bitmap (rather than the kernel *knowing*) so mdadm could use other
> space that was reserved when the array was created, but I haven't
> implemented that in mdadm yet.

Thanks a lot, Neil! I added this information to the raid wiki:

https://raid.wiki.kernel.org/index.php/Bitmap#Used_disk_space_for_bitmaps


Cheers,
Bernd

Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 11:14:25 by Roman Mamedov


On Mon, 14 Jun 2010 09:05:20 +1000
Neil Brown wrote:

> > While I have done this myself a couple of times, I still do not understand
> > where it takes the disk space for the bitmap journal from. Is this space
> > reserved by mdadm for this purpose?
>
> Sort-of.
> It uses space that the alignment requirements of the metadata assure us is
> otherwise unused.
> For v0.90, that is limited to 60K. For 1.x it is 3K.

I have now:

md0 : active raid5 sdf3[3] sde3[1] sda3[0]
3887004672 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
bitmap: 1/8 pages [4KB], 131072KB chunk

Metadata is 1.2, and the internal bitmap is 8 pages, which is 32K, not 3K.
Did I misunderstand something, or perhaps 3K was a typo?

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 11:47:42 by NeilBrown

On Mon, 14 Jun 2010 15:14:25 +0600
Roman Mamedov wrote:

> On Mon, 14 Jun 2010 09:05:20 +1000
> Neil Brown wrote:
>
> > > While I have done this myself a couple of times, I still do not understand
> > > where it takes the disk space for the bitmap journal from. Is this space
> > > reserved by mdadm for this purpose?
> >
> > Sort-of.
> > It uses space that the alignment requirements of the metadata assure us is
> > otherwise unused.
> > For v0.90, that is limited to 60K. For 1.x it is 3K.
>
> I have now:
>
> md0 : active raid5 sdf3[3] sde3[1] sda3[0]
> 3887004672 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
> bitmap: 1/8 pages [4KB], 131072KB chunk
>
> Metadata is 1.2, and the internal bitmap is 8 pages, which is 32K, not 3K.
> Did I misunderstand something, or perhaps 3K was a typo?
>

The pages used to store the bitmap internally use 16 bits per bitmap-chunk,
to count how many active IO requests to the chunk there are. So it is
potentially 16 times the size of the bitmap stored on disk. For that reason
we free pages for which all chunks are idle. In your case, only one of the 8
pages currently has any active chunks.

There are 3887004672 / 131072 or about 29655 chunks, and hence that many bits.
29655/8 is 3706 bytes, which you will notice is still larger than 3K.

When you create an array and specify that a bitmap be added at the same time,
there is more flexibility for size and location of the bitmap. It can easily
be more than 3K in that case.

So presumably this array was created with a bitmap, rather than created
without a bitmap and had a bitmap added later with --grow. Correct?

NeilBrown

Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 13:53:27 by Roman Mamedov


On Mon, 14 Jun 2010 19:47:42 +1000
Neil Brown wrote:

> When you create an array and specify that a bitmap be added at the same time,
> there is more flexibility for size and location of the bitmap. It can easily
> be more than 3K in that case.
>
> So presumably this array was created with a bitmap, rather than created
> without a bitmap and had a bitmap added later with --grow. Correct?

Yes, as far as I remember. However, it seems a bit unfortunate to have any
significant difference between adding the bitmap right when creating the
array, and adding it later. For that reason, I'd suggest reserving more space
in the metadata than 3K, even if the bitmap isn't requested - and even if it
won't be added later, then that space could prove useful for something else
that might require it later. Maybe 64, 128 or 256K - still minuscule compared
to the array size, and could provide some nice flexibility for the future.

--
With respect,
Roman


Re: RAID6 and crashes (reporting back re. --bitmap)

on 14.06.2010 23:24:49 by NeilBrown

On Mon, 14 Jun 2010 17:53:27 +0600
Roman Mamedov wrote:

> On Mon, 14 Jun 2010 19:47:42 +1000
> Neil Brown wrote:
>
> > When you create an array and specify that a bitmap be added at the same time,
> > there is more flexibility for size and location of the bitmap. It can easily
> > be more than 3K in that case.
> >
> > So presumably this array was created with a bitmap, rather than created
> > without a bitmap and had a bitmap added later with --grow. Correct?
>
> Yes, as far as I remember. However, it seems a bit unfortunate to have any
> significant difference between adding the bitmap right when creating the
> array, and adding it later. For that reason, I'd suggest reserving more space
> in the metadata than 3K, even if the bitmap isn't requested - and even if it
> won't be added later, then that space could prove useful for something else
> that might require it later. Maybe 64, 128 or 256K - still minuscule compared
> to the array size, and could provide some nice flexibility for the future.
>

Yes. And I'm fairly sure mdadm does always reserve space.
The point is that (until very recently) it wasn't possible to tell the kernel
where to add a bitmap to an active array - just that it should add one.
So it could only add it at a place that it was certain would be usable.
That is the place I described.
It is now possible to give the kernel more details of the bitmap to add. I
just need to teach mdadm how to choose the best space and how to tell the
kernel about it.

NeilBrown