Preventing a RAID device from starting until all disks are ready

on 14.10.2010 17:36:44 by Andrew Klaassen

I'm having problems with a 56-drive fibre-channel software RAID-10 array.

During boot, mdadm starts the array before one of the two fibre-channel cards has started its disk detection. The array comes up, but with only 28 of 56 drives, and I have to manually re-add the drives and cross my fingers that nothing will go wrong during the 10-hour rebuild.

Is there any way to tell mdadm to wait longer, or to not attempt to start the array if not all devices are present, or... (any other solution you can think of)?

I'm on CentOS 5.2.

Thanks.

Andrew





Re: Preventing a RAID device from starting until all disks are ready

on 14.10.2010 18:00:42 by Iordan Iordanov

Hi Andrew,

Andrew Klaassen wrote:
> During boot, mdadm starts the array before one of the two fibre-channel cards has started its disk detection. The array comes up, but with only 28 of 56 drives, and I have to manually re-add the drives and cross my fingers that nothing will go wrong during the 10-hour rebuild.

Have you considered enabling a write-intent bitmap on your array? This
way, at least your rebuild will take seconds instead of 10 hours.
Write-intent bitmap support for RAID10 was introduced in 2005, and
hopefully CentOS 5.2 supports it.
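
For what it's worth, turning one on for an existing array should be a
one-liner with a recent enough mdadm; a sketch, where /dev/md0 just
stands in for your actual array:

   mdadm --grow /dev/md0 --bitmap=internal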

> Is there any way to tell mdadm to wait longer, or to not attempt to start the array if not all devices are present, or... (any other solution you can think of)?

We have iscsi targets for drives in our array, and we make sure that
we've logged into all 30 of our drives before we continue to enable
mdadm (we literally count the number of iscsi sessions open). You can
try counting the number of block devices present (in /dev/block) that
match a certain pattern, or perhaps your fiber channel driver offers an
even more convenient facility in /dev.
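
As a rough illustration, something like this could gate the assembly
(untested sketch - the sd* pattern and the count of 56 are guesses about
your setup, and the 60-second timeout is arbitrary):

   # wait up to 60 seconds for all 56 disks to appear
   i=0
   while [ "$(ls /sys/block | grep -c '^sd')" -lt 56 ] && [ "$i" -lt 60 ]
   do
       sleep 1
       i=$((i+1))
   done
   mdadm --assemble --scan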

However, it would be great if there really was a way to tell mdadm to
wait until the devices are ready. I'm not aware of one though.

Cheers!
Iordan

Re: Preventing a RAID device from starting until all disks are ready

on 14.10.2010 19:00:32 by Andrew Klaassen

--- On Thu, 10/14/10, Iordan Iordanov wrote:

> Have you considered enabling a write-intent bitmap on your
> array? This way, at least your rebuild will take seconds
> instead of 10 hours. Write intent bitmap support for RAID10
> was introduced in 2005, and hopefully CentOS 5.2 supports
> it.

I've never heard of that - sounds fantastic. Does it have any performance penalties during heavy writes?

> We have iscsi targets for drives in our array, and we make
> sure that we've logged into all 30 of our drives before we
> continue to enable mdadm (we literally count the number of
> iscsi sessions open). You can try counting the number of
> block devices present (in /dev/block) that match a certain
> pattern, or perhaps your fiber channel driver offers an even
> more convenient facility in /dev.

Are you doing the mdadm startup in rc.local, or in the initrd, or...?

Andrew



Re: Preventing a RAID device from starting until all disks are ready

on 14.10.2010 21:31:39 by Andrew Klaassen

--- On Thu, 10/14/10, Iordan Iordanov wrote:

> We have iscsi targets for drives in our array, and we make
> sure that we've logged into all 30 of our drives before we
> continue to enable mdadm (we literally count the number of
> iscsi sessions open). You can try counting the number of
> block devices present (in /dev/block) that match a certain
> pattern, or perhaps your fiber channel driver offers an even
> more convenient facility in /dev.

It seems like the simplest way to do this would be to have two
mdadm.conf files: one for the root arrays that need to come up right
away, and one for the FC/iSCSI arrays that need to wait.

Is either of these ideas:

- run two "mdadm --monitor" processes simultaneously, one for each set of arrays, or

- specify two config file arguments to "mdadm --monitor"

...possible?
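
For example, something like this - the file names are made up and I
haven't tried it, though --config and --daemonise are both documented
mdadm options:

   mdadm --monitor --scan --daemonise --config /etc/mdadm.conf.root
   mdadm --monitor --scan --daemonise --config /etc/mdadm.conf.fc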

Thanks.

Andrew





Re: Preventing a RAID device from starting until all disks are ready

on 15.10.2010 03:54:48 by NeilBrown

On Thu, 14 Oct 2010 12:00:42 -0400
Iordan Iordanov wrote:

> Hi Andrew,
>
> [...]
>
> However, it would be great if there really was a way to tell mdadm to
> wait until the devices are ready. I'm not aware of one though.

Time to go back and read the mdadm man page. From top to bottom. Twice.

I suspect that --no-degraded is the flag you want.

It was introduced in mdadm 2.5.

There are three scenarios that could be relevant.

1/ If an array is being assembled explicitly, e.g.
mdadm --assemble /dev/mdX .....
then mdadm will refuse to assemble the array if any expected devices are
missing. You need to add "--run" to get it to start a partial array.

2/ If an array is being assembled using auto-assembly, e.g.
mdadm --assemble --scan
then mdadm will start partial arrays if it cannot find the missing parts
anyway. You can tell it not to with --no-degraded. This flag is actually a
misnomer. It may well assemble a degraded array, but only if the array was
degraded the last time it was active.

3/ If an array is being assembled using a sequence of --incremental commands,
e.g.
mdadm --incremental /dev/first
mdadm --incremental /dev/second
etc

then mdadm won't assemble the array until all expected devices have been
found.  Using "--run" will override this so the array is assembled as soon
as enough devices are present.  Once all possible devices have been
presented to mdadm with "mdadm --incremental device", you can tell mdadm
to start any arrays that haven't been started yet with
   mdadm --incremental --run
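
Putting the three side by side (device names here are placeholders, not
a recipe):

   # 1/ explicit assembly; --run starts it even if partial
   mdadm --assemble --run /dev/md0 /dev/sda /dev/sdb
   # 2/ auto-assembly; refuse to start arrays with members missing
   mdadm --assemble --scan --no-degraded
   # 3/ incremental assembly, then start whatever is still waiting
   mdadm --incremental /dev/sda
   mdadm --incremental /dev/sdb
   mdadm --incremental --run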

Hope that clears it up.

NeilBrown


Re: Preventing a RAID device from starting until all disks are ready

on 15.10.2010 10:19:12 by Jon Hardcastle

> 2/ If an array is being assembled using auto-assembly, e.g.
>    mdadm --assemble --scan
> then mdadm will start partial arrays if it cannot find the missing parts
> anyway.  You can tell it not to with --no-degraded.  This flag is
> actually a misnomer.  It may well assemble a degraded array, but only
> if the array was degraded the last time it was active.

This '--no-degraded' option sounds cool. Can you tell it to apply that
logic to some arrays but not others? I have an OS drive that can happily
come up degraded if need be, but I also have a 7-drive data array whose
cables sometimes come adrift when I am replacing/adding a drive, and I'd
rather it just not assemble, so I can go back and check.

(Sorry to steal the thread; kinda.)




Re: Preventing a RAID device from starting until all disks are ready

on 18.10.2010 20:00:03 by Iordan Iordanov

Hi Andrew,

My apologies for the late reply, but I've been over-busy at work.

> I've never heard of that - sounds fantastic. Does it have any performance penalties during heavy writes?

There has to be some performance penalty, since every write to the
array incurs an additional write to the write-intent bitmap. However,
that cost can be imposed on a device other than your RAID array by
keeping the write-intent bitmap as a file on a separate file-system.
The mdadm manpage specifies that the write-intent bitmap can be either
"internal" - in the MD superblock - or external - in a file. We have
kept it internal for now, since we see significantly fewer writes than
reads, but we are keeping in mind the option of moving it to another
file-system if that changes, or if write performance starts to impact
our users.
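
If I read the manpage right, moving it external is the same --grow
syntax pointed at a file instead of "internal"; the path below is
hypothetical, and the file has to live on a file-system that is not on
the array itself (the manpage notes external bitmaps are only known to
work on ext2 and ext3):

   mdadm --grow /dev/md0 --bitmap=/var/lib/md/md0-bitmap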

> Are you doing the mdadm startup in rc.local, or in the initrd, or...?

We have disabled all of the system startup scripts and cooked up our
own, which performs each stage of the startup. In our case, we need the
following order of operations (sketched below):

1) Start networking, and bring up a set of bonded interfaces.
2) Login over iscsi to 30 iscsi target drives.
3) Start mdadm
etc.
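
A stripped-down sketch of that script - the init script path, the
loginall invocation, and the count of 30 are all specific to our site:

   #!/bin/sh
   /etc/init.d/network start            # 1) networking + bonded interfaces
   iscsiadm -m node --loginall=all      # 2) log in to the iscsi targets
   # count open iscsi sessions until all 30 are up
   while [ "$(iscsiadm -m session 2>/dev/null | wc -l)" -lt 30 ]
   do
       sleep 1
   done
   mdadm --assemble --scan              # 3) start mdadm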

Cheers,
Iordan