Re: Software RAID and Fakeraid

on 30.11.2010 23:25:08 by NeilBrown

On Tue, 30 Nov 2010 14:54:40 -0500 Phillip Susi wrote:

> On 11/25/2010 5:26 AM, John Sheu wrote:
> > What's the preferred way to differentiate BIOS fakeraid from regular
> > software mdraid?
>
> The only way I know of is detecting that it is a dmraid device as
> opposed to md, which is why grub does it that way. This worked well in
> the past when each tool exclusively handled one type of raid.
>
> > I ask this as I'm booting with GRUB2 off a system that has one of those
> > Intel fakeraid chipsets. As of a few months ago, the mdadm package has
> > supported these fakeraid setups, so the RAID array comes up as a /dev/md###
> > device. This is unfortunate, as GRUB2 assumes that any device of the type
> > /dev/md### must be a pure software RAID device, and in
> > util/grub-setup.c:939, tries to install itself to the RAID members
> > individually:
>
> For grub to support fakeraids activated by the md driver, it needs some
> way to find out that it is actually a fake raid, and not a software
> raid. Adding linux-raid to Cc list to see if they can suggest a way of
> doing that.

My feeling is that grub just needs to be a bit more careful.

If the members of the md array are partitions, then installing itself in the
boot blocks of the devices holding those partitions always makes sense.

If the members of the md array are whole devices, then installing grub in
those devices might make sense depending on specific details of the
metadata. The default should be that it doesn't make sense, but specific
cases do.
e.g. if the metadata (/sys/block/mdX/md/metadata_version) is 0.90 or 1.0, and
the array is RAID1, then grub should install itself in the *array*, not in
the devices.
If the metadata is 1.1, then grub cannot install itself.
If the metadata is 1.2, then grub can install itself at the start.
If the metadata is external:imsm then (I think) grub should install itself in
the array ... though there are some complexities there.
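To make the metadata-version rules above concrete, here is a minimal Python sketch of where each native md format places its superblock on a member device. The offsets reflect our understanding of the kernel's on-disk layouts (0.90: the last 64 KiB-aligned 64 KiB block; 1.0: at least 8 KiB before the end, 4 KiB-aligned; 1.1: sector 0; 1.2: 4 KiB from the start). 512-byte sectors are assumed and the function names are hypothetical.

```python
SECTOR = 512  # assumption: 512-byte sectors


def md_superblock_sector(metadata: str, dev_sectors: int) -> int:
    """Sector at which md places its superblock on a member device."""
    if metadata == "0.90":
        # Last 64 KiB block, aligned down to a 64 KiB (128-sector) boundary.
        reserved = 64 * 1024 // SECTOR
        return (dev_sectors & ~(reserved - 1)) - reserved
    if metadata == "1.0":
        # At least 8 KiB before the end, aligned down to 4 KiB.
        return (dev_sectors - 8 * 1024 // SECTOR) & ~(4 * 1024 // SECTOR - 1)
    if metadata == "1.1":
        return 0  # sector 0 is taken, so there is no room for an MBR
    if metadata == "1.2":
        return 4 * 1024 // SECTOR  # the first 4 KiB stays unused
    raise ValueError(f"unsupported metadata version: {metadata!r}")


def data_starts_at_sector_zero(metadata: str) -> bool:
    # With the superblock at the end (0.90, and 1.0 in practice), array data
    # begins at sector 0, so a RAID1 member looks like a plain disk to the BIOS.
    return metadata in ("0.90", "1.0")
```

This is why a 0.90/1.0 RAID1 member can be booted as if it were an ordinary disk, while 1.1 leaves nowhere to put boot code.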

I often wonder why people who add knowledge of md to grub etc don't at least
let me know what they are doing in case I can see something obviously wrong
with their approach.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Software RAID and Fakeraid

on 02.12.2010 23:13:42 by Phillip Susi

On 11/30/2010 5:25 PM, Neil Brown wrote:
> My feeling is that grub just needs to be a bit more careful.
>
> If the members of the md array are partitions, then installing itself in the
> boot blocks of the devices holding those partitions always makes sense.
>
> If the members of the md array are whole devices, then installing grub in
> those devices might make sense depending on specific details of the
> metadata. The default should be that it doesn't make sense, but specific
> cases do.
> e.g. if the metadata (/sys/block/mdX/md/metadata_version) is 0.90 or 1.0, and
> the array is RAID1, then grub should install itself in the *array*, not in
> the devices.

I don't think that is quite right. For software raid, you can't
actually install to the array per se, since the bios does not know about
it; it only knows about the individual disks. Therefore, grub needs to
be installed to the individual disk(s), and preferably on each member of
a raid 1 so you can still boot with a failed disk. To do this, it needs
the embed area to place the core image into, which doesn't exist if the
array uses the whole disk instead of a partition in it.

In the case of fakeraid, the bios does know about it, so grub can and
does install itself into the array, but since this won't work with true
mdadm soft raid using the raw disks, grub needs to be able to tell the
difference. Only seeing that the members of the array are raw disks instead
of partitions is not enough information.
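The raw-disk-versus-partition check mentioned here can be read from sysfs. This is a minimal sketch that assumes the usual layout, where /sys/block/<md>/slaves lists the array members and a partition's /sys/class/block/<name> directory carries a `partition` attribute. The helper names are hypothetical. As the message says, this information alone does not tell fakeraid apart from software RAID.

```python
import os


def member_kinds(md_name: str, sysfs: str = "/sys") -> dict:
    """Return {member_name: "partition" | "disk"} for /dev/<md_name>."""
    slaves = os.path.join(sysfs, "block", md_name, "slaves")
    kinds = {}
    for name in sorted(os.listdir(slaves)):
        # Partitions have a 'partition' attribute in their sysfs directory;
        # whole disks do not.
        attr = os.path.join(sysfs, "class", "block", name, "partition")
        kinds[name] = "partition" if os.path.exists(attr) else "disk"
    return kinds


def all_members_are_partitions(md_name: str, sysfs: str = "/sys") -> bool:
    return all(k == "partition" for k in member_kinds(md_name, sysfs).values())
```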

Re: Software RAID and Fakeraid

on 03.12.2010 04:15:55 by Phillip Susi

On 12/02/2010 08:36 PM, Neil Brown wrote:
> If the array uses 0.90 or 1.0 metadata and comprises whole-disks (not
> partitions), and if the array is RAID1, then each device (except for the very
> end) contains exactly the same data as the whole array.
> If you install grub to the array, then it will be installed onto all of the
> (active) devices in the array. And that is certainly the easiest way to
> write to all devices.
>
> It won't write to 'spares', so if you want to be able to boot from spares as
> well .... but I'm not sure that makes sense anyway.

Yes, for a raid1 with no spares, installing to the array is equivalent
to installing to each individual disk, but it helps avoid confusion to
ignore this fact and remain thinking in terms of the physical disks, at
least as they appear to the bios.

> Completely agree. As I said, there are only some cases where you can boot
> from an array which uses whole-disks.
> One case is if the bios understands the array, such as Intel bioses with IMSM
> metadata, or possibly some bioses with DDF metadata.
> Another case is RAID1 which starts at the beginning of the device, where the
> bios doesn't need to know about the RAID.

So how do we tell the difference? Right now grub uses the rule of
dmraid = bios aware, so install to the raid device, and mdadm = software
raid, so install to the component devices individually. You have noted
that in some cases both methods will produce the same results, but grub
needs to be certain that whichever method it chooses will work, whether
or not the other one would as well. To do this, it needs to install to the raid
device if and only if it is a bios-recognized fakeraid.
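One possible signal for "bios-recognized fakeraid" is md's own metadata_version attribute, which Neil points to above. The sketch assumes that an externally managed container reports external:<format> (for example external:imsm), and that a member array reports something like external:/md127/0, pointing at its container. Treating IMSM and DDF as BIOS-readable is an assumption, not something md guarantees. All names are hypothetical.

```python
import os


def md_metadata_version(md_name: str, sysfs: str = "/sys") -> str:
    path = os.path.join(sysfs, "block", md_name, "md", "metadata_version")
    with open(path) as f:
        return f.read().strip()


def external_format(md_name: str, sysfs: str = "/sys"):
    """Return the external metadata format (e.g. "imsm"), or None for native md."""
    version = md_metadata_version(md_name, sysfs)
    if not version.startswith("external:"):
        return None  # 0.90 / 1.x: md's own metadata, unknown to the BIOS
    fmt = version[len("external:"):]
    if fmt[:1] in ("/", "-"):
        # Member array inside a container, e.g. "external:/md127/0":
        # the format itself is recorded on the container device.
        container = fmt[1:].split("/", 1)[0]
        return external_format(container, sysfs)
    return fmt


def looks_like_bios_fakeraid(md_name: str, sysfs: str = "/sys") -> bool:
    # Assumption: IMSM (and possibly DDF) arrays are readable by the BIOS or
    # an option ROM, so grub would install to the array itself.
    return external_format(md_name, sysfs) in ("imsm", "ddf")
```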

Re: Software RAID and Fakeraid

on 02.02.2011 04:22:33 by NeilBrown

On Tue, 01 Feb 2011 19:08:09 -0500 Phillip Susi wrote:

> On 02/01/2011 11:26 AM, Lennart Sorensen wrote:
> > If the raid stores the raid info at the end, then the data starts at
> > sector 0. So no space for a bootloader at all.
>
> I know that is how it works with 0.9, but are you sure it is for 1.0?
> If so, then for anything but raid-1 we will just have to try to install
> only to the first device if it has an MBR.
>
> > Certainly makes sense. Now is 4K enough for a boot loader? Not sure.
>
> It is enough for the MBR. The core image will need to go elsewhere,
> hence the proposal to ask mdadm for a suitable location.
>
> > I personally consider soft raid on raw devices so convoluted that I
> > have never done it. I would rather have something I know works with my
> > bootloader and other tools, than gain that extra 1MB (at most) that not
> > having partitions gives. Also given many PCs won't boot from a drive
> > without a partition table, it isn't even an option then.
>
> That is why I like the idea of 1.2 since you could still have a bootable
> MBR when using the whole disk. Though now that you mention it, I can't
> think of a good reason to use the whole disk instead of a partition either.


It seems to me that a case analysis would be useful here.
Assuming that the area of interest is loading the grub core image when
/boot (or '/') is on an md device,

0 If the md device is comprised entirely of partitions, then it is
not involved in loading the core image at all

otherwise:

1 The md device could be addressable directly by the bios. This applies
to a RAID1 which starts at the start of the devices, or any RAID level
which is explicitly understood by BIOS or an option ROM (such as Intel
IMSM)

2 The md device could leave the first block, and some other section of
each device unused. These can be used to store the boot block and
the core image.
This applies to 1.2 metadata stored on whole devices. It could apply to
1.0 (as the start address is configurable) but doesn't in practice. The
main reason to choose 1.0 is to have the array aligned with the start of
the device.

3 The md device does not permit booting. This applies to 1.1 metadata
and various other combinations other than those identified above.
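The case analysis can be written as a small decision function. This is a hypothetical sketch: level strings follow /sys/block/mdX/md/level (e.g. raid1), `bios_aware` is assumed to come from some separate detection such as inspecting external metadata, and 1.2 is mapped to case 2 on the assumption that its unused regions really are kept free.

```python
from enum import Enum


class BootCase(Enum):
    PARTITIONS = 0  # case 0: md is not involved in loading the core image
    BIOS = 1        # case 1: the BIOS can address the array directly
    RESERVED = 2    # case 2: boot block and core image fit in unused space
    NO = 3          # case 3: booting from this array is not possible


def classify(all_partitions: bool, metadata: str, level: str,
             bios_aware: bool) -> BootCase:
    if all_partitions:
        return BootCase.PARTITIONS
    if bios_aware:  # e.g. IMSM understood by an option ROM
        return BootCase.BIOS
    if metadata in ("0.90", "1.0") and level == "raid1":
        return BootCase.BIOS  # RAID1 whose data starts at sector 0
    if metadata == "1.2":
        return BootCase.RESERVED  # assumes the unused areas are kept free
    return BootCase.NO  # e.g. 1.1, or a non-RAID1 level with 0.90/1.0
```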


There is a difficulty in case 2 as it is not clear whose responsibility it is
to write a partition table at the start of each device.
Presumably GRUB doesn't like to write partition tables unless one already
exists.
Currently mdadm doesn't write a partition table either. Possibly it could,
but I would rather avoid that if possible.
Maybe once case 2 has been clearly identified, GRUB could consider that
sufficient permission to write a boot block and partition table even if no
partition table existed??

I imagine that the best way to distinguish between the cases would be to have
mdadm --detail --export /dev/mdXXX

report something appropriate. Maybe a setting for "MD_BOOTABLE"
e.g.

MD_BOOTABLE=partitions # case 0 - the array is comprised entirely of
# partitions
MD_BOOTABLE=BIOS # mdadm believes that the bios can and will read
# the array directly - i.e. case 1
MD_BOOTABLE=reserved # Space at the start of each device is reserved
# for storing boot information. In particular the
# first block (4K) is reserved plus some more.
MD_BOOTABLE=no # mdadm does not believe it is possible to boot
# from this array
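A boot-loader installer could consume the proposed key roughly as follows. `mdadm --detail --export` does print KEY=VALUE lines (MD_LEVEL, MD_METADATA and so on), but MD_BOOTABLE and MD_BOOT_SPACE are only proposals in this message, so this sketch treats a missing key as "unknown". The function names are hypothetical.

```python
import subprocess


def parse_export(text: str) -> dict:
    """Parse KEY=VALUE lines as printed by `mdadm --detail --export`."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop shell-style double quotes
        env[key.strip()] = value
    return env


def md_export(dev: str) -> dict:
    out = subprocess.run(["mdadm", "--detail", "--export", dev],
                         capture_output=True, text=True, check=True).stdout
    return parse_export(out)


def bootable_mode(env: dict):
    # MD_BOOTABLE is only a proposal; without it this returns None.
    mode = env.get("MD_BOOTABLE")
    if mode is not None and mode not in ("partitions", "BIOS", "reserved", "no"):
        raise ValueError(f"unexpected MD_BOOTABLE value: {mode!r}")
    return mode
```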

In the 'reserved' case, mdadm would also report where the space is. e.g.

MD_BOOT_SPACE="/dev/sda 8192 32768"

means that from byte offset 8192 there are 32768 bytes of available space.
I would need to make sure that mdadm kept that space available, so I would
need to know how much to reserve. Maybe 32K. Maybe 1M is safe?
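Parsing the proposed MD_BOOT_SPACE value and checking a core image against it might look like this. The single "device offset length" triple mirrors the example above; whether mdadm would report one triple per member is not settled in the thread. 512-byte sectors are assumed and the names are hypothetical.

```python
from typing import NamedTuple

SECTOR = 512  # assumption: 512-byte sectors


class BootSpace(NamedTuple):
    device: str
    offset: int  # byte offset from the start of the device
    length: int  # bytes available


def parse_boot_space(value: str) -> BootSpace:
    """Parse a proposed MD_BOOT_SPACE value such as "/dev/sda 8192 32768"."""
    device, offset, length = value.split()
    return BootSpace(device, int(offset), int(length))


def core_fits(space: BootSpace, core_bytes: int) -> bool:
    return core_bytes <= space.length


def first_sector(space: BootSpace) -> int:
    # Sector where a core image written into this area would begin.
    if space.offset % SECTOR:
        raise ValueError("boot space is not sector aligned")
    return space.offset // SECTOR
```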

However there is another complication.
I understand that the boot block sometimes lives at the start of the
partition instead of (or as well as) the start of the device.
I'm fairly sure syslinux does this - I don't know about GRUB.
So I really want to still report BIOS or 'reserved' or 'no' even when
partitions are in use.

So maybe I should scrap case 0 (MD_BOOTABLE=partitions), assume that the
boot-loader configurer can detect and understand partitions itself, and just
report the other 3 cases ignoring the details about partitions.

Would that be helpful? Would it get used? How could it be better?

Thanks,
NeilBrown


Re: Software RAID and Fakeraid

on 02.02.2011 16:34:09 by Phillip Susi

On 2/1/2011 10:22 PM, NeilBrown wrote:
> There is a difficulty in case 2 as it is not clear whose responsibility it is
> to write a partition table at the start of each device.
> Presumably GRUB doesn't like to write partition tables unless one already
> exists.

Yes, it preserves the existing partition table and just modifies the
boot loader code in the MBR.

> Currently mdadm doesn't write a partition table either. Possibly it could,
> but I would rather avoid that if possible.
> Maybe once case 2 has been clearly identified, GRUB could consider that
> sufficient permission to write a boot block and partition table even if no
> partition table existed??

Possibly, but that is the kind of thing I think should require an
explicit request. Whether it is done by mdadm or not, I think that
someone should write a protective mbr. Since mdadm is what is
effectively formatting the disk, it makes the most sense to me to do it
there, rather than in grub, which is just trying to install a boot
loader to an existing disk. I suppose the OS installer could add the
MBR before asking mdadm to add its superblocks too. What partition type
would be appropriate to use?

> In the 'reserved' case, mdadm would also report where the space is. e.g.
>
> MD_BOOT_SPACE="/dev/sda 8192 32768"
>
> means that from byte offset 8192 there are 32768 bytes of available space.
> I would need to make sure that mdadm kept that space available, so I would
> need to know how much to reserve. Maybe 32K. Maybe 1M is safe?

Sounds good. Grub is used to operating with 32k or less since that is
the historical amount of free space following the MBR.
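The "32k or less" figure comes from the traditional DOS partition layout, where the first partition starts at LBA 63 and only sectors 1 to 62 are free after the MBR. The arithmetic below (assuming 512-byte sectors) compares that with the 1 MiB alignment (LBA 2048) that newer partitioning tools commonly use, which bears on the question of whether 32K or 1M should be reserved. The function name is hypothetical.

```python
SECTOR = 512  # bytes per sector on a traditional MBR disk


def embed_gap_bytes(first_partition_lba: int) -> int:
    # Sector 0 holds the MBR; sectors 1 .. first_partition_lba - 1 form the
    # unpartitioned gap that grub embeds its core image into.
    return (first_partition_lba - 1) * SECTOR


print(embed_gap_bytes(63))    # 31744 (DOS-style layout, 62 free sectors)
print(embed_gap_bytes(2048))  # 1048064 (1 MiB-aligned layout)
```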

> However there is another complication.
> I understand that the boot block sometimes lives at the start of the
> partition instead of (or as well as) the start of the device.
> I'm fairly sure syslinux does this - I don't know about GRUB.
> So I really want to still report BIOS or 'reserved' or 'no' even when
> partitions are in use.

Grub can be installed to the partition boot block, but it is strongly
discouraged since there is no gap to embed the core into, so the boot
block must use block lists to locate it. This comes with all kinds of
headaches, including not being possible at all on some filesystems, and
frequently breaking on others. Either way you still have to have boot
code in the MBR to go load the partition boot block, so I don't think
that changes anything with respect to this discussion.

> So maybe I should scrap case 0 (MD_BOOTABLE=partitions), assume that the
> boot-loader configurer can detect and understand partitions itself, and just
> report the other 3 cases ignoring the details about partitions.
>
> Would that be helpful? Would it get used? How could it be better?

Yes, it sounds quite helpful.

Re: Software RAID and Fakeraid

on 02.02.2011 17:09:13 by hansBKK

On Tue, Feb 1, 2011 at 11:26 PM, Lennart Sorensen wrote:
> I personally consider soft raid on raw devices so convoluted that I
> have never done it. I would rather have something I know works with my
> bootloader and other tools, than gain that extra 1MB (at most) that not
> having partitions gives. Also given many PCs won't boot from a drive
> without a partition table, it isn't even an option then.

For others googling this later, another very good reason for *never*
RAID'ing raw block devices (ie always creating at least one partition
first) is that if you ever mistakenly boot into some flavors of
Windows (even from some optical discs, perhaps unknowingly left in a
drive), your disks will automatically get "helpfully" initialized, as
windoze thinks it's a brand new empty drive being offered up like a
virgin for sacrifice - **poof** there goes all your data.

Speaking from experience 8-(

RE: Software RAID and Fakeraid

on 02.02.2011 22:12:17 by Leslie Rhorer

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of hansbkk@gmail.com
> Sent: Wednesday, February 02, 2011 10:09 AM
> To: linux-raid@vger.kernel.org
> Cc: grub-devel@gnu.org
> Subject: Re: Software RAID and Fakeraid
>
> On Tue, Feb 1, 2011 at 11:26 PM, Lennart Sorensen
> wrote:
> > I personally consider soft raid on raw devices so convoluted that I
> > have never done it.

Convoluted? It simply removes an unnecessary layer of management.
If anything, it is less convoluted, and one needn't worry about partition
types and their associated limitations. I currently have 24 un-partitioned
drives in my two RAID systems.

> > I would rather have something I know works with my
> > bootloader and other tools, than gain that extra 1MB (at most) that not

Modern bootloaders don't require partitions, either, IIRC, although
in fact I do partition my boot drives.

> > having partitions gives. Also given many PCs won't boot from a drive
> > without a partition table, it isn't even an option then.

Do you have an example? I'm not aware of any. In any case, as I
said, I do recommend partitioning the boot drive, at a minimum into a root
and a swap partition. I also like to keep the /boot target separate, but
then /boot is tiny.

> For others googling this later, another very good reason for *never*
> RAID'ing raw block devices (ie always creating at least one partition
> first) is that if you ever mistakenly boot into some flavors of
> Windows (even from some optical discs, perhaps unknowingly left ina
> drive), your disks will automatically get "helpfully" initialized, as
> windoze thinks it's a brand new empty drive being offered up like a
> virgin for sacrifice - **poof** there goes all your data.
>=20
> Speaking from experience 8-(

This is only true with very old versions of Windows. What's more,
the data certainly can be recovered. None of my data drives are
partitioned. There's really no point.
