Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On 02.05.2011 02:22:24 by NeilBrown

On Mon, 02 May 2011 01:00:57 +0100 Ben Hutchings wrote:

> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> > On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings wrote:
> > > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > > consisting of:
> > > >
> > > > ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > > >
> > > > I use the raid1 as a backup. The raid1 operates normally in degraded
> > > > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > > then fail/remove the external hdd.
> > >
> > > Well, this is not expected to work. Possibly the hot-addition of a disk
> > > with different bio restrictions should be rejected. But I'm not sure,
> > > because it is safe to do that if there is no mounted filesystem or
> > > stacking device on top of the RAID.
> >
> > Hi, Ben. Can you explain why this is not expected to work? Which part
> > exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.

Normally this practice is perfectly OK.
If a filesystem is mounted directly from an md array, then adding devices
to the array at any time is fine, even if the new devices have quite
different characteristics from the old.

However, if there is another layer in between md and the filesystem - such as
dm - then there can be a problem.
There is no mechanism in the kernel for md to tell dm that things have
changed, so dm never changes its configuration to match any change in the
config of the md device.
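(What dm actually does is fold the limits of the devices below it into
its own queue limits when a table is loaded.  Roughly - this paraphrases
blk_stack_limits() in block/blk-settings.c - and note it runs once, at
table-load time, never again:)

	/*
	 * Paraphrased from blk_stack_limits() in block/blk-settings.c.
	 * 't' holds the stacked (dm) device's limits, 'b' an underlying
	 * device's.  Computed once when the dm table is loaded; if md's
	 * limits later shrink (say a USB member is hot-added), these
	 * values are never recomputed.
	 */
	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
	t->max_hw_sectors = min_not_zero(t->max_hw_sectors,
					 b->max_hw_sectors);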

A filesystem always queries the config of the device as it prepares the
request.  As this is not an 'active' query (i.e. it just looks at
variables, it doesn't call a function), there is no opportunity for dm to
then query md.
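(This is where the message in the subject line comes from.  The check,
roughly as it appears in __generic_make_request() in block/blk-core.c,
simply compares the bio against the queue's stored limit:)

	char b[BDEVNAME_SIZE];

	if (unlikely(bio_sectors(bio) > queue_max_hw_sectors(q))) {
		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
		       bdevname(bio->bi_bdev, b),
		       bio_sectors(bio),
		       queue_max_hw_sectors(q));
		goto end_io;	/* the bio fails with -EIO */
	}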

There is a ->merge_bvec_fn which could be pressed into service, i.e. if
md/raid1 defined some trivial merge_bvec_fn, then it would probably work.
However, the actual effect of this would probably be to cause every bio
created by the filesystem to be just one PAGE in size, as that size is
guaranteed always to work.  So it could be a significant performance hit
for the common case.
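(A hypothetical sketch of such a trivial function - raid1 defines no
merge_bvec_fn today, and the name here is made up:)

	static int raid1_trivial_merge_bvec(struct request_queue *q,
					    struct bvec_merge_data *bvm,
					    struct bio_vec *biovec)
	{
		/*
		 * Accept the first segment while the bio is still
		 * empty, refuse to merge anything more.  Every bio is
		 * then at most one page, which any member can handle.
		 */
		if (bvm->bi_size == 0)
			return biovec->bv_len;
		return 0;
	}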

We really need either:
- The fs sends down arbitrarily large requests, and the lower layers split
them up if/when needed
or
- A mechanism for a block device to tell the layer above that something has
changed.

But these are both fairly intrusive, with unclear performance/complexity
implications, and no one has bothered.
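(The second option would amount to a callback that a stacked device
registers with the device below it - purely hypothetical, nothing like
this exists today:)

	/*
	 * Purely hypothetical - no such interface exists.  dm would
	 * register this with md; md would call it whenever its queue
	 * limits change (e.g. after a hot-add), and dm would then
	 * recompute its own limits in response.
	 */
	struct limits_listener {
		void (*limits_changed)(struct block_device *lower,
				       void *private);
		void *private;
	};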

NeilBrown


RE: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On 02.05.2011 04:47:55 by Guy Watkins

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of NeilBrown
} Sent: Sunday, May 01, 2011 8:22 PM
} To: Ben Hutchings
} Cc: Jameson Graef Rollins; 624343@bugs.debian.org; linux-
} raid@vger.kernel.org
} Subject: Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio
} too big device md0 (248 > 240)" in kern.log
}
} [...]
}
} We really need either:
} - The fs sends down arbitrarily large requests, and the lower layers
}   split them up if/when needed
} or
} - A mechanism for a block device to tell the layer above that something
}   has changed.
}
} NeilBrown

Maybe mdadm should not allow a disk to be added if its characteristics are
different enough to be an issue? And require the --force option if the
admin really wants to do it anyhow.

Oh, and a good error message explaining the issues and risks. :)
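Something like this untested sketch, say - comparing the request-size
limit the kernel exposes in sysfs for an existing member and for the
new disk (whether max_sectors_kb is the right limit to compare is a
question for the md folks):

	#include <stdio.h>

	/* Read e.g. /sys/block/sdb/queue/max_sectors_kb. */
	static long max_sectors_kb(const char *dev)
	{
		char path[256];
		long kb = -1;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/block/%s/queue/max_sectors_kb", dev);
		f = fopen(path, "r");
		if (f) {
			if (fscanf(f, "%ld", &kb) != 1)
				kb = -1;
			fclose(f);
		}
		return kb;
	}

	int main(int argc, char **argv)
	{
		long cur, add;

		if (argc != 3) {
			fprintf(stderr, "usage: %s <member> <new-disk>\n",
				argv[0]);
			return 2;
		}
		cur = max_sectors_kb(argv[1]);
		add = max_sectors_kb(argv[2]);
		if (cur > 0 && add > 0 && add < cur) {
			fprintf(stderr, "%s accepts smaller requests than "
				"%s (%ld kB < %ld kB); layers stacked on "
				"the array will not notice.  Use --force "
				"to add it anyway.\n",
				argv[2], argv[1], add, cur);
			return 1;
		}
		return 0;
	}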

Guy


Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On 02.05.2011 07:07:20 by Daniel Kahn Gillmor


On 05/01/2011 08:22 PM, NeilBrown wrote:
> However if there is another layer in between md and the filesystem - such
> as dm - then there can be a problem.
> There is no mechanism in the kernel for md to tell dm that things have
> changed, so dm never changes its configuration to match any change in the
> config of the md device.
>
> A filesystem always queries the config of the device as it prepares the
> request. As this is not an 'active' query (i.e. it just looks at
> variables, it doesn't call a function) there is no opportunity for dm to
> then query md.

Thanks for this followup, Neil.

Just to clarify, it sounds like any one of the following situations on
its own is *not* problematic from the kernel's perspective:

0) having a RAID array that is more often in a de-synced state than in
an online state.

1) mixing various types of disk in a single RAID array (e.g. SSD and
spinning metal)

2) mixing various disk access channels within a single RAID array (e.g.
USB and SATA)

3) putting other block device layers (e.g. loopback, dm-crypt, dm (via
lvm or otherwise)) above md and below a filesystem

4) hot-adding a device to an active RAID array from which filesystems
are mounted.


However, having any layers between md and the filesystem becomes
problematic if the array is re-synced while the filesystem is online,
because the intermediate layer can't communicate $SOMETHING (what
specifically?) from md to the kernel's filesystem code.

As a workaround, would the following sequence of actions (perhaps
impossible given some machines' operational states) allow a RAID
re-sync without the errors jrollins reports, and without a reboot?

a) unmount all filesystems which ultimately derive from the RAID array
b) hot-add the device with mdadm
c) re-mount the filesystems

or would something else need to be done with lvm (or cryptsetup, or the
loopback device) between steps b and c?


Coming at it from another angle: is there a way that an admin can ensure
that the RAID array can be re-synced without unmounting the filesystems
other than limiting themselves to exactly the same models of hardware
for all components in the storage chain?

Alternatively, is there a way to manually inform a given mounted
filesystem that it should change $SOMETHING (what?), so that an aware
admin could keep filesystems online by issuing this instruction before a
raid re-sync?


From a modular-kernel perspective: Is this specifically a problem with
md itself, or would it also be the case with other block-device layering
in the kernel? For example, suppose an admin has (without md) lvm over
a bare disk, and a filesystem mounted from an LV. The admin then adds a
second bare disk as a PV to the VG, and uses pvmove to transfer the
physical extents of the active filesystem to the new disk, while
mounted. Assuming that the new disk doesn't have the same
characteristics (which characteristics?), does the fact that LVM sits
between the underlying disk and the filesystem cause the same problem?
What if dm-crypt sits between the disk and lvm? Between lvm and the
filesystem?

What if the layering is disk-dm-md-fs instead of disk-md-dm-fs?




Sorry for all the questions without having much concrete to contribute
at the moment. If these limitations are actually well-documented
somewhere, I would be grateful for a pointer. As a systems
administrator, I would be unhappy to be caught out by some
as-yet-unknown constraints during a hardware failure. I'd like to at
least know my constraints beforehand.


Regards,

--dkg



Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)" in kern.log

On 02.05.2011 11:08:11 by David Brown

On 02/05/2011 02:22, NeilBrown wrote:
> [...]
>
> However if there is another layer in between md and the filesystem - such as
> dm - then there can be a problem.
> There is no mechanism in the kernel for md to tell dm that things have
> changed, so dm never changes its configuration to match any change in the
> config of the md device.
>

While I can see that there might be limitations in informing the dm
layer about changes to the md layer, I fail to see what changes we are
talking about.  If the OP were changing the size of the raid1, for
example, then that would be a metadata change that would need to
propagate up so that lvm could grow its physical volume.  But the dm
layer should not care if a disk is added to or removed from the md raid1
set - as long as the /dev/mdX device stays online and valid, it should
work correctly.

