Re: Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)"

on 02.05.2011 02:00:57 by Ben Hutchings


On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
> On Fri, 29 Apr 2011 05:39:40 +0100, Ben Hutchings wrote:
> > On Wed, 2011-04-27 at 09:19 -0700, Jameson Graef Rollins wrote:
> > > I run what I imagine is a fairly unusual disk setup on my laptop,
> > > consisting of:
> > >
> > > ssd -> raid1 -> dm-crypt -> lvm -> ext4
> > >
> > > I use the raid1 as a backup. The raid1 operates normally in degraded
> > > mode. For backups I then hot-add a usb hdd, let the raid1 sync, and
> > > then fail/remove the external hdd.
> >
> > Well, this is not expected to work. Possibly the hot-addition of a disk
> > with different bio restrictions should be rejected. But I'm not sure,
> > because it is safe to do that if there is no mounted filesystem or
> > stacking device on top of the RAID.
>
> Hi, Ben. Can you explain why this is not expected to work? Which part
> exactly is not expected to work and why?

Adding another type of disk controller (USB storage versus whatever the
SSD interface is) to a RAID that is already in use.

> > I would recommend using filesystem-level backup (e.g. dirvish or
> > backuppc). Aside from this bug, if the SSD fails during a RAID resync
> > you will be left with an inconsistent and therefore useless 'backup'.
>
> I appreciate your recommendation, but it doesn't really have anything to
> do with this bug report. Unless I am doing something that is
> *expressly* not supposed to work, then it should work, and if it doesn't
> then it's either a bug or a documentation failure (ie. if this setup is
> not supposed to work then it should be clearly documented somewhere what
> exactly the problem is).

The normal state of a RAID set is that all disks are online. You have
deliberately turned this on its head; the normal state of your RAID set
is that one disk is missing. This is such a basic principle that most
documentation won't mention it.

> > The block layer correctly returns an error after logging this message.
> > If it's due to a read operation, the error should be propagated up to
> > the application that tried to read. If it's due to a write operation, I
> > would expect the error to result in the RAID becoming desynchronised.
> > In some cases it might be propagated to the application that tried to
> > write.
>
> Can you say what is "correct" about the returned error? That's what I'm
> still not understanding. Why is there an error and what is it coming
> from?

The error is that you changed the I/O capabilities of the RAID while it
was already in use. But what I was describing as 'correct' was that an
error code was returned, rather than the error condition only being
logged. If the error condition is not properly propagated then it could
lead to data loss.
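
As a rough illustration of the failure mode described above, here is a toy model in Python. It is a sketch under stated assumptions, not kernel code: the class and function names are invented, the limits of 248 and 240 sectors are taken only from the numbers in the logged message, and which physical device actually carried which limit is assumed. It shows an upper layer that recorded the array's request-size limit when it was stacked, a hot-add that lowers the array's limit afterwards, and a submit check that both logs the condition and returns an error code.

```python
import errno


class BlockDevice:
    """Toy block device with a per-request size limit, in sectors."""

    def __init__(self, name, max_sectors):
        self.name = name
        self.max_sectors = max_sectors


class Raid1(BlockDevice):
    """Toy RAID1: its limit is the smallest limit among its members."""

    def __init__(self, name, members):
        super().__init__(name, min(m.max_sectors for m in members))
        self.members = list(members)

    def hot_add(self, member):
        # Adding a more restrictive member lowers the array's limit.
        self.members.append(member)
        self.max_sectors = min(self.max_sectors, member.max_sectors)


class StackedDevice:
    """Hypothetical upper layer (e.g. an encryption or volume layer).

    Assumption being modelled: it records the lower device's limit once,
    at setup time, and never refreshes it.
    """

    def __init__(self, lower):
        self.lower = lower
        self.max_sectors = lower.max_sectors


def submit(dev, nr_sectors, log):
    """Reject an oversized request: log it AND return an error code."""
    if nr_sectors > dev.max_sectors:
        log.append(
            f"bio too big device {dev.name} ({nr_sectors} > {dev.max_sectors})"
        )
        return -errno.EIO
    return 0


log = []
md0 = Raid1("md0", [BlockDevice("ssd", 248)])      # degraded array, one member
upper = StackedDevice(md0)                         # records limit 248
before = submit(md0, upper.max_sectors, log)       # accepted
md0.hot_add(BlockDevice("usb", 240))               # array limit drops to 240
after = submit(md0, upper.max_sectors, log)        # rejected, logged, error returned
```

In this model the request that was fine before the hot-add is refused afterwards, because the upper layer still sizes its requests by the old limit. Returning the error from `submit` rather than only appending to `log` is what lets a caller notice the failure.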

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.


Bug#624343: linux-image-2.6.38-2-amd64: frequent message "bio too big device md0 (248 > 240)"

on 02.05.2011 02:42:52 by Daniel Kahn Gillmor


On 05/01/2011 08:00 PM, Ben Hutchings wrote:
> On Sun, 2011-05-01 at 15:06 -0700, Jameson Graef Rollins wrote:
>> Hi, Ben. Can you explain why this is not expected to work? Which part
>> exactly is not expected to work and why?
>
> Adding another type of disk controller (USB storage versus whatever the
> SSD interface is) to a RAID that is already in use.
>
[...]
> The normal state of a RAID set is that all disks are online. You have
> deliberately turned this on its head; the normal state of your RAID set
> is that one disk is missing. This is such a basic principle that most
> documentation won't mention it.

This is somewhat worrisome to me. Consider a fileserver with
non-hotswap disks. One disk fails in the morning, but the machine is in
production use, and the admin's goals are:

* minimize downtime,
* reboot only during off-hours, and
* minimize the amount of time that the array spends de-synced.

A responsible admin might reasonably expect to attach a disk via a
well-tested USB or ieee1394 adapter, bring the array back into sync, and
announce to the rest of the organization that there will be a scheduled
reboot later in the evening.

Then, at the scheduled reboot, move the disk from the USB/ieee1394
adapter to the direct ATA interface on the machine.

If this sequence of operations is likely (or even possible) to cause
data loss, it should be spelled out in BIG RED LETTERS someplace. I
don't think any of the above steps seem unreasonable, and the set of
goals the admin is attempting to meet are certainly commonplace goals.

> The error is that you changed the I/O capabilities of the RAID while it
> was already in use. But what I was describing as 'correct' was that an
> error code was returned, rather than the error condition only being
> logged. If the error condition is not properly propagated then it could
> lead to data loss.

How is an admin to know which I/O capabilities to check before adding a
device to a RAID array? When is it acceptable to mix I/O capabilities?
Can a RAID array which is not currently being used as a backing store
for a filesystem be assembled of unlike disks? What if it is then
(later) used as a backing store for a filesystem?
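
One partial way to approach the first question is to compare the request-size limit that Linux exposes in sysfs for the array and for the candidate disk before adding it. The Python sketch below assumes the `queue/max_sectors_kb` attribute under `/sys/block/<device>/` is the limit of interest; the thread does not say which limits matter, so this may well be incomplete. The function names and the device names in the usage note are hypothetical.

```python
from pathlib import Path


def read_max_sectors_kb(dev, sysfs_root="/sys/block"):
    """Return the queue's max_sectors_kb value for a block device, in KiB."""
    path = Path(sysfs_root) / dev / "queue" / "max_sectors_kb"
    return int(path.read_text().strip())


def compare_limits(array, candidate, sysfs_root="/sys/block"):
    """Compare the request-size limit of an array and a candidate member.

    Returns a dict with both limits and a flag that is True when the
    candidate accepts smaller requests than the array currently does.
    """
    array_kb = read_max_sectors_kb(array, sysfs_root)
    candidate_kb = read_max_sectors_kb(candidate, sysfs_root)
    return {
        "array_kb": array_kb,
        "candidate_kb": candidate_kb,
        "candidate_is_smaller": candidate_kb < array_kb,
    }
```

On a live system this could be called as `compare_limits("md0", "sdb")`, where `sdb` stands in for the newly attached disk. A `candidate_is_smaller` result of `True` would suggest that adding the disk lowers the array's limit, which is the situation discussed in this thread.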

One of the advantages people tout for in-kernel software raid (over many
H/W RAID implementations) is the ability to mix disks, so that you're
not reliant on a single vendor during a failure. If this advantage
doesn't extend across certain classes of disk, it would be good to be
unambiguous about what can be mixed and what cannot.

Regards,

--dkg
