[PATCH] FIX: imsm: OROM does not recognize degraded arrays
[PATCH] FIX: imsm: OROM does not recognize degraded arrays
am 24.02.2011 07:53:57 von krzysztof.wojcik
Defect description:
When we create an redundant array in mdadm and then degrade it
by disk removing, Option ROM and Windows OS does not detect any array.
Reason:
Metadata created and updated after degrading array is not compatible
with IMSM standard.
This patch synchronizes the metadata according IMSM requirements.
Following inconsistencies have been fixed:
- reset all fields in imsm_dev during creation to avoid random values
- init dev status during creation to proper state
- not reset CONFIGURED_DISK flag when disk is missing
- add ":0" suffix to the serial number for missing/failed disks
- update medatada signature after takeover operation
- mark map state as degraded after raid0->raid10 takeover
Signed-off-by: Krzysztof Wojcik
---
super-intel.c | 27 +++++++++++++++++----------
1 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/super-intel.c b/super-intel.c
index c101dca..a40b7e5 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3437,6 +3437,8 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
fprintf(stderr, Name": could not allocate raid device\n");
return 0;
}
+ memset(dev, 0, (sizeof(*dev) + sizeof(__u32) * (info->raid_disks - 1)));
+
strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LEN);
if (info->level == 1)
array_blocks = info_to_blocks_per_member(info);
@@ -3449,8 +3451,7 @@ static int init_super_imsm_volume(struct supertype *st, mdu_array_info_t *info,
dev->size_low = __cpu_to_le32((__u32) array_blocks);
dev->size_high = __cpu_to_le32((__u32) (array_blocks >> 32));
- dev->status = __cpu_to_le32(0);
- dev->reserved_blocks = __cpu_to_le32(0);
+ dev->status = (DEV_READ_COALESCING | DEV_WRITE_COALESCING);
vol = &dev->vol;
vol->migr_state = 0;
set_migr_type(dev, MIGR_INIT);
@@ -5046,6 +5047,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
__u32 ord;
int slot;
struct imsm_map *map;
+ char buf[PATH_MAX];
/* new failures are always set in map[0] */
map = get_imsm_map(dev, 0);
@@ -5057,9 +5059,12 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
ord = __le32_to_cpu(map->disk_ord_tbl[slot]);
if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
return 0;
+ sprintf(buf, "%s:0", disk->serial);
+ int len = strlen(buf);
+ unsigned int shift = len - MAX_RAID_SERIAL_LEN + 1;
+ strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
disk->status |= FAILED_DISK;
- disk->status &= ~CONFIGURED_DISK;
set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD);
if (map->failed_disk_num == 0xff)
map->failed_disk_num = slot;
@@ -6063,16 +6068,16 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
*space_list = *space;
du = (void *)space;
memcpy(du, super->disks, sizeof(*du));
- du->disk.status = FAILED_DISK;
- du->disk.scsi_id = 0;
+ du->disk.status = CONFIGURED_DISK | FAILED_DISK;
+ du->disk.scsi_id = ~0;
du->fd = -1;
du->minor = 0;
du->major = 0;
du->index = (i * 2) + 1;
sprintf((char *)du->disk.serial,
- " MISSING_%d", du->index);
+ " MISSING_%d:0", du->index);
sprintf((char *)du->serial,
- "MISSING_%d", du->index);
+ "MISSING_%d:0", du->index);
du->next = super->missing;
super->missing = du;
}
@@ -6085,9 +6090,9 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
memcpy(dev_new, dev, sizeof(*dev));
/* update new map */
map = get_imsm_map(dev_new, 0);
- map->failed_disk_num = map->num_members;
+ map->failed_disk_num = 1;
map->num_members = map->num_members * 2;
- map->map_state = IMSM_T_STATE_NORMAL;
+ map->map_state = IMSM_T_STATE_DEGRADED;
map->num_domains = 2;
map->raid_level = 1;
/* replace dev<->dev_new */
@@ -6149,8 +6154,10 @@ static void imsm_process_update(struct supertype *st,
switch (type) {
case update_takeover: {
struct imsm_update_takeover *u = (void *)update->buf;
- if (apply_takeover_update(u, super, &update->space_list))
+ if (apply_takeover_update(u, super, &update->space_list)) {
+ imsm_update_version_info(super);
super->updates_pending++;
+ }
break;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] FIX: imsm: OROM does not recognize degraded arrays
am 24.02.2011 18:42:53 von dan.j.williams
Thanks Krzysztof, comments below:
On Wed, Feb 23, 2011 at 10:53 PM, Krzysztof Wojcik
wrote:
> Defect description:
> When we create an redundant array in mdadm and then degrade it
> by disk removing, Option ROM and Windows OS does not detect any array=
> Reason:
> Metadata created and updated after degrading array is not compatible
> with IMSM standard.
>
> This patch synchronizes the metadata according IMSM requirements.
> Following inconsistencies have been fixed:
> - reset all fields in imsm_dev during creation to avoid random values
> - init dev status during creation to proper state
> - not reset CONFIGURED_DISK flag when disk is missing
> - add ":0" suffix to the serial number for missing/failed disks
> - update medatada signature after takeover operation
> - mark map state as degraded after raid0->raid10 takeover
>
> Signed-off-by: Krzysztof Wojcik
> ---
> =A0super-intel.c | =A0 27 +++++++++++++++++----------
> =A01 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index c101dca..a40b7e5 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -3437,6 +3437,8 @@ static int init_super_imsm_volume(struct supert=
ype *st, mdu_array_info_t *info,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0fprintf(stderr, Name": could not alloc=
ate raid device\n");
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return 0;
> =A0 =A0 =A0 =A0}
> + =A0 =A0 =A0 memset(dev, 0, (sizeof(*dev) + sizeof(__u32) * (info->r=
aid_disks - 1)));
> +
Consider changing the allocation to calloc, or store the size in a
temporary variable so we don't duplicate the size calculation.
> =A0 =A0 =A0 =A0strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_LE=
N);
> =A0 =A0 =A0 =A0if (info->level == 1)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0array_blocks =3D info_to_blocks_per_me=
mber(info);
> @@ -3449,8 +3451,7 @@ static int init_super_imsm_volume(struct supert=
ype *st, mdu_array_info_t *info,
>
> =A0 =A0 =A0 =A0dev->size_low =3D __cpu_to_le32((__u32) array_blocks);
> =A0 =A0 =A0 =A0dev->size_high =3D __cpu_to_le32((__u32) (array_blocks=
>> 32));
> - =A0 =A0 =A0 dev->status =3D __cpu_to_le32(0);
> - =A0 =A0 =A0 dev->reserved_blocks =3D __cpu_to_le32(0);
> + =A0 =A0 =A0 dev->status =3D (DEV_READ_COALESCING | DEV_WRITE_COALES=
CING);
The orom will fail to assemble arrays without these status bits set?
> =A0 =A0 =A0 =A0vol =3D &dev->vol;
> =A0 =A0 =A0 =A0vol->migr_state =3D 0;
> =A0 =A0 =A0 =A0set_migr_type(dev, MIGR_INIT);
> @@ -5046,6 +5047,7 @@ static int mark_failure(struct imsm_dev *dev, s=
truct imsm_disk *disk, int idx)
> =A0 =A0 =A0 =A0__u32 ord;
> =A0 =A0 =A0 =A0int slot;
> =A0 =A0 =A0 =A0struct imsm_map *map;
> + =A0 =A0 =A0 char buf[PATH_MAX];
This is a very large stack allocation which is potentially problematic
for mdmon.
>
> =A0 =A0 =A0 =A0/* new failures are always set in map[0] */
> =A0 =A0 =A0 =A0map =3D get_imsm_map(dev, 0);
> @@ -5057,9 +5059,12 @@ static int mark_failure(struct imsm_dev *dev, =
struct imsm_disk *disk, int idx)
> =A0 =A0 =A0 =A0ord =3D __le32_to_cpu(map->disk_ord_tbl[slot]);
> =A0 =A0 =A0 =A0if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return 0;
> + =A0 =A0 =A0 sprintf(buf, "%s:0", disk->serial);
> + =A0 =A0 =A0 int len =3D strlen(buf);
> + =A0 =A0 =A0 unsigned int shift =3D len - MAX_RAID_SERIAL_LEN + 1;
> + =A0 =A0 =A0 strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SER=
IAL_LEN);
Other option-roms truncated the serial number just to make sure it did
not match (see mark_missing()), are you sure the ":0" matters? And if
we are already tolerant of chopping characters why not just
append/overwrite the end of the serial number.
> =A0 =A0 =A0 =A0disk->status |=3D FAILED_DISK;
> - =A0 =A0 =A0 disk->status &=3D ~CONFIGURED_DISK;
The Windows driver was observed making this modification, but maybe
only in the "missing" case. Have you verified that this works in the
failed but still present case (versus the device gone case).
> =A0 =A0 =A0 =A0set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD=
);
> =A0 =A0 =A0 =A0if (map->failed_disk_num == 0xff)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0map->failed_disk_num =3D slot;
> @@ -6063,16 +6068,16 @@ static int apply_takeover_update(struct imsm_=
update_takeover *u,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*space_list =3D *space=
;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0du =3D (void *)space;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0memcpy(du, super->disk=
s, sizeof(*du));
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.status =3D FAI=
LED_DISK;
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.scsi_id =3D 0;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.status =3D CON=
=46IGURED_DISK | FAILED_DISK;
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.scsi_id =3D ~0=
;
Looks like the code has grown some open coded duplications of what
mark_failure() is doing. Let's unify this. Keep subtle compatibility
logic confined to one location.
--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] FIX: imsm: OROM does not recognize degraded arrays
am 25.02.2011 12:08:00 von krzysztof.wojcik
> -----Original Message-----
> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
> Behalf Of Dan Williams
> Sent: Thursday, February 24, 2011 6:43 PM
> To: Wojcik, Krzysztof
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Neubauer, Wojciech;
> Kwolek, Adam; Ciechanowski, Ed
> Subject: Re: [PATCH] FIX: imsm: OROM does not recognize degraded arra=
ys
>=20
> Thanks Krzysztof, comments below:
>=20
> On Wed, Feb 23, 2011 at 10:53 PM, Krzysztof Wojcik
> wrote:
> > Defect description:
> > When we create an redundant array in mdadm and then degrade it
> > by disk removing, Option ROM and Windows OS does not detect any
> array.
> > Reason:
> > Metadata created and updated after degrading array is not compatibl=
e
> > with IMSM standard.
> >
> > This patch synchronizes the metadata according IMSM requirements.
> > Following inconsistencies have been fixed:
> > - reset all fields in imsm_dev during creation to avoid random valu=
es
> > - init dev status during creation to proper state
> > - not reset CONFIGURED_DISK flag when disk is missing
> > - add ":0" suffix to the serial number for missing/failed disks
> > - update medatada signature after takeover operation
> > - mark map state as degraded after raid0->raid10 takeover
> >
> > Signed-off-by: Krzysztof Wojcik
> > ---
> > =A0super-intel.c | =A0 27 +++++++++++++++++----------
> > =A01 files changed, 17 insertions(+), 10 deletions(-)
> >
> > diff --git a/super-intel.c b/super-intel.c
> > index c101dca..a40b7e5 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -3437,6 +3437,8 @@ static int init_super_imsm_volume(struct
> supertype *st, mdu_array_info_t *info,
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0fprintf(stderr, Name": could not all=
ocate raid
> device\n");
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return 0;
> > =A0 =A0 =A0 =A0}
> > + =A0 =A0 =A0 memset(dev, 0, (sizeof(*dev) + sizeof(__u32) * (info-
> >raid_disks - 1)));
> > +
>=20
> Consider changing the allocation to calloc, or store the size in a
> temporary variable so we don't duplicate the size calculation.
OK. Changed
>=20
> > =A0 =A0 =A0 =A0strncpy((char *) dev->volume, name, MAX_RAID_SERIAL_=
LEN);
> > =A0 =A0 =A0 =A0if (info->level == 1)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0array_blocks =3D info_to_blocks_per_=
member(info);
> > @@ -3449,8 +3451,7 @@ static int init_super_imsm_volume(struct
> supertype *st, mdu_array_info_t *info,
> >
> > =A0 =A0 =A0 =A0dev->size_low =3D __cpu_to_le32((__u32) array_blocks=
);
> > =A0 =A0 =A0 =A0dev->size_high =3D __cpu_to_le32((__u32) (array_bloc=
ks >> 32));
> > - =A0 =A0 =A0 dev->status =3D __cpu_to_le32(0);
> > - =A0 =A0 =A0 dev->reserved_blocks =3D __cpu_to_le32(0);
> > + =A0 =A0 =A0 dev->status =3D (DEV_READ_COALESCING | DEV_WRITE_COAL=
ESCING);
>=20
> The orom will fail to assemble arrays without these status bits set?
These bits are not the direct cause that OROM not assemble the array bu=
t it is not compatible with metadata created under Windows and OROM and=
may cause problems in other scenarios.
>=20
> > =A0 =A0 =A0 =A0vol =3D &dev->vol;
> > =A0 =A0 =A0 =A0vol->migr_state =3D 0;
> > =A0 =A0 =A0 =A0set_migr_type(dev, MIGR_INIT);
> > @@ -5046,6 +5047,7 @@ static int mark_failure(struct imsm_dev *dev,
> struct imsm_disk *disk, int idx)
> > =A0 =A0 =A0 =A0__u32 ord;
> > =A0 =A0 =A0 =A0int slot;
> > =A0 =A0 =A0 =A0struct imsm_map *map;
> > + =A0 =A0 =A0 char buf[PATH_MAX];
>=20
> This is a very large stack allocation which is potentially problemati=
c
> for mdmon.
OK. Changed.
>=20
> >
> > =A0 =A0 =A0 =A0/* new failures are always set in map[0] */
> > =A0 =A0 =A0 =A0map =3D get_imsm_map(dev, 0);
> > @@ -5057,9 +5059,12 @@ static int mark_failure(struct imsm_dev *dev=
,
> struct imsm_disk *disk, int idx)
> > =A0 =A0 =A0 =A0ord =3D __le32_to_cpu(map->disk_ord_tbl[slot]);
> > =A0 =A0 =A0 =A0if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return 0;
> > + =A0 =A0 =A0 sprintf(buf, "%s:0", disk->serial);
> > + =A0 =A0 =A0 int len =3D strlen(buf);
> > + =A0 =A0 =A0 unsigned int shift =3D len - MAX_RAID_SERIAL_LEN + 1;
> > + =A0 =A0 =A0 strncpy((char *)disk->serial, &buf[shift],
> MAX_RAID_SERIAL_LEN);
>=20
> Other option-roms truncated the serial number just to make sure it di=
d
> not match (see mark_missing()), are you sure the ":0" matters? And i=
f
> we are already tolerant of chopping characters why not just
> append/overwrite the end of the serial number.
I adapted the method to the one used by OROM- append ":0" at the end of=
serial number and cut at the beginning if size > MAX_RAID_SERIAL_LEN.
>=20
> > =A0 =A0 =A0 =A0disk->status |=3D FAILED_DISK;
> > - =A0 =A0 =A0 disk->status &=3D ~CONFIGURED_DISK;
>=20
> The Windows driver was observed making this modification, but maybe
> only in the "missing" case. Have you verified that this works in the
> failed but still present case (versus the device gone case).
This was verified under OROM and Windows.
If disk has been configured and become member of an array CONFIGURED_DI=
SK bit is set and it is not reset even though disk is missing or failed=
>=20
> > =A0 =A0 =A0 =A0set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUI=
LD);
> > =A0 =A0 =A0 =A0if (map->failed_disk_num == 0xff)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0map->failed_disk_num =3D slot;
> > @@ -6063,16 +6068,16 @@ static int apply_takeover_update(struct
> imsm_update_takeover *u,
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*space_list =3D *spa=
ce;
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0du =3D (void *)space=
;
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0memcpy(du, super->di=
sks, sizeof(*du));
> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.status =3D F=
AILED_DISK;
> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.scsi_id =3D =
0;
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.status =3D C=
ONFIGURED_DISK |
> FAILED_DISK;
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 du->disk.scsi_id =3D =
~0;
>=20
> Looks like the code has grown some open coded duplications of what
> mark_failure() is doing. Let's unify this. Keep subtle compatibilit=
y
> logic confined to one location.
OK. Changed
>=20
> --
> Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html