Mdadm re-add fails

on 18.05.2011 16:43:47 by Annemarie.Schmidt

Hi!

I have a 2 disk raid1 data array. As a result of other testing, the device info
in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
of the device info array:

[root@typhon ~]# mdadm --detail /dev/md21
/dev/md21:
        Version : 1.2
  Creation Time : Mon May  9 11:19:43 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu May 12 15:51:50 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
           UUID : 996d993f:baac367a:8b154ba9:43e56cff
         Events : 687

    Number   Major   Minor   RaidDevice State
-->    3      65       34        0      active sync   /dev/sdc2
       2      65       82        1      active sync   /dev/sdk2

When I remove /dev/sdk2 and then re-add it, the re-add fails:

>> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.

I believe the re-add fails because the enough_fd function (util.c) is not
searching deep enough into the dev_info array with this line of code:
   for (i=0; i<array.raid_disks + array.nr_disks; i++) {

array.raid_disks = 2 and array.nr_disks = 1, and so for this particular md
device, it is only looking at slots 0-2.
I believe the code needs to be changed to look at all possible dev_info array
slots, taking into account the version of the superblock (like the Detail
function does in Detail.c).

Do folks agree?

Thanks & regards,
Annemarie
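
The loop-bound problem described in this message can be illustrated with a
small mock. This is only a sketch, not mdadm code: struct mock_disk,
MAX_SLOTS, md21_slots and both functions are hypothetical names, and the
table stands in for what GET_DISK_INFO would report per slot. It assumes the
state right after /dev/sdk2 has been removed, so only slot 3 is occupied.

```c
#define MAX_SLOTS 8   /* size of the mock table, not an mdadm constant */

/* Hypothetical stand-in for the per-slot data GET_DISK_INFO would return. */
struct mock_disk {
    int raid_disk;  /* role of the device in the array */
    int in_sync;    /* 1 if the device is active and in sync, 0 otherwise */
};

/*
 * Mock of /dev/md21 after /dev/sdk2 (slot 2) has been removed:
 * only slot 3 is occupied. Unlisted slots are zero-initialised, so
 * in_sync == 0 marks them as empty.
 */
const struct mock_disk md21_slots[MAX_SLOTS] = {
    [3] = { .raid_disk = 0, .in_sync = 1 },   /* /dev/sdc2 */
};

/* Count in-sync members using the bound raid_disks + nr_disks. */
int count_sync_original(const struct mock_disk *slots,
                        int raid_disks, int nr_disks)
{
    int found = 0;
    for (int i = 0; i < raid_disks + nr_disks && i < MAX_SLOTS; i++) {
        if (!slots[i].in_sync)
            continue;
        if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
            continue;
        found++;
    }
    return found;
}

/* Same count, but scanning every slot of the mock table. */
int count_sync_all_slots(const struct mock_disk *slots, int raid_disks)
{
    int found = 0;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!slots[i].in_sync)
            continue;
        if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
            continue;
        found++;
    }
    return found;
}
```

With raid_disks = 2 and nr_disks = 1, count_sync_original(md21_slots, 2, 1)
only visits slots 0-2 and finds no member, while
count_sync_all_slots(md21_slots, 2) reaches slot 3 and finds one. A count of
zero would presumably make the array look as if it had too few working
members, which would fit the refused re-add.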

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Mdadm re-add fails

on 20.05.2011 01:51:33 by NeilBrown

On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie" wrote:

> I believe the re-add fails because the enough_fd function (util.c) is not
> searching deep enough into the dev_info array with this line of code:
>    for (i=0; i<array.raid_disks + array.nr_disks; i++) {
>
> array.raid_disks = 2 and array.nr_disks = 1, and so for this particular md
> device, it is only looking at slots 0-2.
> I believe the code needs to be changed to look at all possible dev_info
> array slots, taking into account the version of the superblock (like the
> Detail function does in Detail.c).
>
> Do folks agree?
>

I do - largely. I think there might be a better more general way to control
the loop though.
Could you try this please?

Thanks,
NeilBrown


diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
 	    array.raid_disks <= 0)
 		return 0;
 	avail = calloc(array.raid_disks, 1);
-	for (i=0; i<array.raid_disks + array.nr_disks; i++) {
+	for (i=0; i < 1024 && array.raid_disks > 0; i++) {
 		disk.number = i;
 		if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
 			continue;
+		if (disk.major == 0 && disk.minor == 0)
+			continue;
+		array.raid_disks--;
+
 		if (! (disk.state & (1<<MD_DISK_SYNC)))
 			continue;
 		if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)



RE: Mdadm re-add fails

on 20.05.2011 19:16:20 by Annemarie.Schmidt

Neil,

Yes, that worked:

>> [root@typhon ~]# mdadm --detail /dev/md24
/dev/md24:
        Version : 1.2
  Creation Time : Fri May 20 11:42:17 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri May 20 12:47:09 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : typhon.mno.stratus.com:24  (local to host typhon.mno.stratus.com)
           UUID : 562323d9:9a7b2979:a734abf0:b3fb8f0b
         Events : 155

    Number   Major   Minor   RaidDevice State
       3      65       22        0      active sync   /dev/sdc6
       2      65       54        1      active sync   /dev/sdk6

>> [root@typhon sbin]# mdadm /dev/md24 -f /dev/sdk6 -r /dev/sdk6
mdadm: set /dev/sdk6 faulty in /dev/md24
mdadm: hot removed /dev/sdk6 from /dev/md24

Without the fix:
---------------------
>> [root@typhon sbin]# mdadm /dev/md24 -a /dev/sdk6
mdadm: /dev/sdk6 reports being an active member for /dev/md24, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk6" first.

With the fix:
-----------------
>> [root@typhon ~]# ./mdadm /dev/md24 -a /dev/sdk6
mdadm: re-added /dev/sdk6

Thanks very much for the assistance.

Regards,
Annemarie



RE: Mdadm re-add fails

on 27.05.2011 23:16:46 by Annemarie.Schmidt

Hi Neil,

I've unfortunately run into a problem with the patch to the enough_fd code.
It does not appear to work in all cases.

mdadm --detail /dev/md21
    Number   Major   Minor   RaidDevice State
       3      65       18        0      active sync   /dev/sdc2
       2      65       50        1      active sync   /dev/sdk2


Here it works when I remove /dev/sdk2 and re-add it:

>> mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2

But when I remove the other disk, /dev/sdc2, and try to re-add it, it doesn't:

>> mdadm /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm /dev/md21 -a /dev/sdc2
mdadm: /dev/sdc2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdc2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc2" first.


I could get it all to work when I removed this line from the patch:

+ array.raid_disks--;

>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2


>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21

>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdc2
mdadm: re-added /dev/sdc2

So can this line simply be removed or does the patch need to be reworked?

Thanks & regards,
Annemarie Schmidt
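
One possible reading of the asymmetry reported above, which the thread itself
does not confirm: the decrement of array.raid_disks also shrinks the bound
used by the later range check on disk.raid_disk, so a remaining device with
role 1 is rejected once the counter has dropped to 1. The mock below is only
a sketch of that reasoning, not mdadm code. struct mock_disk, MAX_SLOTS, the
two tables and count_sync_patched are hypothetical names, and the tables
stand in for what GET_DISK_INFO would report, with major == 0 and minor == 0
marking an empty slot as the patch assumes.

```c
#define MAX_SLOTS 8   /* mock table size; the posted patch scans up to 1024 */

/* Hypothetical stand-in for the per-slot data GET_DISK_INFO would return. */
struct mock_disk {
    int major, minor;  /* both 0 marks an empty slot */
    int raid_disk;     /* role of the device in the array */
    int in_sync;       /* 1 if the device is active and in sync */
};

/* /dev/sdk2 removed: only slot 3 (/dev/sdc2, role 0) remains. */
const struct mock_disk sdk2_removed[MAX_SLOTS] = {
    [3] = { .major = 65, .minor = 18, .raid_disk = 0, .in_sync = 1 },
};

/* /dev/sdc2 removed: only slot 2 (/dev/sdk2, role 1) remains. */
const struct mock_disk sdc2_removed[MAX_SLOTS] = {
    [2] = { .major = 65, .minor = 50, .raid_disk = 1, .in_sync = 1 },
};

/*
 * Mimics the patched loop. If decrement is non-zero, raid_disks is
 * decremented for every occupied slot, as in the posted patch.
 * Returns the number of in-sync members that pass the range check.
 */
int count_sync_patched(const struct mock_disk *slots,
                       int raid_disks, int decrement)
{
    int found = 0;
    for (int i = 0; i < MAX_SLOTS && raid_disks > 0; i++) {
        if (slots[i].major == 0 && slots[i].minor == 0)
            continue;       /* empty slot */
        if (decrement)
            raid_disks--;   /* also shrinks the range check below */
        if (!slots[i].in_sync)
            continue;
        if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
            continue;       /* role outside [0, raid_disks) */
        found++;
    }
    return found;
}
```

In this mock, count_sync_patched(sdk2_removed, 2, 1) returns 1 but
count_sync_patched(sdc2_removed, 2, 1) returns 0, because role 1 no longer
passes the range check after the decrement. With decrement set to 0 both
calls return 1, which is consistent with the behaviour seen when the line is
removed. Note that without the decrement the loop condition on raid_disks
never becomes false, so the scan always runs to the slot limit.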

