Mdadm re-add fails
on 18.05.2011 16:43:47 by Annemarie.Schmidt
Hi!
I have a 2 disk raid1 data array. As a result of other testing, the device info
in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
of the device info array:
[root@typhon ~]# mdadm --detail /dev/md21
/dev/md21:
        Version : 1.2
  Creation Time : Mon May  9 11:19:43 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Thu May 12 15:51:50 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
           UUID : 996d993f:baac367a:8b154ba9:43e56cff
         Events : 687
    Number   Major   Minor   RaidDevice State
-->    3      65      34         0      active sync   /dev/sdc2
       2      65      82         1      active sync   /dev/sdk2
When I remove /dev/sdk2 and then re-add it, the re-add fails:
>> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21
>> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.
I believe the re-add fails because the enough_fd function (util.c) is not
searching deep enough into the dev_info array with this line of code:
for (i=0; i
array.raid_disks = 2 and array.nr_disks = 1, and so for this particular md
device, it is only looking at slots 0-2.
I believe the code needs to be changed to look at all possible dev_info array
slots, taking into account the version of the superblock (as the Detail
function in Detail.c does).
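The effect described here can be reproduced in a small stand-alone sketch. It does not call the md ioctls; the slot struct, MAX_SLOTS, and the helper names are hypothetical, and the loop bound raid_disks + nr_disks is an assumption based on the values quoted in this message (2 and 1, giving slots 0-2). With the only remaining device in slot 3, the bounded scan finds nothing, while a scan over every slot finds it.

```c
#include <assert.h>

#define MAX_SLOTS 8   /* hypothetical table size, for illustration only */

/* Hypothetical stand-in for one entry of the device info array. */
struct slot {
    int present;      /* 1 if a device occupies this slot */
};

/* Count the devices found in slots [0, limit). */
int count_present(const struct slot *slots, int limit)
{
    int found = 0;

    assert(limit >= 0);
    for (int i = 0; i < limit && i < MAX_SLOTS; i++)
        if (slots[i].present)
            found++;
    return found;
}

/* State after /dev/sdk2 (slot 2) was removed: only slot 3 is occupied.
 * Assumed bound: raid_disks + nr_disks = 2 + 1, so slots 0-2 are scanned. */
int bounded_scan_after_removal(void)
{
    struct slot slots[MAX_SLOTS] = {{0}};
    int raid_disks = 2, nr_disks = 1;

    slots[3].present = 1;
    return count_present(slots, raid_disks + nr_disks);
}

/* Same state, but every slot is examined. */
int full_scan_after_removal(void)
{
    struct slot slots[MAX_SLOTS] = {{0}};

    slots[3].present = 1;
    return count_present(slots, MAX_SLOTS);
}
```

Under these assumptions bounded_scan_after_removal() returns 0 and full_scan_after_removal() returns 1, which is consistent with enough_fd concluding that the array does not have enough devices.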
Do folks agree?
Thanks & regards,
Annemarie
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Mdadm re-add fails
on 20.05.2011 01:51:33 by NeilBrown
On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie" wrote:
> Hi!
>
> I have a 2 disk raid1 data array. As a result of other testing, the device info
> in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
> of the device info array:
>
> [root@typhon ~]# mdadm --detail /dev/md21
> /dev/md21:
>         Version : 1.2
>   Creation Time : Mon May  9 11:19:43 2011
>      Raid Level : raid1
>      Array Size : 5241844 (5.00 GiB 5.37 GB)
>   Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Thu May 12 15:51:50 2011
>           State : active
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
>            UUID : 996d993f:baac367a:8b154ba9:43e56cff
>          Events : 687
>
>     Number   Major   Minor   RaidDevice State
> -->    3      65      34         0      active sync   /dev/sdc2
>        2      65      82         1      active sync   /dev/sdk2
>
> When I remove /dev/sdk2 and then re-add it, the re-add fails:
>
> >> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
> mdadm: set /dev/sdk2 faulty in /dev/md21
> mdadm: hot removed /dev/sdk2 from /dev/md21
>
> >> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
> mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add fails.
> mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.
>
> I believe the re-add fails because the enough_fd function (util.c) is not
> searching deep enough into the dev_info array with this line of code:
> for (i=0; i
>
> array.raid_disks = 2 and array.nr_disks = 1, and so for this particular md
> device, it is only looking at slots 0-2.
> I believe the code needs to be changed to look at all possible dev_info array
> slots, taking into account the version of the superblock (as the Detail
> function in Detail.c does).
>
> Do folks agree?
>
I do - largely. I think there might be a better, more general way to control
the loop though.
Could you try this please?
Thanks,
NeilBrown
diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
         array.raid_disks <= 0)
         return 0;
     avail = calloc(array.raid_disks, 1);
-    for (i=0; i
+    for (i=0; i < 1024 && array.raid_disks > 0; i++) {
         disk.number = i;
         if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
             continue;
+        if (disk.major == 0 && disk.minor == 0)
+            continue;
+        array.raid_disks--;
+
         if (! (disk.state & (1<
             continue;
         if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)
RE: Mdadm re-add fails
on 20.05.2011 19:16:20 by Annemarie.Schmidt
Neil,
Yes, that worked:
>> [root@typhon ~]# mdadm --detail /dev/md24
/dev/md24:
        Version : 1.2
  Creation Time : Fri May 20 11:42:17 2011
     Raid Level : raid1
     Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Fri May 20 12:47:09 2011
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : typhon.mno.stratus.com:24 (local to host typhon.mno.stratus.com)
           UUID : 562323d9:9a7b2979:a734abf0:b3fb8f0b
         Events : 155
    Number   Major   Minor   RaidDevice State
       3      65      22         0      active sync   /dev/sdc6
       2      65      54         1      active sync   /dev/sdk6
>> [root@typhon sbin]# mdadm /dev/md24 -f /dev/sdk6 -r /dev/sdk6
mdadm: set /dev/sdk6 faulty in /dev/md24
mdadm: hot removed /dev/sdk6 from /dev/md24
Without the fix:
---------------------
>> [root@typhon sbin]# mdadm /dev/md24 -a /dev/sdk6
mdadm: /dev/sdk6 reports being an active member for /dev/md24, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk6" first.
With the fix:
-----------------
>> [root@typhon ~]# ./mdadm /dev/md24 -a /dev/sdk6
mdadm: re-added /dev/sdk6
Thanks very much for the assistance.
Regards,
Annemarie
RE: Mdadm re-add fails
on 27.05.2011 23:16:46 by Annemarie.Schmidt
Hi Neil,
I've unfortunately run into a problem with the patch to the enough_fd code.
It does not appear to work in all cases.
mdadm --detail /dev/md21
Number Major Minor RaidDevice State
3 65 18 0 active sync /dev/sdc2
2 65 50 1 active sync /dev/sdk2
Here it works when I remove /dev/sdk2
>> mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21
>> mdadm /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2
But when I try to remove the other disk, /dev/sdc2, it doesn't:
>> mdadm /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21
>> mdadm /dev/md21 -a /dev/sdc2
mdadm: /dev/sdc2 reports being an active member for /dev/md21, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdc2 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdc2" first.
I could get it all to work when I removed this line from the patch:
+ array.raid_disks--;
>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdk2 -r /dev/sdk2
mdadm: set /dev/sdk2 faulty in /dev/md21
mdadm: hot removed /dev/sdk2 from /dev/md21
>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdk2
mdadm: re-added /dev/sdk2
>> mdadm_good_patch_minus_dec /dev/md21 -f /dev/sdc2 -r /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md21
mdadm: hot removed /dev/sdc2 from /dev/md21
>> mdadm_good_patch_minus_dec /dev/md21 -a /dev/sdc2
mdadm: re-added /dev/sdc2
So can this line simply be removed or does the patch need to be reworked?
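One possible explanation for the asymmetry, offered as a sketch rather than a confirmed diagnosis: in the patched loop, array.raid_disks is decremented before the later check disk.raid_disk >= array.raid_disks, so the same variable both ends the loop and bounds the valid roles. The stand-alone simulation below does not use the md ioctls; struct slot, MAX_SLOTS, remaining, and the function names are hypothetical, and the in-sync state test is omitted. It compares the shared counter with a separate loop counter.

```c
#include <assert.h>
#include <string.h>

#define MAX_SLOTS 1024   /* mirrors the "i < 1024" limit in the patch */

/* Hypothetical stand-in for what GET_DISK_INFO would report per slot. */
struct slot {
    int present;     /* 1 if a device occupies this slot */
    int raid_disk;   /* role in the array: 0 or 1 for a 2-disk raid1 */
};

/* Mimics the patched loop: the counter that ends the loop is also
 * the upper bound used to validate raid_disk. */
int count_with_shared_counter(const struct slot *slots, int raid_disks)
{
    int avail_disks = 0;

    for (int i = 0; i < MAX_SLOTS && raid_disks > 0; i++) {
        if (!slots[i].present)
            continue;
        raid_disks--;   /* shrinks the bound used on the next line */
        if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
            continue;
        avail_disks++;
    }
    return avail_disks;
}

/* Hypothetical alternative: end the loop with a separate counter and
 * leave the role bound untouched. */
int count_with_separate_counter(const struct slot *slots, int raid_disks)
{
    int avail_disks = 0;
    int remaining = raid_disks;

    for (int i = 0; i < MAX_SLOTS && remaining > 0; i++) {
        if (!slots[i].present)
            continue;
        remaining--;
        if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
            continue;
        avail_disks++;
    }
    return avail_disks;
}

/* Build a 2-disk raid1 table with a single surviving device and
 * count it with the chosen variant. */
int survivor_counted(int slot_number, int raid_disk, int shared)
{
    static struct slot slots[MAX_SLOTS];

    memset(slots, 0, sizeof slots);
    slots[slot_number].present = 1;
    slots[slot_number].raid_disk = raid_disk;
    return shared ? count_with_shared_counter(slots, 2)
                  : count_with_separate_counter(slots, 2);
}
```

With the surviving device in slot 3 as RaidDevice 0 (the /dev/sdk2 removal), both variants count it. With the survivor in slot 2 as RaidDevice 1 (the /dev/sdc2 removal), the shared-counter variant skips it because the bound has already dropped to 1, which would match the failure reported above.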
Thanks & regards,
Annemarie Schmidt