Couldn't remove rebuilding drive from RAID5 rebuild, now can't add new drive to array?

on 11.08.2011 15:48:29 by Another Sillyname

I have a RAID5 array consisting of 4 drives that recently had a problem.

One of the drives 'removed' itself from the array and when I added it
back it started the background rebuilding I expected, however I then
noticed from smartctl that the drive was showing 'imminent failure'
due to 3300+ reallocated sector errors.
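
For anyone wanting to check the same counter, the raw value can be read from the attribute table that `smartctl -A` prints. A minimal sketch, assuming the usual ten-column table layout with the raw value in the last column; `reallocated_sectors` is a hypothetical helper and the sample row is illustrative, not output from this drive:

```shell
#!/bin/sh
# Print the raw Reallocated_Sector_Ct value from `smartctl -A` output on stdin.
# Assumes the standard attribute table, where RAW_VALUE is the 10th column.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Illustrative attribute row only. Real use would be something like:
#   smartctl -A /dev/sdj | reallocated_sectors
printf '%s\n' \
    '  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3300' \
    | reallocated_sectors
```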

At this stage I decided I wanted to pull the drive before it finished
the rebuild and replace it.

However after I stopped the array using:-

mdadm --stop /dev/md126

I was unable to put that drive into fail status

mdadm --fail /dev/sdj1

No Such Device

At this stage I decided to leave the array offline till I had a
replacement drive available to slot in.

I now have the replacement drive and as I was unable to either fail or
remove the offending drive I decided to do a physical pull of the
drive, reboot the machine to show the drive removed, and then a second
reboot with the new blank drive available.

This seems to have partially worked in that

mdadm -D /dev/md126
/dev/md126:
Version : 1.2
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent

Update Time : Sun Aug 7 05:23:45 2011
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Name : MY_NEW_RAID
UUID : herro_this_isnt_needed
Events : 36003

Number Major Minor RaidDevice State
0 8 129 0 active sync /dev/sdi1
1 0 0 1 removed
2 8 161 2 active sync /dev/sdk1
3 8 177 3 active sync /dev/sdl1

Which is what I expected to see.

However I cannot add the replacement drive into the array.

~ >:mdadm --add /dev/md126 /dev/sdj1
mdadm: add new device failed for /dev/sdj1 as 4: Invalid argument

~ >:mdadm --add --force /dev/md126 /dev/sdj1
mdadm: set device faulty failed for /dev/sdj1: No such device

~ >:mdadm --re-add /dev/md126 /dev/sdj1
mdadm: --re-add for /dev/sdj1 to /dev/md126 is not possible

and even more confusingly

~ >:mdadm -E /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : not needed
Name : My_NEW_RAID
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : not needed

Update Time : Sun Aug 7 05:23:45 2011
Checksum : 6172254 - correct
Events : 0

Layout : left-symmetric
Chunk Size : 512K

Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)


Could someone possibly point me in the right direction as to what I'm
doing wrong?

Thanks in advance.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Couldn't remove rebuilding drive from RAID5 rebuild, now can't add new drive to array?

on 11.08.2011 16:22:58 by Robin Hill

On Thu Aug 11, 2011 at 02:48:29PM +0100, Another Sillyname wrote:

> I have a RAID5 array consisting of 4 drives that recently had a problem.
>
> One of the drives 'removed' itself from the array and when I added it
> back it started the background rebuilding I expected, however I then
> noticed from smartctl that the drive was showing 'imminent failure'
> due to 3300+ reallocated sector errors.
>
> At this stage I decided I wanted to pull the drive before it finished
> the rebuild and replace it.
>
> However after I stopped the array using:-
>
> mdadm --stop /dev/md126
>
> I was unable to put that drive into fail status
>
> mdadm --fail /dev/sdj1
>
> No Such Device
>
Well obviously you can't fail a drive from an array that isn't running
(not to mention that your fail syntax is wrong). What you should have
done (with the array running) is:
mdadm /dev/md126 --fail /dev/sdj1
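
The manage-mode form above names the array first and then the operation on a member. A minimal sketch of the fail-then-remove sequence, assuming the array is assembled and running; the `run` wrapper, the `DRY_RUN` variable and `fail_and_remove` are hypothetical helpers added here so that the commands are only printed unless you opt in with DRY_RUN=0:

```shell
#!/bin/sh
# Hypothetical helper: print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Mark a member faulty, then remove it, while the array is running.
# Usage: fail_and_remove /dev/md126 /dev/sdj1
fail_and_remove() {
    array="$1"
    member="$2"
    run mdadm "$array" --fail "$member" &&
        run mdadm "$array" --remove "$member"
}

fail_and_remove /dev/md126 /dev/sdj1
```

Left in its default dry-run mode, this only echoes the two mdadm invocations, which makes it easy to check the argument order before touching a real array.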

> At this stage I decided to leave the array offline till I had a
> replacement drive available to slot in.
>
> I now have the replacement drive and as I was unable to either fail or
> remove the offending drive I decided to do a physical pull of the
> drive, reboot the machine to show the drive remove and then a second
> reboot with the new blank drive available.
>
There's no need for all the rebooting. Simply replacing the offending
drive with the new one and restarting the array (either by reboot or
a controller scan and array re-assemble) would have worked fine.

> This seems to have partially worked in that
>
> mdadm -D /dev/md126
> /dev/md126:
> Version : 1.2
> Creation Time : Sat Aug 6 01:24:12 2011
> Raid Level : raid5
> Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
> Raid Devices : 4
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Sun Aug 7 05:23:45 2011
> State : active, degraded, Not Started
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Name : MY_NEW_RAID
> UUID : herro_this_isnt_needed
> Events : 36003
>
> Number Major Minor RaidDevice State
> 0 8 129 0 active sync /dev/sdi1
> 1 0 0 1 removed
> 2 8 161 2 active sync /dev/sdk1
> 3 8 177 3 active sync /dev/sdl1
>
> Which is what I expected to see.
>
Yep, the removed drive is no longer in the array at all.

> However I cannot add the replacement drive into the array.
>
> ~ >:mdadm --add /dev/md126 /dev/sdj1
> mdadm: add new device failed for /dev/sdj1 as 4: Invalid argument
>
You really need to check dmesg here to see why it's been rejected.

> ~ >:mdadm --add --force /dev/md126 /dev/sdj1
> mdadm: set device faulty failed for /dev/sdj1: No such device
>
I've no idea what it's doing here. Are you sure that's exactly what you
typed? If you'd missed a "-" before the force then it may be
interpreting it as "-f" instead, which would fail as /dev/sdj1 is not in
the array.

> ~ >:mdadm --re-add /dev/md126 /dev/sdj1
> mdadm: --re-add for /dev/sdj1 to /dev/md126 is not possible
>
As the new drive does not contain any array metadata, it can't be
re-added here.

> and even more confusingly
>
> ~ >:mdadm -E /dev/sdj1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : not needed
> Name : My_NEW_RAID
> Creation Time : Sat Aug 6 01:24:12 2011
> Raid Level : raid5
> Raid Devices : 4
>
> Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
> Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
> Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : not needed
>
> Update Time : Sun Aug 7 05:23:45 2011
> Checksum : 6172254 - correct
> Events : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Device Role : spare
> Array State : AAAA ('A' == active, '.' == missing)
>
>
> Could someone possibly point me in the right direction as to what I'm
> doing wrong?
>
What's the output of "cat /proc/mdstat" at this point? If it doesn't
show /dev/sdj1 as being in the array at all, then I'd go with trying to
add it again:
mdadm /dev/md126 --add /dev/sdj1

If that still fails, check "dmesg", and possibly try running with -vv to
get a more verbose error.
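
The /proc/mdstat check can also be scripted. A minimal sketch, assuming the usual mdstat layout (array name, a colon, the state, then members written as name[slot]); `mdstat_has_member` is a hypothetical helper, and the sample text is illustrative rather than output from the machine in question:

```shell
#!/bin/sh
# Return 0 if the partition is listed as a member of the array in an
# mdstat-format file. Usage: mdstat_has_member md126 sdj1 [file]
mdstat_has_member() {
    array="$1"
    member="$2"
    file="${3:-/proc/mdstat}"
    awk -v a="$array" -v m="$member" '
        $1 == a && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*$/, "", name)   # strip the "[N]" slot and any flags
                if (name == m) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Illustrative sample only (not taken from the poster's system).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md126 : active raid5 sdi1[0] sdl1[3] sdk1[2]
      5860538880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
EOF

if mdstat_has_member md126 sdj1 "$sample"; then
    echo "sdj1 is listed in md126"
else
    echo "sdj1 is not listed in md126; try: mdadm /dev/md126 --add /dev/sdj1"
fi
rm -f "$sample"
```

On a live system the third argument can be dropped so that the function reads /proc/mdstat directly.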

Cheers,
Robin
--
___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |


Re: Couldn't remove rebuilding drive from RAID5 rebuild, now can't add new drive to array?

on 12.08.2011 00:13:38 by Another Sillyname

Further to my earlier message...

I took the array offline again

mdadm --stop /dev/md126

and re-assembled it

mdadm --assemble /dev/md126 /dev/sd[i-l]1

and the array seems to be reconstructed but sdj1 isn't rebuilding....

I have attached the mdadm -E for the drives, can someone explain why
the three active drives are showing a missing drive from the array
while the spare is showing all 4 slots active?

Also, what do I need to do to force sdj1 to start its rebuild?

~ >:mdadm -E /dev/sd[i-l]1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : not_needed_matches_others
Name : MY_NEW_RAID
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : not_needed

Update Time : Thu Aug 11 22:51:57 2011
Checksum : 551e2ef7 - correct
Events : 36006

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : A.AA ('A' == active, '.' == missing)
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : not_needed_matches_others
Name : MY_NEW_RAID
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : not_needed

Update Time : Sun Aug 7 05:23:45 2011
Checksum : 6172254 - correct
Events : 0

Layout : left-symmetric
Chunk Size : 512K

Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : not_needed_matches_others
Name : MY_NEW_RAID
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : not_needed_matches_others

Update Time : Thu Aug 11 22:51:57 2011
Checksum : a5145e39 - correct
Events : 36006

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : A.AA ('A' == active, '.' == missing)
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : not_needed_matches_others
Name : MY_NEW_RAID
Creation Time : Sat Aug 6 01:24:12 2011
Raid Level : raid5
Raid Devices : 4

Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : not_needed_matchesothers

Update Time : Thu Aug 11 22:51:57 2011
Checksum : c269d59b - correct
Events : 36006

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 3
Array State : A.AA ('A' == active, '.' == missing)
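
To compare the superblocks side by side, the fields that differ can be pulled out of the `mdadm -E` output. A rough sketch, assuming the field labels shown above; `summarise_examine` is a hypothetical helper, and the input below is trimmed from the sdi1 and sdj1 entries above. Each superblock records the array as that device last saw it, so a member with a much lower event count and an older update time is presumably just out of date rather than describing the current array:

```shell
#!/bin/sh
# Summarise `mdadm -E` output read from stdin: one line per device with
# its event count, device role and the array state recorded on that device.
summarise_examine() {
    awk '
        /^\/dev\//                     { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events"                 { events = $3 }
        $1 == "Device" && $2 == "Role" { role = $0; sub(/^.*: */, "", role) }
        $1 == "Array" && $2 == "State" {
            print dev, "events=" events, "role=\"" role "\"", "state=" $4
        }
    '
}

# Trimmed excerpt of the output above. Real use would be:
#   mdadm -E /dev/sd[i-l]1 | summarise_examine
summarise_examine <<'EOF'
/dev/sdi1:
Events : 36006
Device Role : Active device 0
Array State : A.AA ('A' == active, '.' == missing)
/dev/sdj1:
Events : 0
Device Role : spare
Array State : AAAA ('A' == active, '.' == missing)
EOF
```

Printing one line per device makes the mismatch easy to spot: the three active members agree with each other, while sdj1 carries an event count of 0 and a different recorded state.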