Help - power failure during RAID6 grow

Help - power failure during RAID6 grow

on 26.08.2011 17:18:03 by Michael-John Turner

Hi all,

Yesterday I added three more disks to an existing five-disk mdadm RAID6
array and left it reshaping overnight. Unfortunately, there was a power
outage this afternoon and, due to a UPS error, my server restarted.

Due to a problem with one of my disk chassis (the one containing the three
new disks), the new disks weren't visible after the reboot, which I think
caused some problems with the array. I've since corrected the chassis
problem, but now get the following on boot:
[ 64.868206] md: md20 stopped.
[ 64.872415] md: bind
[ 64.872585] md: bind
[ 64.872720] md: bind
[ 64.873081] md: bind
[ 64.873198] md: bind
[ 64.875280] md: bind
[ 64.875395] md: bind
[ 64.875532] md: bind
[ 64.875544] md: kicking non-fresh sda1 from array!
[ 64.875548] md: unbind<sda1>
[ 64.880544] md: export_rdev(sda1)
[ 64.880548] md: kicking non-fresh sdb1 from array!
[ 64.880553] md: unbind<sdb1>
[ 64.892007] md: export_rdev(sdb1)
[ 64.892031] md: kicking non-fresh sdc1 from array!
[ 64.892034] md: unbind<sdc1>
[ 64.904007] md: export_rdev(sdc1)
[ 64.904823] raid5: reshape will continue
[ 64.904829] raid5: device sdd1 operational as raid disk 0
[ 64.904831] raid5: device sdh1 operational as raid disk 4
[ 64.904832] raid5: device sdg1 operational as raid disk 3
[ 64.904833] raid5: device sdf1 operational as raid disk 2
[ 64.904835] raid5: device sde1 operational as raid disk 1
[ 64.905253] raid5: allocated 8490kB for md20
[ 64.905271] 0: w=1 pa=2 pr=5 m=2 a=2 r=8 op1=0 op2=0
[ 64.905273] 4: w=2 pa=2 pr=5 m=2 a=2 r=8 op1=0 op2=0
[ 64.905275] 3: w=3 pa=2 pr=5 m=2 a=2 r=8 op1=0 op2=0
[ 64.905277] 2: w=4 pa=2 pr=5 m=2 a=2 r=8 op1=0 op2=0
[ 64.905278] 1: w=5 pa=2 pr=5 m=2 a=2 r=8 op1=0 op2=0
[ 64.905280] raid5: not enough operational devices for md20 (3/8 failed)

md20 is the array in question and sda1, sdb1 and sdc1 are the partitions on
the three new disks (sd[defgh]1 are the five original members of md20,
before the grow).

I tried stopping and re-assembling by hand, but got the following:
# mdadm --assemble /dev/md20 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
mdadm: /dev/md20 assembled from 5 drives and 1 spare - not enough to start the array.

# uname -a
Linux majestic 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64 GNU/Linux

# mdadm -V
mdadm - v3.1.4 - 31st August 2010

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md20 : inactive sdd1[0](S) sda1[8](S) sdb1[6](S) sdc1[7](S) sdh1[4](S) sdg1[3](S) sdf1[2](S) sde1[1](S)
15609328616 blocks super 1.2
The above is the state after the manual --assemble attempt.

# mdadm --detail /dev/md20
mdadm: md device /dev/md20 does not appear to be active.

Help!! Should I force a re-assembly? (I haven't tried that yet, as I don't
want to do anything risky.) The mdadm -E output for all the disks is pasted below.

-mj
--
Michael-John Turner
mj@mjturner.net <> http://mjturner.net/


#######################################################################

/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902331909 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 664 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 84dc8ff5:2f9c8735:d42328fc:cc5549a8

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : a4a9dc2e - correct
Events : 0

Layout : left-symmetric
Chunk Size : 256K

Device Role : spare
Array State : AAAAA... ('A' == active, '.' == missing)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902331909 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 664 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 6748c837:b1e86725:edf2184d:81d55fd6

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:11:11 2011
Checksum : 55d90bc0 - correct
Events : 11174

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 6
Array State : AAAAAAAA ('A' == active, '.' == missing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902331909 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 664 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 47f47246:55da1b0e:831ab899:ac2d1eed

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:11:11 2011
Checksum : b0952906 - correct
Events : 11174

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 5
Array State : AAAAAAAA ('A' == active, '.' == missing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902332301 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c3bc9bf5:16baaa20:811b93f7:bb61b63a

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : 1db10998 - correct
Events : 11184

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 0
Array State : AAAAA... ('A' == active, '.' == missing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902332301 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3f60aea4:00a038c8:844ead69:d9591383

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : 2ec8be20 - correct
Events : 11184

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 1
Array State : AAAAA... ('A' == active, '.' == missing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902332301 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b70e9161:e73c145f:2bcfe99a:c061ef73

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : a49f920d - correct
Events : 11184

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 2
Array State : AAAAA... ('A' == active, '.' == missing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902332301 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c7c739f0:d1421fdb:136e8eb7:431bd0ec

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : 44d8a975 - correct
Events : 11184

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 3
Array State : AAAAA... ('A' == active, '.' == missing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x4
Array UUID : ba21b41c:4e89d27a:be416498:12763c48
Name : majestic:20 (local to host majestic)
Creation Time : Sun Jul 4 19:54:16 2010
Raid Level : raid6
Raid Devices : 8

Avail Dev Size : 3902332301 (1860.78 GiB 1997.99 GB)
Array Size : 23413991424 (11164.66 GiB 11987.96 GB)
Used Dev Size : 3902331904 (1860.78 GiB 1997.99 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 24c8066f:b85de6c5:9382acd4:6e8b8f6e

Reshape pos'n : 3213327360 (3064.47 GiB 3290.45 GB)
Delta Devices : 3 (5->8)

Update Time : Fri Aug 26 14:25:13 2011
Checksum : 4d4a4964 - correct
Events : 11184

Layout : left-symmetric
Chunk Size : 256K

Device Role : Active device 4
Array State : AAAAA... ('A' == active, '.' == missing)

#######################################################################



Re: Help - power failure during RAID6 grow

on 26.08.2011 18:17:46 by Michael-John Turner

On Fri, Aug 26, 2011 at 04:18:03PM +0100, Michael-John Turner wrote:
[...]
> I tried stopping and re-assembling by hand, but get the following:
> # mdadm --assemble /dev/md20 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
> mdadm: /dev/md20 assembled from 5 drives and 1 spare - not enough to start the array.
[...]

A bit more info. If I try an assemble with -vv, I get the following:
mdadm: /dev/sdh1 is identified as a member of /dev/md20, slot 4.
mdadm: /dev/sdf1 is identified as a member of /dev/md20, slot 2.
mdadm: /dev/sdg1 is identified as a member of /dev/md20, slot 3.
mdadm: /dev/sde1 is identified as a member of /dev/md20, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md20, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md20, slot 6.
mdadm: /dev/sdc1 is identified as a member of /dev/md20, slot 5.
mdadm: /dev/sda1 is identified as a member of /dev/md20, slot -1.
mdadm: /dev/md20 has an active reshape - checking if critical section needs to be restored
mdadm: too-old timestamp on backup-metadata on device-5
mdadm: too-old timestamp on backup-metadata on device-6
mdadm: too-old timestamp on backup-metadata on device-8
mdadm: added /dev/sde1 to /dev/md20 as 1
mdadm: added /dev/sdf1 to /dev/md20 as 2
mdadm: added /dev/sdg1 to /dev/md20 as 3
mdadm: added /dev/sdh1 to /dev/md20 as 4
mdadm: added /dev/sdc1 to /dev/md20 as 5
mdadm: added /dev/sdb1 to /dev/md20 as 6
mdadm: no uptodate device for slot 7 of /dev/md20
mdadm: added /dev/sda1 to /dev/md20 as -1
mdadm: added /dev/sdd1 to /dev/md20 as 0
mdadm: /dev/md20 assembled from 5 drives and 1 spare - not enough to start the array.

Taking the contents of my previous mail into account (with the details
of each array member), is it safe to do an assemble with
MDADM_GROW_ALLOW_OLD=1?
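
For quick reference, the fields that matter for that decision can be pulled
out of the --examine output with something like this (just a sketch, using
the device names above):
# for d in /dev/sd[a-h]1; do echo "== $d =="; mdadm -E "$d" | grep -E 'Events|Device Role|Update Time|Reshape'; done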

-mj
--
Michael-John Turner
mj@mjturner.net <> http://mjturner.net/

Re: Help - power failure during RAID6 grow

on 26.08.2011 22:51:51 by NeilBrown

On Fri, 26 Aug 2011 17:17:46 +0100, Michael-John Turner wrote:

> On Fri, Aug 26, 2011 at 04:18:03PM +0100, Michael-John Turner wrote:
> [...]
> > I tried stopping and re-assembling by hand, but get the following:
> > # mdadm --assemble /dev/md20 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1
> > mdadm: /dev/md20 assembled from 5 drives and 1 spare - not enough to start the array.
> [...]
>
> A bit more info. If I try an assemble with -vv, I get the following:
> mdadm: /dev/sdh1 is identified as a member of /dev/md20, slot 4.
> mdadm: /dev/sdf1 is identified as a member of /dev/md20, slot 2.
> mdadm: /dev/sdg1 is identified as a member of /dev/md20, slot 3.
> mdadm: /dev/sde1 is identified as a member of /dev/md20, slot 1.
> mdadm: /dev/sdd1 is identified as a member of /dev/md20, slot 0.
> mdadm: /dev/sdb1 is identified as a member of /dev/md20, slot 6.
> mdadm: /dev/sdc1 is identified as a member of /dev/md20, slot 5.
> mdadm: /dev/sda1 is identified as a member of /dev/md20, slot -1.
> mdadm: /dev/md20 has an active reshape - checking if critical section needs to be restored
> mdadm: too-old timestamp on backup-metadata on device-5
> mdadm: too-old timestamp on backup-metadata on device-6
> mdadm: too-old timestamp on backup-metadata on device-8
> mdadm: added /dev/sde1 to /dev/md20 as 1
> mdadm: added /dev/sdf1 to /dev/md20 as 2
> mdadm: added /dev/sdg1 to /dev/md20 as 3
> mdadm: added /dev/sdh1 to /dev/md20 as 4
> mdadm: added /dev/sdc1 to /dev/md20 as 5
> mdadm: added /dev/sdb1 to /dev/md20 as 6
> mdadm: no uptodate device for slot 7 of /dev/md20
> mdadm: added /dev/sda1 to /dev/md20 as -1
> mdadm: added /dev/sdd1 to /dev/md20 as 0
> mdadm: /dev/md20 assembled from 5 drives and 1 spare - not enough to start the array.
>
> Taking the contents of my previous mail into account (with the details
> of each array member), is it safe to do an assemble with
> MDADM_GROW_ALLOW_OLD=1?
>
> -mj

Leave sda1 out of the list - it looks too much like a spare. Something must
have reset the metadata on it. You can live without it, so do so for now.

Assemble the array with the rest of the devices and give the "--force" flag
so it will update the event counts to all be in sync.
And do this with MDADM_GROW_ALLOW_OLD=1 set.
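
In other words, something along these lines (a sketch only, using the
device names from this thread, with sda1 left out):
# mdadm --stop /dev/md20
# MDADM_GROW_ALLOW_OLD=1 mdadm --assemble --force /dev/md20 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1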

This should finish the reshape and give you a singly degraded 8 device RAID6.

Then add sda1 back in and it will recover and the array will be optimal.
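
Again roughly (same device names; only once the reshape has finished):
# mdadm /dev/md20 --add /dev/sda1
# cat /proc/mdstat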

NeilBrown

Re: Help - power failure during RAID6 grow

on 26.08.2011 23:24:47 by Michael-John Turner

On Sat, Aug 27, 2011 at 06:51:51AM +1000, NeilBrown wrote:
> Assemble the array with the rest of the devices and give the "--force" flag
> so it will update the event counts to all be in sync.
> And do this with MDADM_GROW_ALLOW_OLD=1 set.

Thanks - that's exactly what I've done and it's rebuilding as we speak. I
did toy with recreating the array (to get sda1 back in place as a
non-spare), but wasn't sure whether re-writing the RAID superblocks would
affect the reshape that was in progress.

> Then add sda1 back in and it will recover and the array will be optimal.

Will do. Having to do a sync twice is a bit painful, but at least I've got
the array back :)

Once again, thanks - the assistance is much appreciated.

-mj
--
Michael-John Turner
mj@mjturner.net <> http://mjturner.net/