best way to try recovering inactive raid6

best way to try recovering inactive raid6

on 19.12.2010 14:21:01 by beikeland

Hi,

I've got myself into a pickle:

$ sudo mdadm -As
mdadm: /dev/md6 assembled from 2 drives and 2 spares - not enough to
start the array.

This was a 5-device raid6 array made up of /dev/sd[abcdh]1. I'm not
sure what caused /dev/sdh to need a resync; while that was running,
/dev/sdb developed hardware problems, which slowed the resync down to
<1000 kB/s.

Then I got the bright idea that I could do without the drive that was
_actually_ failing, so I failed and pulled the disk labeled with that
particular serial number, only to find out I had labeled it wrong,
which left me with:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md6 : inactive sdc1[0](S) sdd1[6](S) sdh1[5](S) sda1[3](S)
3906236416 blocks

unused devices: <none>

From what I've found on Google and in this list's archives, I should
first try mdadm --assemble --force, and if that fails, try to re-create
the array.

Forced assembly fails, so I guess re-creating it is. I've looked at
permute_array.pl and just have the following questions (a rough sketch
of what I have in mind follows the list):

- do I need to re-create it as the same /dev/mdN device?
- with raid6, should I try permutations with two missing drives, or go
  for one missing drive and --assume-clean?
- is there any point in ddrescue-cloning /dev/sdb when it has a
  different event count than the rest?
- there is no --read-only involved in creating the array, only in
  mounting?
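
To make that concrete, this is roughly the last-resort sequence I have
in mind; the clone target /dev/sdX1, the ddrescue log name and above
all the device order on the --create line are guesses on my part, not
something I've verified:

# clone the failing disk onto a healthy spare first (target device and
# log name are placeholders)
$ sudo ddrescue -f /dev/sdb1 /dev/sdX1 sdb1-rescue.log

# absolute last resort: re-create over the existing members, keeping
# the original 0.90 metadata, level and chunk size; the member ORDER
# here is a guess
$ sudo mdadm --create /dev/md6 --metadata=0.90 --level=6 --chunk=256 \
      --raid-devices=5 --assume-clean \
      /dev/sdc1 /dev/sdb1 missing /dev/sda1 missing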

Hope you guys can help me out with some input here!

regards,
Bjorn


/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 50ed6314:84f965cd:c5795ac2:70ba0679
Creation Time : Thu May 29 20:51:39 2008
Raid Level : raid6
Used Dev Size : 976559104 (931.32 GiB 1000.00 GB)
Array Size : 2929677312 (2793.96 GiB 2999.99 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 6

Update Time : Sun Dec 19 12:12:34 2010
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 3
Spare Devices : 2
Checksum : 84f2dc10 - correct
Events : 1290388

Chunk Size : 256K

Number Major Minor RaidDevice State
this 3 8 1 3 active sync /dev/sda1

0 0 8 33 0 active sync /dev/sdc1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 1 3 active sync /dev/sda1
4 4 0 0 4 faulty removed
5 5 8 113 5 spare /dev/sdh1
6 6 8 49 6 spare /dev/sdd1
-----------------------------------------------------------------
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 50ed6314:84f965cd:c5795ac2:70ba0679
Creation Time : Thu May 29 20:51:39 2008
Raid Level : raid6
Used Dev Size : 976559104 (931.32 GiB 1000.00 GB)
Array Size : 2929677312 (2793.96 GiB 2999.99 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 6

Update Time : Sun Dec 19 12:02:52 2010
State : active
Active Devices : 3
Working Devices : 4
Failed Devices : 2
Spare Devices : 1
Checksum : 84df28e2 - correct
Events : 1290382

Chunk Size : 256K

Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 8 1 3 active sync /dev/sda1
4 4 0 0 4 faulty removed
5 5 8 113 5 spare /dev/sdh1
-----------------------------------------------------------------
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 50ed6314:84f965cd:c5795ac2:70ba0679
Creation Time : Thu May 29 20:51:39 2008
Raid Level : raid6
Used Dev Size : 976559104 (931.32 GiB 1000.00 GB)
Array Size : 2929677312 (2793.96 GiB 2999.99 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 6

Update Time : Sun Dec 19 12:12:34 2010
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 3
Spare Devices : 2
Checksum : 84f2dc2a - correct
Events : 1290388

Chunk Size : 256K

Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1

0 0 8 33 0 active sync /dev/sdc1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 1 3 active sync /dev/sda1
4 4 0 0 4 faulty removed
5 5 8 113 5 spare /dev/sdh1
6 6 8 49 6 spare /dev/sdd1
-----------------------------------------------------------------
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 50ed6314:84f965cd:c5795ac2:70ba0679
Creation Time : Thu May 29 20:51:39 2008
Raid Level : raid6
Used Dev Size : 976559104 (931.32 GiB 1000.00 GB)
Array Size : 2929677312 (2793.96 GiB 2999.99 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 6

Update Time : Sun Dec 19 12:12:34 2010
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 3
Spare Devices : 2
Checksum : 84f2dc40 - correct
Events : 1290388

Chunk Size : 256K

Number Major Minor RaidDevice State
this 6 8 49 6 spare /dev/sdd1

0 0 8 33 0 active sync /dev/sdc1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 1 3 active sync /dev/sda1
4 4 0 0 4 faulty removed
5 5 8 113 5 spare /dev/sdh1
6 6 8 49 6 spare /dev/sdd1
-----------------------------------------------------------------
/dev/sdh1:
Magic : a92b4efc
Version : 00.90.00
UUID : 50ed6314:84f965cd:c5795ac2:70ba0679
Creation Time : Thu May 29 20:51:39 2008
Raid Level : raid6
Used Dev Size : 976559104 (931.32 GiB 1000.00 GB)
Array Size : 2929677312 (2793.96 GiB 2999.99 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 6

Update Time : Sun Dec 19 12:12:34 2010
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 3
Spare Devices : 2
Checksum : 84f2dc7e - correct
Events : 1290388

Chunk Size : 256K

Number Major Minor RaidDevice State
this 5 8 113 5 spare /dev/sdh1

0 0 8 33 0 active sync /dev/sdc1
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 1 3 active sync /dev/sda1
4 4 0 0 4 faulty removed
5 5 8 113 5 spare /dev/sdh1
6 6 8 49 6 spare /dev/sdd1

Re: best way to try recovering inactive raid6

on 19.12.2010 16:01:15 by Mikael Abrahamsson

On Sun, 19 Dec 2010, Bjørn Eikeland wrote:

> Hope you guys can help me out with some input here!

What mdadm and kernel version are you using?

If you're not using the latest mdadm version (3.1.4 afaik) then try
--assemble --force with that one first. It does a better job than the
versions that ship with most distributions today.
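
If your distribution doesn't ship 3.1.4 yet, building it from source
only takes a minute; roughly something like this (download URL and
paths are just illustrative), and pass the member devices explicitly:

$ wget http://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-3.1.4.tar.gz
$ tar xzf mdadm-3.1.4.tar.gz && cd mdadm-3.1.4 && make
$ sudo ./mdadm --assemble --force --verbose /dev/md6 /dev/sd[abcdh]1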

Recreating the array is highly dangerous and it's definitely a last
resort.

--
Mikael Abrahamsson email: swmike@swm.pp.se

Re: best way to try recovering inactive raid6

on 19.12.2010 17:44:21 by beikeland

2010/12/19 Mikael Abrahamsson :
> On Sun, 19 Dec 2010, Bjørn Eikeland wrote:
>
>> Hope you guys can help me out with some input here!
>
> What mdadm and kernel version are you using?
>
> If you're not using the latest mdadm version (3.1.4 afaik) then try
> --assemble --force with that one first. It does a better job than the
> versions that ship with most distributions today.
>
> Recreating the array is highly dangerous and it's definitely a last resort.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se

$ uname -a
Linux filebear 2.6.32-26-generic #48-Ubuntu SMP Wed Nov 24 09:00:03
UTC 2010 i686 GNU/Linux

$ mdadm -V
mdadm - v3.1.4 - 31st August 2010
$ sudo mdadm --assemble --force /dev/md6
mdadm: /dev/md6 assembled from 2 drives and 2 spares - not enough to
start the array.

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md6 : inactive sdc1[0](S) sdd1[6](S) sdh1[5](S) sda1[3](S)
3906236416 blocks

unused devices: <none>

mdadm was older, but same behavior with 3.1.4. It was a new Ubuntu
install from a few days ago so I didn't think of updating mdadm; the
kernel is fairly recent at least.

Re: best way to try recovering inactive raid6

on 21.12.2010 22:43:17 by beikeland

>> $ uname -a
>> Linux filebear 2.6.32-26-generic #48-Ubuntu SMP Wed Nov 24 09:00:03
>> UTC 2010 i686 GNU/Linux
>>
>> $ mdadm -V
>> mdadm - v3.1.4 - 31st August 2010
>> $ sudo mdadm --assemble --force /dev/md6
>> mdadm: /dev/md6 assembled from 2 drives and 2 spares - not enough to
>> start the array.
>>
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md6 : inactive sdc1[0](S) sdd1[6](S) sdh1[5](S) sda1[3](S)
>>     3906236416 blocks
>>
>> unused devices: <none>
>>
>> mdadm was older, but same behavior with 3.1.4. It was a new Ubuntu
>> install from a few days ago so I didn't think of updating mdadm; the
>> kernel is fairly recent at least.
>>
> You should send this to the list together with dmesg output from when
> this happens.
>
> Also, I would include the actual drives on the mdadm line, "sudo mdadm
> --assemble --force /dev/md6 /dev/sd[wxyz]".
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se


I've cloned and re-added /dev/sdb, and it's starting, but if /dev/sdb
was stale, and the UU_U_ pattern seems to indicate it's resyncing
/dev/sdb and /dev/sdh, why would it ever need /dev/sdb?

$ mdadm --assemble --force /dev/md6 /dev/sd[abcdh]1
mdadm: forcing event count in /dev/sdb1(1) from 1290382 upto 1290388
mdadm: clearing FAULTY flag for device 1 in /dev/md6 for /dev/sdb1
mdadm: /dev/md6 has been started with 3 drives (out of 5) and 2 spares.
root@filebear:/home/bjorn# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md6 : active raid6 sdc1[0] sdd1[6] sdh1[5] sda1[3] sdb1[1]
2929677312 blocks level 6, 256k chunk, algorithm 2 [5/3] [UU_U_]
[>....................] recovery = 0.0% (841000/976559104)
finish=309.3min speed=52562K/sec

unused devices: <none>

Oh well, I'm just glad this was md-based and possible to resurrect at
all! I've mounted it read-only and tested quite a few large files, and
they're intact.
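
For the record, the read-only check was nothing more elaborate than
roughly this (mount point and file paths are just what I happened to
use):

$ sudo mount -o ro /dev/md6 /mnt/md6
$ md5sum /mnt/md6/some/large/file.mkv   # a few big files read back cleanly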