(help!) MD RAID6 won't --re-add devices?


On 13.01.2011 14:03:57 by Bart Kus

Hello,

I had a Port Multiplier failure overnight. This put 5 out of 10 drives
offline, degrading my RAID6 array. The file system is still mounted
(and failing to write):

Buffer I/O error on device md4, logical block 3907023608
Filesystem "md4": xfs_log_force: error 5 returned.
etc...

The array is in the following state:

/dev/md4:
Version : 1.02
Creation Time : Sun Aug 10 23:41:49 2008
Raid Level : raid6
Array Size : 15628094464 (14904.11 GiB 16003.17 GB)
Used Dev Size : 1953511808 (1863.01 GiB 2000.40 GB)
Raid Devices : 10
Total Devices : 11
Persistence : Superblock is persistent

Update Time : Wed Jan 12 05:32:14 2011
State : clean, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 6
Spare Devices : 0

Chunk Size : 64K

Name : 4
UUID : da14eb85:00658f24:80f7a070:b9026515
Events : 4300692

    Number   Major   Minor   RaidDevice   State
      15        8       1        0        active sync   /dev/sda1
       1        0       0        1        removed
      12        8      33        2        active sync   /dev/sdc1
      16        8      49        3        active sync   /dev/sdd1
       4        0       0        4        removed
      20        8     193        5        active sync   /dev/sdm1
       6        0       0        6        removed
       7        0       0        7        removed
       8        0       0        8        removed
      13        8      17        9        active sync   /dev/sdb1

      10        8      97        -        faulty spare
      11        8     129        -        faulty spare
      14        8     113        -        faulty spare
      17        8      81        -        faulty spare
      18        8      65        -        faulty spare
      19        8     145        -        faulty spare

I have replaced the faulty PM and the drives have registered back with
the system, under new names:

sd 3:0:0:0: [sdn] Attached SCSI disk
sd 3:1:0:0: [sdo] Attached SCSI disk
sd 3:2:0:0: [sdp] Attached SCSI disk
sd 3:4:0:0: [sdr] Attached SCSI disk
sd 3:3:0:0: [sdq] Attached SCSI disk

But I can't seem to --re-add them into the array now!

# mdadm /dev/md4 --re-add /dev/sdn1 --re-add /dev/sdo1 --re-add
/dev/sdp1 --re-add /dev/sdr1 --re-add /dev/sdq1
mdadm: add new device failed for /dev/sdn1 as 21: Device or resource busy

I haven't unmounted the file system and/or stopped the /dev/md4 device,
since I think that would drop any buffers either layer might be
holding. I'd of course prefer to lose as little data as possible. How
can I get this array going again?
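
For the record, this is roughly the sequence I'd expect to need here,
one device at a time. It's only a sketch: the partition names are
assumed from the listings above, and I haven't verified the --remove
keywords against this mdadm version.

# Kernel's current view of the array and its missing members.
cat /proc/mdstat
mdadm --detail /dev/md4

# Drop the entries whose devices vanished, then retry the re-adds
# one device at a time rather than all on one command line.
mdadm /dev/md4 --remove failed
mdadm /dev/md4 --remove detached
for d in /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1 /dev/sdr1; do
    mdadm /dev/md4 --re-add $d
done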

PS: I think the reason "Failed Devices" shows 6 and not 5 is because I
had a single HD failure a couple weeks back. I replaced the drive and
the array re-built A-OK. I guess it still counted the failure since the
array wasn't stopped during the repair.

Thanks for any guidance,

--Bart

PPS: mdadm - v3.0 - 2nd June 2009
PPS: Linux jo.bartk.us 2.6.35-gentoo-r9 #1 SMP Sat Oct 2 21:22:14 PDT
2010 x86_64 Intel(R) Core(TM)2 Quad CPU @ 2.40GHz GenuineIntel GNU/Linux
PPS: # mdadm --examine /dev/sdn1
/dev/sdn1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : da14eb85:00658f24:80f7a070:b9026515
Name : 4
Creation Time : Sun Aug 10 23:41:49 2008
Raid Level : raid6
Raid Devices : 10

Avail Dev Size : 3907023730 (1863.01 GiB 2000.40 GB)
Array Size : 31256188928 (14904.11 GiB 16003.17 GB)
Used Dev Size : 3907023616 (1863.01 GiB 2000.40 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c0cf419f:4c33dc64:84bc1c1a:7e9778ba

Update Time : Wed Jan 12 05:39:55 2011
Checksum : bdb14e66 - correct
Events : 4300672

Chunk Size : 64K

Device Role : spare
Array State : A.AA.A...A ('A' == active, '.' == missing)


Re: (help!) MD RAID6 won't --re-add devices?

On 15.01.2011 18:48:55 by Bart Kus

Things seem to have gone from bad to worse. I upgraded to the latest
mdadm, and it actually let me do an --add operation, but --re-add was
still failing. It added all the devices as spares though. I stopped
the array and tried to re-assemble it, but it's not starting.

jo ~ # mdadm -A /dev/md4 -f -u da14eb85:00658f24:80f7a070:b9026515
mdadm: /dev/md4 assembled from 5 drives and 5 spares - not enough to
start the array.

How do I promote these "spares" back to being the active devices they once
were? Yes, they're behind a few events, so there will be some data loss.
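
Before forcing anything, comparing the event counters shows how far
behind the demoted devices really are; roughly this (partition names
assumed from the earlier listings):

for d in /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdm1 \
         /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1 /dev/sdr1; do
    echo "== $d"
    mdadm --examine $d | grep -E 'Events|Update Time|Device Role'
done

The devices with the lower event count are the ones that dropped out;
if the gap is only a handful of events, most of the data should still
be consistent.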

--Bart


Re: (help!) MD RAID6 won't --re-add devices?

On 15.01.2011 20:50:58 by Bart Kus

Some research has revealed a frightening solution:

http://forums.gentoo.org/viewtopic-t-716757-start-0.html

That thread calls upon mdadm --create with the --assume-clean flag. It
also seems to reinforce my suspicion that MD has lost my device order
numbers when it marked the drives as spares (thanks, MD! Remind me to
get you a nice Christmas present next year). I know the order of 5 out
of 10 devices, so that leaves 120 permutations to try. I've whipped up
some software to generate all the permuted mdadm --create commands.

The question now: how do I test if I've got the right combination? Can
I dd a meg off the assembled array and check for errors somewhere?

The other question: Is testing incorrect combinations destructive to any
data on the drives? Like, would RAID6 kick in and start "fixing" parity
errors, even if I'm just reading?

--Bart


Re: (help!) MD RAID6 won't --re-add devices?

On 16.01.2011 01:05:16 by jeromepoulin

On Sat, Jan 15, 2011 at 2:50 PM, Bart Kus wrote:
> Some research has revealed a frightening solution:
>
> http://forums.gentoo.org/viewtopic-t-716757-start-0.html
>
> That thread calls upon mdadm --create with the --assume-clean flag. It also
> seems to reinforce my suspicion that MD has lost my device order numbers
> when it marked the drives as spares (thanks, MD! Remind me to get you a nice
> Christmas present next year). I know the order of 5 out of 10 devices, so
> that leaves 120 permutations to try. I've whipped up some software to
> generate all the permuted mdadm --create commands.
>
> The question now: how do I test if I've got the right combination? Can I dd
> a meg off the assembled array and check for errors somewhere?

I guess running a read-only fsck is the best way to prove it's working.
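
For the XFS filesystem from the original post, that would look
something like this (device name assumed, nothing gets written):

# No-modify check of the filesystem on the assembled candidate array.
xfs_repair -n /dev/md4

# Or, even more cautious, a read-only mount without log recovery:
mount -o ro,norecovery /dev/md4 /mnt/test && ls /mnt/test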

>
> The other question: Is testing incorrect combinations destructive to any
> data on the drives? Like, would RAID6 kick in and start "fixing" parity
> errors, even if I'm just reading?
>

If you don't want to risk your data, you could create a cowloop of
each device before writing to it, or a dm snapshot using dmsetup.

I made a script for dmsetup snapshots on the side when I really needed
it because cowloop wouldn't compile. Here it is; it should help you
understand how it works!


#!/bin/sh
# Create a writable snapshot of a read-only device, with all writes
# redirected to a COW file, using dmsetup.

RODATA=$1
shift
COWFILE=$1
shift
FSIZE=$1
shift
PREFIX=$1
shift

# Require the three mandatory arguments and refuse extra ones.
if [ -z "$RODATA" ] || [ -z "$COWFILE" ] || [ -z "$FSIZE" ] || [ $# -gt 0 ]
then
    echo "Usage: $0 [read only device] [loop file] [size of loop in MB] {prefix}"
    echo "The read-only device won't ever get a write."
    echo "The loop file can be a file or device where writes will be directed to."
    echo "Size is specified in MB; you will be able to write that much change to the device created."
    echo "The prefix will get prepended to all devices created by this script in /dev/mapper."
    exit 1
fi

MRODATA=$PREFIX${RODATA#/dev/}data
COWFILELOOP=$(losetup -f)
MCOWFILE=$PREFIX${RODATA#/dev/}cow
MSNAPSHOT=$PREFIX${RODATA#/dev/}snap

# Sparse COW file of roughly the requested size, attached to a free loop device.
dd if=/dev/zero of="$COWFILE" bs=1M seek="$FSIZE" count=1
losetup "$COWFILELOOP" "$COWFILE"

# Linear targets for the origin and the COW store, then the snapshot on top.
echo "0 $(blockdev --getsz $RODATA) linear $RODATA 0" | dmsetup create "$MRODATA"
echo "0 $(blockdev --getsz $COWFILELOOP) linear $COWFILELOOP 0" | dmsetup create "$MCOWFILE"
echo "0 $(blockdev --getsz /dev/mapper/$MRODATA) snapshot /dev/mapper/$MRODATA /dev/mapper/$MCOWFILE p 64" | dmsetup create "$MSNAPSHOT"

echo "You can now use $MSNAPSHOT for your tests, up to ${FSIZE}MB."
exit 0
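
For example, with assumed names and sizes, to get a throwaway
copy-on-write view of one of the re-attached drives:

# Up to 4GB of test writes for /dev/sdn1 get redirected to a scratch file.
./dm-snapshot.sh /dev/sdn1 /tmp/sdn1.cow 4096 test

# The array experiments then run against /dev/mapper/testsdn1snap;
# afterwards, tear everything down with dmsetup remove and losetup -d.

That way mdadm --create only ever writes to the snapshots, never to
the original drives.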


> --Bart
>

Re: (help!) MD RAID6 won't --re-add devices? [SOLVED!]

On 16.01.2011 22:19:26 by Bart Kus

Thanks for the COW idea, I hadn't thought of that. Luckily, I had 10
spare 2TB drives racked and powered, so I just backed up all the
drives using dd.

Turns out a good way to test if you've got the right combination of
drives is to do echo check > sync_action, wait 5 seconds, and then check
mismatch_cnt. If you've found the right combination, the count will be
low or zero.
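
Spelled out, the per-candidate test is just a few lines (md4 assumed,
as in the code further down):

echo check > /sys/block/md4/md/sync_action
sleep 5
cat /sys/block/md4/md/mismatch_cnt      # near zero means the order is likely right
echo idle > /sys/block/md4/md/sync_action   # abort the check before the next attempt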

Another important thing to note is that the "Version" reported by mdadm
--detail /dev/mdX is NOT always the same as the version reported by
mdadm --examine /dev/sdX. I guess the array header and the drive
headers track different version numbers. My array header was reporting
1.02 while all the drives were showing 1.2.

And a key thing to know is that the default Data Offset has CHANGED
over the years. My original drives reported an offset of 272 sectors,
and I believe the array was made with mdadm-2.6.6. Using mdadm-3.1.4 to
create a new array put the offset at 2048 sectors, a huge change!
Also, it seems that when mdadm-3.1.4 added the old drives (272 offset
at the time) into the array that was missing 5/10 drives and marked
them as spares, the spare-marking process changed the offset to 384
sectors. The array, when created with mdadm-3.1.4, had actually reduced
the Used Dev Size a bit from what the original array had, so none of
the permutations worked since everything was misaligned. I had to
downgrade to mdadm-3.0, which created the array with the proper Dev
Size and the proper Data Offset of 272 sectors for the RAID6 blocks to
line up.
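
A quick way to see what geometry you're actually getting is to compare
the superblocks directly; something like this (the glob is assumed,
adjust it to your drive names):

for d in /dev/sd[a-r]1; do
    echo "== $d"
    mdadm --examine $d | grep -E 'Version|Data Offset|Super Offset|Used Dev Size'
done

The recreated array only lines up if the Data Offset and Used Dev Size
match what the original drives report.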

Is there documentation somewhere about all these default changes? I saw
no option to specify the data offset either. That would be a good
option to add.

But the best thing to add would be functional --re-add capability!
Reporting that the array is "busy" when I'm trying to return its 5
missing drives isn't useful. It should re-add its old drives as
expected and flush any pending buffers.

Below is the (very hacky) code I used to test all the permutations of
the 5 drives whose sequence was lost by being marked as spares.
Hopefully it doesn't have to help anyone in the future.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

char *permutation[] = { "nopqr", "noprq", "noqpr", "noqrp", "norpq",
"norqp", "npoqr", "nporq", "npqor", "npqro", "nproq", "nprqo", "nqopr",
"nqorp", "nqpor", "nqpro", "nqrop", "nqrpo", "nropq", "nroqp", "nrpoq",
"nrpqo", "nrqop", "nrqpo", "onpqr", "onprq", "onqpr", "onqrp", "onrpq",
"onrqp", "opnqr", "opnrq", "opqnr", "opqrn", "oprnq", "oprqn", "oqnpr",
"oqnrp", "oqpnr", "oqprn", "oqrnp", "oqrpn", "ornpq", "ornqp", "orpnq",
"orpqn", "orqnp", "orqpn", "pnoqr", "pnorq", "pnqor", "pnqro", "pnroq",
"pnrqo", "ponqr", "ponrq", "poqnr", "poqrn", "pornq", "porqn", "pqnor",
"pqnro", "pqonr", "pqorn", "pqrno", "pqron", "prnoq", "prnqo", "pronq",
"proqn", "prqno", "prqon", "qnopr", "qnorp", "qnpor", "qnpro", "qnrop",
"qnrpo", "qonpr", "qonrp", "qopnr", "qoprn", "qornp", "qorpn", "qpnor",
"qpnro", "qponr", "qporn", "qprno", "qpron", "qrnop", "qrnpo", "qronp",
"qropn", "qrpno", "qrpon", "rnopq", "rnoqp", "rnpoq", "rnpqo", "rnqop",
"rnqpo", "ronpq", "ronqp", "ropnq", "ropqn", "roqnp", "roqpn", "rpnoq",
"rpnqo", "rponq", "rpoqn", "rpqno", "rpqon", "rqnop", "rqnpo", "rqonp",
"rqopn", "rqpno", "rqpon" };

int main(void)
{
    int i, mismatches, status;
    FILE *handle;
    char command[1024];

    for (i = 0; i < sizeof permutation / sizeof (char *); i++) {
        mismatches = -1;    /* safety: never report a stale count */

        /* Re-create the array with the 5 known drives in their known
         * slots and the 5 unknown drives in this permutation's order. */
        sprintf(command, "/sbin/mdadm --create /dev/md4 "
            "--assume-clean -R -e 1.2 -l 6 -n 10 -c 64 "
            "/dev/sda1 /dev/sd%c1 /dev/sdc1 /dev/sdd1 /dev/sd%c1 "
            "/dev/sdm1 /dev/sd%c1 /dev/sd%c1 /dev/sd%c1 /dev/sdb1",
            permutation[i][0], permutation[i][1],
            permutation[i][2], permutation[i][3], permutation[i][4]);
        printf("Running: %s\n", command);
        status = system(command);
        if (WEXITSTATUS(status) != 0) {
            printf("Command error\n");
            return 1;
        }
        sleep(1);

        /* Kick off a parity check of the freshly assembled array... */
        handle = fopen("/sys/block/md4/md/sync_action", "w");
        fprintf(handle, "check\n");
        fclose(handle);
        sleep(5);

        /* ...and read how many mismatches it has found so far. */
        handle = fopen("/sys/block/md4/md/mismatch_cnt", "r");
        fscanf(handle, "%d", &mismatches);
        fclose(handle);
        printf("Permutation %s = %d mismatches\n",
            permutation[i], mismatches);
        fflush(stdout);

        /* Stop the array before trying the next permutation. */
        sprintf(command, "/sbin/mdadm --stop /dev/md4");
        printf("Running: %s\n", command);
        status = system(command);
        if (WEXITSTATUS(status) != 0) {
            printf("Command error\n");
            return 1;
        }
        sleep(1);
    }
    return 0;
}

I got the permutations from an online permutation generator:

http://users.telenet.be/vdmoortel/dirk/Maths/permutations.html

I didn't feel like writing that part of the algorithm.

--Bart

