How do I determine which drive should be in which slot?

On 26.06.2010 07:32:08 by Dave W

After making some changes in the bios to my boot order, my 5-drive RAID6
stopped assembling. Three of the five known-good drives suddenly stopped
getting added to the right slots. There are no disk errors in the logs and
I can read from the drives with no problems.

# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm -v -Af /dev/md0 /dev/sd[bcdef]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 5.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdf1 to /dev/md0 as 5
mdadm: added /dev/sdc1 to /dev/md0 as 6
mdadm: added /dev/sdb1 to /dev/md0 as 7
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: /dev/md0 assembled from 2 drives and 3 spares - not enough to start the array.
#

Reading the thread at http://thread.gmane.org/gmane.linux.raid/17774, I think
that a command like this:

mdadm -C /dev/md0 -l6 -n5 /dev/sd[bcdef]1

would tell mdadm to rewrite the superblock with the disks in the right slots.
Does the order on the command line determine the slot allocations? How can I
tell that my disks will get put in the right slots? Is it safe to just try
this order?

Thanks,
Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: How do I determine which drive should be in which slot?

On 29.06.2010 08:26:17 by Dave W

Previously, I wrote:

> After making some changes in the bios to my boot order, my 5-drive RAID6
> stopped assembling. Three of the five known-good drives suddenly stopped
> getting added to the right slots. There are no disk errors in the logs and
> I can read from the drives with no problems.

Any help with this one? My disks are fine but mdadm seems confused about which
slots they're supposed to be in. How can I tell mdadm to put them in the right
slots?

Thanks,
Dave

# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# mdadm -v -Af /dev/md0 /dev/sd[bcdef]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 7.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 6.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 5.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: no uptodate device for slot 1 of /dev/md0
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdf1 to /dev/md0 as 5
mdadm: added /dev/sdc1 to /dev/md0 as 6
mdadm: added /dev/sdb1 to /dev/md0 as 7
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: /dev/md0 assembled from 2 drives and 3 spares - not enough to start the array.
#



Re: How do I determine which drive should be in which slot?

On 29.06.2010 09:02:09 by NeilBrown

On Tue, 29 Jun 2010 06:26:17 +0000 (UTC)
Dave W wrote:

> Previously, I wrote:
>
> > After making some changes in the bios to my boot order, my 5-drive RAID6
> > stopped assembling. Three of the five known-good drives suddenly stopped
> > getting added to the right slots. There are no disk errors in the logs and
> > I can read from the drives with no problems.
>
> Any help with this one? My disks are fine but mdadm seems confused about which
> slots they're supposed to be in. How can I tell mdadm to put them in the right
> slots?

This is very odd.... that should not happen. I think I've seen a few reports
of something like that happening and I'm beginning to wonder if I broke
something subtle....
What kernel/mdadm version are you using?

You should use "mdadm --examine" to see the configuration of the array, and
make sure that configuration is copied exactly when you create a new array -
same chunk size, same layout, same metadata version etc.

You need to list the 5 drives in the correct order, from slot 0 to slot 4.
Clearly d1 is 2 and e1 is 3.
Presumably b1 is 0, c1 is 1, f1 is 4.

So a command like:
mdadm --create /dev/md0 --assume-clean --metadata=XX --chunk=XX --level=6 \
  --raid-devices=5 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

is likely to work (if you get all the 'XX' right).
Keep a copy of the "mdadm --examine" output and compare it with the output
after running the --create and make sure everything is still the same (e.g.
Data Offset could be changed - that would be awkward).

Then try "fsck -n -f /dev/md0" to see if fsck thinks it is OK.
If it is, try mounting "-o ro" and check that the data looks OK.

If it passed all that you should be fine - you might like to
echo repair > /sys/block/md0/md/sync_action
to make sure all the parity blocks are correct.

If 'fsck' fails, you might like to try again, re-arranging the devices that
you aren't sure of.
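Put together, the sequence above might look like the sketch below. It is only a sketch: the 'XX' placeholders and the device order must come from your own --examine output, and /mnt is just an example mount point.

```shell
# Sketch of the recovery sequence described above -- verify every value
# against your own "mdadm --examine" output before running any of it.
mdadm --examine /dev/sd[bcdef]1 > examine.before   # keep a copy for comparison

mdadm --create /dev/md0 --assume-clean --metadata=XX --chunk=XX \
      --level=6 --raid-devices=5 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

mdadm --examine /dev/sd[bcdef]1 > examine.after
diff examine.before examine.after                  # nothing important should differ

fsck -n -f /dev/md0                                # read-only filesystem check
mount -o ro /dev/md0 /mnt                          # inspect the data, read-only

# only after everything above checks out:
echo repair > /sys/block/md0/md/sync_action
```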

Good luck.

NeilBrown



Re: How do I determine which drive should be in which slot?

On 29.06.2010 18:00:57 by Dave W

Neil Brown writes:

> > How can I tell mdadm to put them in the right slots?
>
> This is very odd.... that should not happen. I think I've seen a few reports
> of something like that happening and I'm beginning to wonder if I broke
> something subtle....
> What kernel/mdadm version are you using?

# uname -a
Linux fileserver.whome 2.6.27.24-170.2.68.fc10.i686 #1 SMP Wed May 20 23:10:16
EDT 2009 i686 i686 i386 GNU/Linux
# mdadm --version
mdadm - v2.6.7.1 - 15th October 2008


> You should use "mdadm --examine" to see the configuration of the array, and
> make sure that configuration is copied exactly when you creat a new array -
> same chunk size, same layout, same metadata version etc.

I don't know what metadata version refers to. I don't see it in the
"mdadm --examine" output.


> Keep a copy of the "mdadm --examine" output and compare it with the output
> after runing the --create and make sure everything is still the same (e.g.
> Data Offset could be changed - that would be awkward).

I also don't see Data Offset in the --examine output. I wonder if I should
upgrade to a newer mdadm? Or is that something that I can only see if I
run --examine on a running array?

> If 'fsck' fails, you might like to try again, re-arranging the devices that
> you aren't sure of.

OK, it sounds like you're saying that the --create command won't hurt anything
that I can't fix by running it again. Is it truly safe that way?

Here is the /proc/mdstat and the --examine output:


# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdd1[2](S) sdb1[7](S) sdc1[6](S) sdf1[5](S) sde1[3](S)
9767559680 blocks

unused devices: <none>
# mdadm --examine /dev/sd[bcdef]1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : ddebe2dc:9c023128:eb301463:823c4aa0
Creation Time : Sun May 2 01:30:10 2010
Raid Level : raid6
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Fri Jun 11 05:18:13 2010
State : clean
Active Devices : 2
Working Devices : 5
Failed Devices : 2
Spare Devices : 3
Checksum : 9ce7de69 - correct
Events : 48984

Chunk Size : 64K

Number Major Minor RaidDevice State
this 7 8 17 7 spare /dev/sdb1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1
6 6 8 33 6 spare /dev/sdc1
7 7 8 17 7 spare /dev/sdb1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : ddebe2dc:9c023128:eb301463:823c4aa0
Creation Time : Sun May 2 01:30:10 2010
Raid Level : raid6
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Fri Jun 11 05:18:13 2010
State : clean
Active Devices : 2
Working Devices : 5
Failed Devices : 2
Spare Devices : 3
Checksum : 9ce7de77 - correct
Events : 48984

Chunk Size : 64K

Number Major Minor RaidDevice State
this 6 8 33 6 spare /dev/sdc1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1
6 6 8 33 6 spare /dev/sdc1
7 7 8 17 7 spare /dev/sdb1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : ddebe2dc:9c023128:eb301463:823c4aa0
Creation Time : Sun May 2 01:30:10 2010
Raid Level : raid6
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Fri Jun 11 05:18:13 2010
State : clean
Active Devices : 2
Working Devices : 5
Failed Devices : 2
Spare Devices : 3
Checksum : 9ce7de85 - correct
Events : 48984

Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1
6 6 8 33 6 spare /dev/sdc1
7 7 8 17 7 spare /dev/sdb1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : ddebe2dc:9c023128:eb301463:823c4aa0
Creation Time : Sun May 2 01:30:10 2010
Raid Level : raid6
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Fri Jun 11 05:18:13 2010
State : clean
Active Devices : 2
Working Devices : 5
Failed Devices : 2
Spare Devices : 3
Checksum : 9ce7de97 - correct
Events : 48984

Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 65 3 active sync /dev/sde1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1
6 6 8 33 6 spare /dev/sdc1
7 7 8 17 7 spare /dev/sdb1
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : ddebe2dc:9c023128:eb301463:823c4aa0
Creation Time : Sun May 2 01:30:10 2010
Raid Level : raid6
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Fri Jun 11 05:18:13 2010
State : clean
Active Devices : 2
Working Devices : 5
Failed Devices : 2
Spare Devices : 3
Checksum : 9ce7dea5 - correct
Events : 48984

Chunk Size : 64K

Number Major Minor RaidDevice State
this 5 8 81 5 spare /dev/sdf1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1
6 6 8 33 6 spare /dev/sdc1
7 7 8 17 7 spare /dev/sdb1
#
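For what it's worth, the slot each device claims can be pulled mechanically out of saved --examine output like the above: each device's "this" line carries its RaidDevice number in the fifth column. A small sketch follows; here it runs on an inline sample copied from the output above, but in practice you would pipe in `mdadm --examine /dev/sd[bcdef]1` instead.

```shell
# List "slot device" pairs from saved "mdadm --examine" output by reading
# each device section's "this" line (field 5 is the RaidDevice column).
sample='/dev/sdd1:
this     2       8       49        2      active sync   /dev/sdd1
/dev/sde1:
this     3       8       65        3      active sync   /dev/sde1
/dev/sdf1:
this     5       8       81        5      spare   /dev/sdf1'

order=$(printf '%s\n' "$sample" | awk '
  /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # entering a new device section
  $1 == "this" { print $5, dev }                # slot number, then device name
' | sort -n)

printf '%s\n' "$order"
```

Sorted numerically, the first field gives the order in which the devices would have to appear on a --create command line (slots missing from the list are the ones you still have to guess).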




Re: How do I determine which drive should be in which slot?

On 30.06.2010 03:02:28 by NeilBrown

On Tue, 29 Jun 2010 16:00:57 +0000 (UTC)
Dave W wrote:

> Neil Brown writes:
>
> > > How can I tell mdadm to put them in the right slots?
> >
> > This is very odd.... that should not happen. I think I've seen a few reports
> > of something like that happening and I'm beginning to wonder if I broke
> > something subtle....
> > What kernel/mdadm version are you using?
>
> # uname -a
> Linux fileserver.whome 2.6.27.24-170.2.68.fc10.i686 #1 SMP Wed May 20 23:10:16
> EDT 2009 i686 i686 i386 GNU/Linux
> # mdadm --version
> mdadm - v2.6.7.1 - 15th October 2008

Hmmm... I obviously didn't introduce it recently then. That is a little bit
encouraging.

>
>
> > You should use "mdadm --examine" to see the configuration of the array, and
> > make sure that configuration is copied exactly when you create a new array -
> > same chunk size, same layout, same metadata version etc.
>
> I don't know what metadata version refers to. I don't see it in the
> "mdadm --examine" output.

It is the "Version : " field. 0.90 in your case.

>
>
> > Keep a copy of the "mdadm --examine" output and compare it with the output
> > after running the --create and make sure everything is still the same (e.g.
> > Data Offset could be changed - that would be awkward).
>
> I also don't see Data Offset in the --examine output. I wonder if I should
> upgrade to a newer mdadm? Or is that something that I can only see if I
> run --examine on a running array?

"Data Offset" is only present in 1.x metadata. As you have 0.90 you won't
see it, and it cannot change, so you are safe from that.

--examine reports the same info whether the array is running or not.

>
> > If 'fsck' fails, you might like to try again, re-arranging the devices that
> > you aren't sure of.
>
> OK, it sounds like you're saying that the --create command won't hurt anything
> that I can't fix by running it again. Is it truly safe that way?

Almost.
If you run
--create --metadata=1.1
on devices that were part of a 0.90 array, then as the 1.1 superblock is
written at the start of the device, and 0.90 puts data at the start of the
device, you would get corruption.

But if you use the same --metadata= and use --assume-clean and don't specify
a bitmap, then --create will over-write the metadata but not touch the data
at all.
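The geometry behind that warning can be sketched numerically: a 0.90 superblock sits near the end of the device (the device size rounded down to a 64 KiB multiple, minus 64 KiB), while a 1.1 superblock is written at offset 0, directly on top of a 0.90 array's data. A sketch with a hypothetical partition size:

```shell
# Where the two superblock versions live (offsets in KiB). The 0.90 rule is
# size rounded down to a 64 KiB multiple, minus 64 KiB; 1.1 is at offset 0.
dev_size_kib=1953514584                         # hypothetical partition size
sb090_kib=$(( (dev_size_kib / 64) * 64 - 64 ))  # 0.90: near the end
sb11_kib=0                                      # 1.1: at the very start
echo "0.90 superblock at ${sb090_kib} KiB; 1.1 superblock at ${sb11_kib} KiB"
```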


>
> Here is the /proc/mdstat and the --examine output:

So you probably want

mdadm --create /dev/md0 -l6 -n5 --chunk 64 --assume-clean --metadata=0.90 \
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

NeilBrown
