RAID showing all devices as spares after partial unplug

RAID showing all devices as spares after partial unplug

on 17.09.2011 22:39:23 by Mike Hartman

I have 11 drives in a RAID 6 array. 6 are plugged into one esata
enclosure, the other 4 are in another. These esata cables are prone to
loosening when I'm working on nearby hardware.

If that happens and I start the host up, big chunks of the array are
missing and things could get ugly. Thus I cooked up a custom startup
script that verifies each device is present before starting the array
with

mdadm --assemble --no-degraded -u 4fd7659f:12044eff:ba25240d:de22249d /dev/md3
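
In outline, the check amounts to something like this (a sketch only; the member list is the one shown in the mdstat output below):

#!/bin/bash
# Sketch of the pre-assembly check: refuse to start the array unless every member is present.
# Device names are illustrative (they can move around between boots).
MEMBERS="/dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/md1p1 /dev/md3p1"
for dev in $MEMBERS; do
    if [ ! -b "$dev" ]; then
        echo "$dev is missing - not assembling" >&2
        exit 1
    fi
done
# The array here is actually /dev/md0 (see the follow-up note below).
mdadm --assemble --no-degraded -u 4fd7659f:12044eff:ba25240d:de22249d /dev/md0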

So I thought I was covered. In case something got unplugged I would
see the array failing to start at boot and I could shut down, fix the
cables and try again. However, I hit a new scenario today where one of
the plugs was loosened while everything was turned on.

The good news is that there should have been no activity on the array
when this happened, particularly write activity. It's a big media
partition and sees much less writing than reading. I'm also the only
one that uses it and I know I wasn't transferring anything. The system
also seems to have immediately marked the filesystem read-only,
because I discovered the issue when I went to write to it later and
got a "read-only filesystem" error. So I believe the state of the
drives should be the same - nothing should be out of sync.

However, I shut the system down, fixed the cables and brought it back
up. All the devices are detected by my script and it tries to start
the array with the command I posted above, but I've ended up with
this:

md0 : inactive sdn1[1](S) sdj1[9](S) sdm1[10](S) sdl1[11](S)
sdk1[12](S) md3p1[8](S) sdc1[6](S) sdd1[5](S) md1p1[4](S) sdf1[3](S)
sdh1[0](S)
      16113893731 blocks super 1.2

Instead of all coming back up, or still showing the unplugged drives
missing, everything is a spare? I'm suitably disturbed.

It seems to me that if the data on the drives still reflects the
last-good data from the array (and since no writing was going on it
should) then this is just a matter of some metadata getting messed up
and it should be fixable. Can someone please walk me through the
commands to do that?

Mike

Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 00:16:27 by Mike Hartman

I should add that the mdadm command in question actually ends in
/dev/md0, not /dev/md3 (that's for another array). So the device name
for the array I'm seeing in mdstat DOES match the one in the assemble
command.


Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 03:16:50 by Jim Schatzman

Mike-

I have seen very similar problems. I regret that electronics engineers cannot design more secure connectors. eSATA connectors are terrible - they come loose at the slightest tug. For this reason, I am gradually abandoning eSATA enclosures and going to internal drives only. Fortunately, there are some inexpensive RAID chassis available now.

I tried the same thing as you. I removed the array(s) from mdadm.conf and I wrote a script for "/etc/cron.reboot" which assembles the array with "--no-degraded". Doing this seems to minimize the damage caused by drives that go missing prior to a reboot. However, if the drives are disconnected while Linux is up, then either the array will stay up but some drives will become stale, or the array will be stopped. The behavior I usually see is that all the drives that went offline now become "spare".

It would be nice if md would just reassemble the array once all the drives come back online. Unfortunately, it doesn't. I would run mdadm -E against all the drives/partitions, verifying that the metadata all indicates that they are/were part of the expected array. At that point, you should be able to re-create the RAID. Be sure you list the drives in the correct order. Once the array is going again, mount the resulting partitions RO and verify that the data is o.k. before going RW.
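
For that last step, something along these lines (mount point is illustrative):

mount -o ro /dev/md0 /mnt/media        # bring the filesystem up read-only and inspect it
# ... spot-check files, run read-only filesystem checks ...
mount -o remount,rw /mnt/media         # go read-write only once the data looks right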

Jim


Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 03:34:29 by Mike Hartman

On Sat, Sep 17, 2011 at 9:16 PM, Jim Schatzman wrote:
> Mike-
>
> I have seen very similar problems. I regret that electronics engineers cannot design more secure connectors. eSATA connectors are terrible - they come loose at the slightest tug. For this reason, I am gradually abandoning eSATA enclosures and going to internal drives only. Fortunately, there are some inexpensive RAID chassis available now.
>
> I tried the same thing as you. I removed the array(s) from mdadm.conf and I wrote a script for "/etc/cron.reboot" which assembles the array with "--no-degraded". Doing this seems to minimize the damage caused by drives that go missing prior to a reboot. However, if the drives are disconnected while Linux is up, then either the array will stay up but some drives will become stale, or the array will be stopped. The behavior I usually see is that all the drives that went offline now become "spare".
>

That sounds similar, although I only had 4/11 go offline and now
they're ALL spare.

> It would be nice if md would just reassemble the array once all the drives come back online. Unfortunately, it doesn't. I would run mdadm -E against all the drives/partitions, verifying that the metadata all indicates that they are/were part of the expected array.

I ran mdadm -E and they all correctly appear as part of the array:

for d in /dev/sd[cdfhjklmn]1 /dev/md1p1 /dev/md3p1; do echo $d; mdadm -E $d | grep Role; done

/dev/sdc1
Device Role : Active device 5
/dev/sdd1
Device Role : Active device 4
/dev/sdf1
Device Role : Active device 2
/dev/sdh1
Device Role : Active device 0
/dev/sdj1
Device Role : Active device 10
/dev/sdk1
Device Role : Active device 7
/dev/sdl1
Device Role : Active device 8
/dev/sdm1
Device Role : Active device 9
/dev/sdn1
Device Role : Active device 1
/dev/md1p1
Device Role : Active device 3
/dev/md3p1
Device Role : Active device 6

But they have varying event counts (although all pretty close together):

for d in /dev/sd[cdfhjklmn]1 /dev/md1p1 /dev/md3p1; do echo $d; mdadm -E $d | grep Event; done

/dev/sdc1
Events : 1756743
/dev/sdd1
Events : 1756743
/dev/sdf1
Events : 1756737
/dev/sdh1
Events : 1756737
/dev/sdj1
Events : 1756743
/dev/sdk1
Events : 1756743
/dev/sdl1
Events : 1756743
/dev/sdm1
Events : 1756743
/dev/sdn1
Events : 1756743
/dev/md1p1
Events : 1756737
/dev/md3p1
Events : 1756740

And they don't seem to agree on the overall status of the array. The
ones that never went down seem to think the array is missing 4 nodes,
while the ones that went down seem to think all the nodes are good:

for d in /dev/sd[cdfhjklmn]1 /dev/md1p1 /dev/md3p1; do echo $d; mdadm -E $d | grep State; done

/dev/sdc1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdd1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdf1
State : clean
Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdh1
State : clean
Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
/dev/sdj1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdk1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdl1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdm1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/sdn1
State : clean
Array State : .A..AA.AAAA ('A' == active, '.' == missing)
/dev/md1p1
State : clean
Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
/dev/md3p1
State : clean
Array State : .A..AAAAAAA ('A' == active, '.' == missing)

So it seems like the array is intact overall; I just need to convince
it of that fact.

> At that point, you should be able to re-create the RAID. Be sure you list the drives in the correct order. Once the array is going again, mount the resulting partitions RO and verify that the data is o.k. before going RW.

Could you be more specific about how exactly I should re-create the
RAID? Should I just do --assemble --force?


Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 04:57:40 by Jim Schatzman

Mike-

See my response below.

Good luck!

Jim


At 07:34 PM 9/17/2011, Mike Hartman wrote:
>Could you be more specific about how exactly I should re-create the
>RAID? Should I just do --assemble --force?



--> No. As far as I know, you have to use "-C"/"--create". You need to use exactly the same array parameters that were used to create the array the first time. Same metadata version. Same stripe size. Raid mode the same. Physical devices in the same order.

Why do you have to use "--create", and thus open the door for catastrophic error?? I have asked the same question myself. Maybe, if more people ping Neil Brown on this, he may be willing to find another way.




Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 05:07:17 by Mike Hartman

Yikes. That's a pretty terrifying prospect.


RE: RAID showing all devices as spares after partial unplug

on 18.09.2011 05:59:16 by Mike Hartman

Is there any way to construct the exact create command using the info
given by mdadm -E? This array started as a RAID 5 that was reshaped
into a 6 and then grown many times, so I don't have a single original
create command lying around to reference.

I know the devices and their order (as previously listed) - are all
the other options I need to specify part of the -E output? If so, can
someone clarify how that maps into the command?

Here's an example output:

mdadm -E /dev/sdh1

/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 714c307e:71626854:2c2cc6c8:c67339a0
           Name : odin:0  (local to host odin)
  Creation Time : Sat Sep  4 12:52:59 2010
     Raid Level : raid6
   Raid Devices : 11

 Avail Dev Size : 2929691614 (1396.99 GiB 1500.00 GB)
     Array Size : 26367220224 (12572.87 GiB 13500.02 GB)
  Used Dev Size : 2929691136 (1396.99 GiB 1500.00 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 384875df:23db9d35:f63202d0:01c03ba2

Internal Bitmap : 2 sectors from superblock
    Update Time : Thu Sep 15 05:10:57 2011
       Checksum : f679cecb - correct
         Events : 1756737

         Layout : left-symmetric
     Chunk Size : 256K

   Device Role : Active device 0
   Array State : AAAAAAAAAAA ('A' == active, '.' == missing)
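
If it ever did come to a re-create, the fields above would map onto the command roughly like this - a sketch only, untested, strictly a last resort, and (as the next reply stresses) it must include --assume-clean:

mdadm --create /dev/md0 --assume-clean \
      --metadata=1.2 --level=raid6 --raid-devices=11 \
      --chunk=256 --layout=left-symmetric \
      /dev/sdh1 /dev/sdn1 /dev/sdf1 /dev/md1p1 /dev/sdd1 /dev/sdc1 \
      /dev/md3p1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdj1
# members listed in "Device Role" order 0-10 from the earlier -E output;
# the original array also carried an internal bitmap (Feature Map 0x1 above).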

Mike


Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 15:23:40 by Phil Turmel

Hi Mike,

On 09/17/2011 11:59 PM, Mike Hartman wrote:
> On Sat, Sep 17, 2011 at 11:07 PM, Mike Hartman wrote:
>> Yikes. That's a pretty terrifying prospect.

*Don't do it!*

"mdadm --create" in these situations is an absolute last resort.

First, try --assemble --force.
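
For this array, that would be something like the following (member list as in the /proc/mdstat output above; stop the half-assembled, all-spare array first):

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[cdfhjklmn]1 /dev/md1p1 /dev/md3p1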

If needed, check the archives for the environment variable setting that'll temporarily allow mdadm to ignore the event counts for more --assemble and --assemble --force tries.

(I can't remember the variable name off the top of my head.)

Only if all of the above fails do you fall back to "--create", and every single "--create" attempt *must* include "--assume-clean", or your data is in grave danger.

Based on the output of the one full mdadm -E report, your array was created with a recent version of mdadm, so you shouldn't have trouble with data offsets. Please post the full "mdadm -E" report for each drive if you want help putting together a --create command.

I'd also like to see the output of lsdrv[1], so there's a good record of drive serial numbers vs. device names.

HTH,

Phil

[1] http://github.com/pturmel/lsdrv



Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 18:07:42 by Mike Hartman

Thanks Phil. --assemble --force is all it took - glad I held off. That
was my first instinct to try, but I was worried it would still leave
the drives as spare AND somehow mess up the metadata enough that it
wouldn't be recoverable, so I was afraid to touch it until someone
could confirm the approach. Seems like a silly reason to have the
array down for multiple days, but better safe than sorry with that
much data.

Thanks again to both you and Jim!

Mike


Re: RAID showing all devices as spares after partial unplug

on 18.09.2011 18:18:23 by Phil Turmel

On 09/18/2011 12:07 PM, Mike Hartman wrote:
> Thanks Phil. --assemble --force is all it took - glad I held off. That
> was my first instinct to try, but I was worried it would still leave
> the drives as spare AND somehow mess up the metadata enough that it
> wouldn't be recoverable, so I was afraid to touch it until someone
> could confirm the approach. Seems like a silly reason to have the
> array down for multiple days, but better safe than sorry with that
> much data.

Good to hear! This list's archives have a number of cases where premature use of "--create" pushed a recoverable array over the edge, with resulting grief for the owner. Not that it can't or shouldn't ever be done, but the pitfalls have sharp stakes.

> Thanks again to both you and Jim!

You're welcome.

Phil

Re: RAID showing all devices as spares after partial unplug

on 19.09.2011 01:08:31 by NeilBrown


On Sat, 17 Sep 2011 19:16:50 -0600 Jim Schatzman wrote:

> Mike-
>
> It would be nice if md would just reassemble the array once all the drives come back online. Unfortunately, it doesn't. I would run mdadm -E against all the drives/partitions, verifying that the metadata all indicates that they are/were part of the expected array. At that point, you should be able to re-create the RAID. Be sure you list the drives in the correct order. Once the array is going again, mount the resulting partitions RO and verify that the data is o.k. before going RW.

mdadm certainly can "just reassemble the array once all the drives come ...
online".

If you have udev configured to run "mdadm -I device-name" when a device
appears, then as soon as all required devices have appeared the array will be
started.
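
The hook is a udev rule along these lines (illustrative; distributions ship their own version of the md rules file):

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm -I $env{DEVNAME}"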

It would be good to have better handling of "half the devices disappeared",
particularly if this is noticed while trying to read or while trying to mark the
array "dirty" in preparation for write.
If it happens during a real 'write' it is a bit harder to handle cleanly.

I should add that to my list :-)

NeilBrown


Re: RAID showing all devices as spares after partial unplug

on 20.09.2011 06:33:05 by Phil Turmel

[added the list CC: back. Please use reply-to-all on kernel.org lists.]

On 09/19/2011 09:00 PM, Jim Schatzman wrote:
> Thanks to all. Some notes
>
> 1) I have never gotten "mdadm --assemble --force" to work as desired.
> Having tried this on the 6-8 occasions when I have temporarily
> disconnected some drives, all that I have seen is that the
> temporarily-disconnected drives/partitions get added as spares and
> that's not helpful, as far as I can see. I'll have to try it the next
> time and see if it works.

Seems to be dependent on dirty status of the array (write in progress). Also, you should ensure the array is stopped before assembling after reconnecting.

> 2) Thanks for reminding me about the --assume-clean with "mdadm
> --create" option. Very important. My bad for forgetting it.
>
> 3) This is the first time I have heard that it is possible to get
> mdadm/md to ignore the event counts in the metadata via environmental
> variable. Can someone please provide the details?

I was mistaken... The variable I was thinking of only applies to interrupted --grow operations: "MDADM_GROW_ALLOW_OLD".
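
For the record, that one is used as an environment variable on the assemble command, roughly like this (array, backup file and member names are placeholders; only relevant when restarting an interrupted reshape):

MDADM_GROW_ALLOW_OLD=1 mdadm --assemble /dev/mdN --backup-file=/path/to/grow-backup /dev/sdX1 /dev/sdY1 /dev/sdZ1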

> I freely acknowledge that forcing mdadm to do something abnormal
> risks losing data. My situation, like Mike's, has always (knock on
> wood) been when the array was up but idle. Two slightly different
> cases are (1) drives are disconnected when the system is up; (2)
> drives are disconnected when the system is powered down and then
> rebooted. Both situations have always occurred when enough drives are
> offlined that the array cannot function and gets stopped
> automatically. Both situations have always resulted in
> drives/partitions being marked as "spare" if the subsequent assembly
> is done without "--no-degraded".

Neil has already responded that this needs looking at. The key will be the recognition of multiple simultaneous failures as not really a drive problem, triggering some form of alternate recovery.

> Following Mike's procedure of removing the arrays from
> /etc/mdadm.conf and always assembling with "--no-degraded", the
> problem is eliminated in the case that drives are unplugged during
> power-off. However, if the drives are unplugged while the system is
> up, then I still have to jump through hoops (i.e., mdadm --create
> --assume-clean) to get the arrays back up. I haven't tried "mdadm
> --assemble --force" for several versions of md/mdadm, so maybe things
> have changed?

--assemble --force will always be at least as safe as --create --assume-clean. Since it honors the recorded role numbers, it reduces the chance of a typo letting a create happen with devices in the wrong order. Device naming on boot can vary, especially with recent kernels that are capable of simultaneous probing. Using the original metadata really helps in this case. It also helps when the mdadm version has changed substantially since the array was created.

> For me, the fundamental problem has been the very insecure nature of
> eSata connectors. Poor design, in my opinion. The same kind of thing
> could occur, though, with an external enclosure if the power to the
> enclosure is lost.

Indeed. I haven't experienced the issue, though, as my arrays are all internal. (so far...)

Phil.