disk order problem in a raid 10 array

on 18.03.2011 15:49:20 by Xavier Brochard

Hello

trying to solve my problem with an unusable raid10 array, I discovered that
disk order is mixed between each boot - even with live-cd.
Here's an extract from dmesg:
[ 12.5] sda:
[ 12.5] sdc:
[ 12.5] sdd:
[ 12.5] sde: sdd1
[ 12.5] sdf: sdc1
[ 12.5] sda1 sda2
[ 12.5] sdg: sde1
[ 12.5] sdf1

is that normal?
could this be a sign of a hardware controller problem?
could this happen because all disks are sata-3 except 1 SSD which is sata-2?


Xavier
xavier@alternatif.org

Re: disk order problem in a raid 10 array

on 18.03.2011 18:22:34 by hansBKK

On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard wrote:
> disk order is mixed between each boot - even with live-cd.

> is that normal?

If nothing is changing and the order is swapping really every boot,
then IMO that is odd.

But it's very normal for ordering to change from time to time, and
definitely when elements change - kernel version/flavor, drivers, BIOS
settings, etc.

Part of my SOP is now to record both mdadm and the boot loader's
ordering against serial number and UUID of drives when creating an
array, and to put the relevant information on labels securely attached
to the physical drives, along with creating a map of their physical
location and taping that inside the case.

It's critical to know what's what in a crisis. . .
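A minimal sketch of that kind of record-keeping, assuming the array is /dev/md0 and that smartctl is installed (the device names here are only illustrative):

    # Array layout as md currently sees it (slot numbers, member devices, array UUID)
    mdadm --detail /dev/md0

    # Persistent names that survive probe-order changes: by-id encodes model and serial
    ls -l /dev/disk/by-id/ | grep -v part

    # Drive serial numbers, to copy onto the physical labels
    for d in /dev/sd[a-g]; do
        echo -n "$d: "
        smartctl -i "$d" | grep -i 'serial number'
    done

Keeping the by-id names next to the mdadm slot numbers is usually enough to re-identify a drive even after the sdX letters have shuffled.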

Re: disk order problem in a raid 10 array

on 18.03.2011 21:12:49 by Xavier Brochard

On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard wrote:
> > disk order is mixed between each boot - even with live-cd.
> > is that normal?
>
> If nothing is changing and the order is swapping really every boot,
> then IMO that is odd.

nothing has changed, except kernel minor version

> Part of my SOP is now to record both mdadm and the boot loader's
> ordering against serial number and UUID of drives when creating an
> array, and to put the relevant information on labels securely attached
> to the physical drives, along with creating a map of their physical
> location and taping that inside the case.
>
> It's critical to know what's what in a crisis. . .

exactly, in my case mdadm --examine output is somewhat weird as it shows:
/dev/sde1
this 0 8 33 0 active sync /dev/sdd1
/dev/sdd1
this 0 8 33 0 active sync /dev/sdc1
/dev/sdc1
this 0 8 33 0 active sync /dev/sde1
and /dev/sdf1 as sdf1

I think I can believe mdadm?
and that /proc/mdstat content comes directly from mdadm (that is with "exact"
sdc,d,e)?

what troubles me is that after I removed 2 disk drives from the bay, mdadm started
to recover:
md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
      976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
      [=>...................]  recovery =  5.0% (24436736/488383936) finish=56.2min speed=137513K/sec

I guess that it is ok, and that it is recovering with the spare. But I would
like to be sure...


Xavier
xavier@alternatif.org - 09 54 06 16 26

Adaptive throttling for RAID1 background resync

on 18.03.2011 21:26:52 by Hari Subramanian

I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see a whole lot (12,000+) of biovec-64s active in the slab cache.

Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)

From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.

Can someone confirm or deny these claims, and also the need for a new solution? Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs from the different RAID1 replicas.

Thanks
~ Hari
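For reference, the static limits mentioned above are exposed per array through sysfs and system-wide through procfs; a minimal sketch of where they live (assuming the array is /dev/md0, values in KB/s, the numbers only illustrative):

    # Per-array static resync throttle (KB/s)
    cat /sys/block/md0/md/sync_speed_min /sys/block/md0/md/sync_speed_max
    echo 20000  > /sys/block/md0/md/sync_speed_min   # floor
    echo 200000 > /sys/block/md0/md/sync_speed_max   # ceiling

    # System-wide defaults, used while the per-array values are left at "system"
    cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max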

Re: Adaptive throttling for RAID1 background resync

on 18.03.2011 21:28:37 by Roberto Spadim

maybe this could be better solved in the Linux kernel queue area... at the
elevators or block devices

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

RE: Adaptive throttling for RAID1 background resync

on 18.03.2011 21:31:19 by Hari Subramanian

Roberto, my use case involves both foreground and background resyncs happening
at the same time. So, by throttling it at the block or IO queues, I would be
limiting my throughput for foreground IOs as well, which is undesirable.

~ Hari


Re: Adaptive throttling for RAID1 background resync

on 18.03.2011 21:36:12 by Roberto Spadim

hum, it's not an IO queue size (very big RAM memory queue) problem?
maybe getting it smaller could help?
resync is something like read here, write there; if you have a write
problem, reads should stop when async writes can't work any more (no RAM
memory)
am I right? if true, that's why I think the queue is a point to check

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

RE: Adaptive throttling for RAID1 background resync

on 18.03.2011 21:54:45 by Hari Subramanian

Roberto, I still think the solution you point out has the potential for
throttling foreground IOs issued to MD from the filesystem as well as the
MD-initiated background resyncs. So, I don't want to limit the IO queues,
especially since our foreground workload involves a LOT of small random IO.

Thanks
~ Hari


Re: Adaptive throttling for RAID1 background resync

on 18.03.2011 22:02:32 by Roberto Spadim

humm, let's wait for other ideas from the list

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: Adaptive throttling for RAID1 background resync

on 18.03.2011 23:11:37 by NeilBrown

On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian wrote:

> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see a whole lot (12,000+) of biovec-64s active in the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk at a frequency much higher than what the slow disk is able to write to. This continues for a long time (> 1 minute) in an unbounded fashion resulting in buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims, and also the need for a new solution? Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for varying IO throughputs from the different RAID1 replicas.

The thing you are missing that already exists is

#define RESYNC_DEPTH 32

which is a limit placed on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.

So there can never be more than 32 bios per device in use for resync.


12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count?  If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.

NeilBrown
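A quick way to check that, as a sketch (slab cache names can vary between kernel builds, and reading /proc/slabinfo may need root):

    # Active vs total object counts for the bio and biovec slab caches
    grep -E '^(bio|biovec)' /proc/slabinfo | awk '{printf "%-16s active=%s total=%s\n", $1, $2, $3}'

    # Or interactively, sorted by cache size
    slabtop -s c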

Re: disk order problem in a raid 10 array

on 18.03.2011 23:14:05 by NeilBrown

On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard
wrote:

> Hello
>
> trying to solve my problem with an unusable raid10 array, I discovered that
> disk order is mixed between each boot - even with live-cd.
> Here's an extract from dmesg:
> [ 12.5] sda:
> [ 12.5] sdc:
> [ 12.5] sdd:
> [ 12.5] sde: sdd1
> [ 12.5] sdf: sdc1
> [ 12.5] sda1 sda2
> [ 12.5] sdg: sde1
> [ 12.5] sdf1
>
> is that normal?
> could this be a sign of a hardware controller problem?
> could this happen because all disks are sata-3 except 1 SSD which is sata-2?

You are saying that something changes between each boot, but only giving one
example so that we cannot see the change. That is not particularly helpful.

The output above is a bit odd, but I think it is simply that the devices are
all being examined in parallel so the per-device messages are being mingled
together.
Certainly 'sdd1' is on 'sdd', not on 'sde' as the message seems to show.

NeilBrown

Re: disk order problem in a raid 10 array

on 18.03.2011 23:22:51 by NeilBrown

On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard wrote:

> On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard wrote:
> > > disk order is mixed between each boot - even with live-cd.
> > > is that normal?
> >
> > If nothing is changing and the order is swapping really every boot,
> > then IMO that is odd.
>
> nothing has changed, except kernel minor version

Yet you don't tell us what the kernel minor version changed from or to.

That may not be important, but it might, and you obviously don't know which.
It is always better to give too much information rather than not enough.

>
> > Part of my SOP is now to record both mdadm and the boot loader's
> > ordering against serial number and UUID of drives when creating an
> > array, and to put the relevant information on labels securely attached
> > to the physical drives, along with creating a map of their physical
> > location and taping that inside the case.
> >
> > It's critical to know what's what in a crisis. . .
>
> exactly, in my case mdadm --examine output is somewhat weird as it shows:
> /dev/sde1
> this 0 8 33 0 active sync /dev/sdd1
> /dev/sdd1
> this 0 8 33 0 active sync /dev/sdc1
> /dev/sdc1
> this 0 8 33 0 active sync /dev/sde1
> and /dev/sdf1 as sdf1

You are hiding lots of details again...

Are these all from different arrays?  They all claim to be 'device 0' of some
array.

In fact, "8, 33" is *always* /dev/sdc1, so I think the above lines have been
edited by hand, because I'm 100% certain mdadm didn't output them.

> I think I can believe mdadm?

Yes, you can believe mdadm - but only if you understand what it is saying,
and there are times when that is not as easy as one might like....

> and that /proc/mdstat content comes directly from mdadm (that is with "exact"
> sdc,d,e)?
>
> what troubles me is that after I removed 2 disk drives from the bay, mdadm started
> to recover:
> md0 : active raid10 sdb1[1] sdc1[4] sdd1[3]
>       976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
>       [=>...................]  recovery =  5.0% (24436736/488383936) finish=56.2min speed=137513K/sec

Why exactly does this trouble you?  It seems to be doing exactly the right
thing.

> I guess that it is ok, and that it is recovering with the spare. But I would
> like to be sure...

Sure of what?  If you want a clear answer you need to ask a clear question.

NeilBrown


Re: disk order problem in a raid 10 array

on 19.03.2011 00:06:30 by Xavier Brochard

On Friday 18 March 2011 23:14:05, NeilBrown wrote:
> On Fri, 18 Mar 2011 15:49:20 +0100 Xavier Brochard wrote:
> > trying to solve my problem with an unusable raid10 array, I discovered
> > that disk order is mixed between each boot - even with live-cd.
> > Here's an extract from dmesg:
> > [ 12.5] sda:
> > [ 12.5] sdc:
> > [ 12.5] sdd:
> > [ 12.5] sde: sdd1
> > [ 12.5] sdf: sdc1
> > [ 12.5] sda1 sda2
> > [ 12.5] sdg: sde1
> > [ 12.5] sdf1
> >
> > is that normal?
>
> You are saying that something changes between each boot, but only giving
> one example so that we cannot see the change.  That is not particularly
> helpful.

sorry, I didn't want to send too long an email,
as each dmesg shows different but similar output

> The output above is a bit odd, but I think it is simply that the devices
> are all being examined in parallel so the per-device messages are being
> mingled together.
> Certainly 'sdd1' is on 'sdd', not on 'sde' as the message seems to show.

ok thanks


Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

on 19.03.2011 00:06:56 by Xavier Brochard

Hello,

On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard wrote:
> > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard wrote:
> > > > disk order is mixed between each boot - even with live-cd.
> > > > is that normal?
> > >
> > > If nothing is changing and the order is swapping really every boot,
> > > then IMO that is odd.
> >
> > nothing has changed, except kernel minor version
>
> Yet you don't tell us what the kernel minor version changed from or to.

Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it is
ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13

> That may not be important, but it might, and you obviously don't know which.
> It is always better to give too much information rather than not enough.

Again sorry, my wednesday email was long and I thought it was too long!

> > exactly, in my case mdadm --examine output is somewhat weird as it shows:
> > /dev/sde1
> > this 0 8 33 0 active sync /dev/sdd1
> > /dev/sdd1
> > this 0 8 33 0 active sync /dev/sdc1
> > /dev/sdc1
> > this 0 8 33 0 active sync /dev/sde1
> > and /dev/sdf1 as sdf1
>
> You are hiding lots of details again...
>
> Are these all from different arrays?  They all claim to be 'device 0' of
> some array.

They are all from the same md RAID10 array

> In fact, "8, 33" is *always* /dev/sdc1, so I think the above lines have
> been edited by hand, because I'm 100% certain mdadm didn't output them.

You're right, I'm sorry. I have copied this line, just changing the /dev/sd?

Here's the full output of mdadm --examine /dev/sd[cdefg]1
As you can see, disks sdc, sdd and sde claim to be different, is it a problem?
==========================================
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 09:50:03 2011
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : ec151590 - correct
Events : 154

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 65 2 active sync /dev/sde1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 65 2 active sync /dev/sde1
3 3 0 0 3 faulty removed
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f740 - correct
Events : 102

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f752 - correct
Events : 102

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f776 - correct
Events : 102

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 81 3 active sync /dev/sdf1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f782 - correct
Events : 102

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 4 8 97 4 spare /dev/sdg1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare /dev/sdg1
============

> > I think I can believe mdadm?
>
> Yes, you can believe mdadm - but only if you understand what it is saying,
> and there are times when that is not as easy as one might like....

Especially when a raid system is broken! One's mind looks broken too and it's a
bit hard to think clearly :-)

Thanks for the help

Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

on 19.03.2011 00:57:15 by Roberto Spadim

did you try to change the udev configuration?

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: disk order problem in a raid 10 array

on 19.03.2011 00:59:07 by Xavier Brochard

On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard wrote:
> > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard wrote:
> > > > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard wrote:
> > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > is that normal?
> > > > >
> > > > > If nothing is changing and the order is swapping really every boot,
> > > > > then IMO that is odd.
> > > >
> > > > nothing has changed, except kernel minor version
> > >
> > > Yet you don't tell us what the kernel minor version changed from or to.
> >
> > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> >
> > > That may not be important, but it might, and you obviously don't know
> > > which. It is always better to give too much information rather than
> > > not enough.

> >
> > Here's the full output of mdadm --examine /dev/sd[cdefg]1
> > As you can see, disks sdc, sdd and sde claim to be different, is it a
> > problem?
>
> Were all of these outputs collected at the same time?

yes

> They seem
> inconsistent.

> In particular, sdc1 has a higher 'events' number than the others (154 vs
> 102) yet an earlier Update Time. It also thinks that the array is
> completely failed.

When I removed that disk (sdc is number 2) and another one (I tried with
different disks), all the other disks display (with mdadm -E):
0 Active
1 Active
2 Active
3 Active
4 Spare

But when I removed that disk (#2) and #0, it started to recover and all the other
disks display (with mdadm -E):
0 Removed
1 Active
2 Faulty removed
3 Active
4 Spare
That looks coherent to me, now.

> So I suspect that device is badly confused and you probably want to zero
> its metadata ... but don't do that too hastily.
>
> All the other devices think the array is working correctly with a full
> complement of devices.  However there is no device which claims to
> be "RaidDevice 2" - except sdc1 and it is obviously confused..
>
> The device name listed in the table at the end of --examine output.
> It is the name that the device had when the metadata was last written.  And
> device names can change on reboot.
> The fact that the names don't line up suggests that the metadata hasn't been
> written since the last reboot - so presumably you aren't really using the
> array.(???)
>
> [the newer 1.x metadata format doesn't try to record the names of devices
> in the superblock so it doesn't result in some of this confusion).
>
> Based on your earlier email, it would appear that the device discovery for
> some of your devices is happening in parallel at boot time, so the ordering
> could be random - each time you boot you get a different order.  This will
> not confuse md or mdadm - they look at the content of the devices rather
> than the name.
> If you want a definitive name for each device, it might be a good idea to
> look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
>
> Could you please send a complete output of:
>
> cat /proc/mdstat
> mdadm -D /dev/md0
> mdadm -E /dev/sd?1
>
> all collected at the same time.  Then I will suggest if there is any action
> you should take to repair anything.

Here it is, thank you for your help

mdstat:
======
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
[raid10]
md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
      2441919680 blocks

unused devices:
====
obviously, mdadm -D /dev/md0 outputs nothing

mdadm -E /dev/sd?1
====
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 09:50:03 2011
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 2
Spare Devices : 0
Checksum : ec151590 - correct
Events : 154

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 2 8 65 2 active sync /dev/sde1

0 0 0 0 0 removed
1 1 0 0 1 faulty removed
2 2 8 65 2 active sync /dev/sde1
3 3 0 0 3 faulty removed
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0

Update Time : Fri Mar 18 16:37:45 2011
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : ec181672 - correct
Events : 107

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1

0 0 0 0 0 removed
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 8 49 3 active sync /dev/sdd1
4 4 8 33 4 spare /dev/sdc1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0

Update Time : Fri Mar 18 16:37:45 2011
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : ec181696 - correct
Events : 107

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 49 3 active sync /dev/sdd1

0 0 0 0 0 removed
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 8 49 3 active sync /dev/sdd1
4 4 8 33 4 spare /dev/sdc1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0

Update Time : Wed Mar 16 07:43:45 2011
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1
Checksum : ec14f740 - correct
Events : 102

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1

0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 65 2 active sync /dev/sde1
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 spare
/dev/sdf1:
Magic : a92b4efc
Version : 0.90.00
UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
Creation Time : Sun Jan 2 16:41:45 2011
Raid Level : raid10
Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 976767872 (931.52 GiB 1000.21 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0

Update Time : Fri Mar 18 16:37:45 2011
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1
Checksum : ec181682 - correct
Events : 107

Layout : near=2
Chunk Size : 64K

Number Major Minor RaidDevice State
this 4 8 33 4 spare /dev/sdc1

0 0 0 0 0 removed
1 1 8 17 1 active sync /dev/sdb1
2 2 0 0 2 faulty removed
3 3 8 49 3 active sync /dev/sdd1
4 4 8 33 4 spare /dev/sdc1
====
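A compact way to compare what each member's superblock believes, since the Events counter and Update Time are what identify the stale device (the /dev/sd[b-f]1 glob is just what matches the members listed above):

    # One line per field of interest from every member's superblock
    mdadm -E /dev/sd[b-f]1 | grep -E '^/dev/|Update Time|Events|this'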



Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

on 19.03.2011 01:03:22 by Xavier Brochard

On Saturday 19 March 2011 00:57:15, Roberto Spadim wrote:
> did you try to change the udev configuration?

no

But attempting to boot on a 2.6.32-27 or 2.6.32-24 kernel results in a freeze
shortly after:
ata_id[919] : HDIO_GET_IDENTITY failed for '/dev/sdb'
And while rebooting with the Alt + SysRq + REISUB keys, I saw that udev was stuck
waiting on a Logitech USB mouse.

Xavier
xavier@alternatif.org

Re: disk order problem in a raid 10 array

on 19.03.2011 01:05:58 by Xavier Brochard

On Saturday 19 March 2011 00:59:07, Xavier Brochard wrote:
> > Could you please send a complete output of:
> >
> > cat /proc/mdstat
> > mdadm -D /dev/md0
> > mdadm -E /dev/sd?1
> >
> > all collected at the same time.  Then I will suggest if there is any
> > action you should take to repair anything.
>
> Here it is, thank you for your help

don't know if it is important, but I collected them from Sysrescue-cd, with a
2.6.35-std163-amd64 kernel

Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

on 19.03.2011 01:07:59 by Roberto Spadim

hum, can you check if you updated udev?
i think it's a udev with a wrong configuration (maybe, maybe not)

--
Roberto Spadim
Spadim Technology / SPAEmpresarial

Re: disk order problem in a raid 10 array

am 19.03.2011 01:25:32 von Xavier Brochard

On Saturday 19 March 2011 01:07:59, Roberto Spadim wrote:
> Hum, can you check if you updated udev?
> I think it's a udev with a wrong configuration (maybe, maybe not).

Unfortunately /var is on the faulty raid.
But I'm pretty sure that it was not updated, because the last updates for the
udev package in the Ubuntu lucid and lucid-updates repositories were all before
my installation of the server (in April and November respectively).


Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

am 19.03.2011 02:42:47 von NeilBrown

On Sat, 19 Mar 2011 00:59:07 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> > On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > >
> > > > >
> > > > > wrote:
> > > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > > is that normal?
> > > > > >
> > > > > > If nothing is changing and the order is swapping really every boot,
> > > > > > then IMO that is odd.
> > > > >
> > > > > nothing has changed, except kernel minor version
> > > >
> > > > Yet you don't tell us what the kernel minor version changed from or to.
> > >
> > > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > >
> > > > That may not be important, but it might and you obviously don't know
> > > > which. It is always better to give too much information rather than
> > > > not enough.
>
> >
> > > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > > As you can see, disks sdc, sdd and sde claim to be different, is it a
> > > problem?
> >
> > Were all of these outputs collected at the same time?
>
> yes
>
> > They seem
> > inconsistent.
>
> > In particular, sdc1 has a higher 'events' number than the others (154 vs
> > 102) yet an earlier Update Time.  It also thinks that the array is
> > completely failed.
>
> When I removed that disk (sdc is number 2) and another one (I tried with
> different disks), all other disks display (with mdadm -E):
> 0 Active
> 1 Active
> 2 Active
> 3 Active
> 4 Spare
>
> But when I removed that disk (#2) and #0, it started to recover and all other
> disks display (with mdadm -E):
> 0 Removed
> 1 Active
> 2 Faulty removed
> 3 Active
> 4 Spare
> That looks coherent to me now.
>
> > So I suspect that device is badly confused and you probably want to zero
> > its metadata ... but don't do that too hastily.
> >
> > All the other devices think the array is working correctly with a full
> > complement of devices.  However there is no device which claims to
> > be "RaidDevice 2" - except sdc1, and it is obviously confused.
> >
> > The device name listed in the table at the end of --examine output is
> > the name that the device had when the metadata was last written.  And
> > device names can change on reboot.
> > The fact that the names don't line up suggests that the metadata hasn't been
> > written since the last reboot - so presumably you aren't really using the
> > array. (???)
> >
> > [the newer 1.x metadata format doesn't try to record the names of devices
> > in the superblock, so it doesn't result in some of this confusion.]
> >=20
> >
> > Based on your earlier email, it would appear that the device discovery for
> > some of your devices is happening in parallel at boot time, so the ordering
> > could be random - each time you boot you get a different order.  This will
> > not confuse md or mdadm - they look at the content of the devices rather
> > than the name.
> > If you want a definitive name for each device, it might be a good idea to
> > look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
> >
> > Could you please send a complete output of:
> >
> > cat /proc/mdstat
> > mdadm -D /dev/md0
> > mdadm -E /dev/sd?1
> >
> > all collected at the same time.  Then I will suggest if there is any action
> > you should take to repair anything.
>
> Here it is, thank you for your help
>

I suggest you:

mdadm --zero /dev/sdb1

having first double-checked that sdb1 is the device with Events of 154,

then

mdadm -S /dev/md0
mdadm -As /dev/md0


and let the array rebuild the spare.
Then check the data and make sure it is all good.
Then add /dev/sdb1 back in as the spare
mdadm /dev/md0 --add /dev/sdb1

and everything should be fine - providing you don't hit any hardware errors
etc.
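A minimal way to do that double-check, assuming the /dev/sd[b-f]1 members
shown elsewhere in this thread, is to print each member's name, event counter
and update time side by side and compare:

   mdadm -E /dev/sd[b-f]1 | grep -E '^/dev|Events|Update Time'
   cat /proc/mdstat          # and follow the rebuild here afterwards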


NeilBrown




> mdstat:
> ======
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
>       2441919680 blocks
>
> unused devices:
> ====
> obviously, mdadm -D /dev/md0 outputs nothing
>=20
> mdadm -E /dev/sd?1
> ====
> /dev/sdb1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>=20
> Update Time : Wed Mar 16 09:50:03 2011
> State : clean
> Active Devices : 1
> Working Devices : 1
> Failed Devices : 2
> Spare Devices : 0
> Checksum : ec151590 - correct
> Events : 154
>=20
> Layout : near=2
> Chunk Size : 64K
>=20
> Number Major Minor RaidDevice State
> this 2 8 65 2 active sync /dev/sde1
>=20
> 0 0 0 0 0 removed
> 1 1 0 0 1 faulty removed
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 0 0 3 faulty removed
> /dev/sdc1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 3
> Preferred Minor : 0
>=20
> Update Time : Fri Mar 18 16:37:45 2011
> State : clean
> Active Devices : 2
> Working Devices : 3
> Failed Devices : 1
> Spare Devices : 1
> Checksum : ec181672 - correct
> Events : 107
>=20
> Layout : near=2
> Chunk Size : 64K
>=20
> Number Major Minor RaidDevice State
> this 1 8 17 1 active sync /dev/sdb1
>=20
> 0 0 0 0 0 removed
> 1 1 8 17 1 active sync /dev/sdb1
> 2 2 0 0 2 faulty removed
> 3 3 8 49 3 active sync /dev/sdd1
> 4 4 8 33 4 spare /dev/sdc1
> /dev/sdd1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 3
> Preferred Minor : 0
>=20
> Update Time : Fri Mar 18 16:37:45 2011
> State : clean
> Active Devices : 2
> Working Devices : 3
> Failed Devices : 1
> Spare Devices : 1
> Checksum : ec181696 - correct
> Events : 107
>=20
> Layout : near=2
> Chunk Size : 64K
>=20
> Number Major Minor RaidDevice State
> this 3 8 49 3 active sync /dev/sdd1
>=20
> 0 0 0 0 0 removed
> 1 1 8 17 1 active sync /dev/sdb1
> 2 2 0 0 2 faulty removed
> 3 3 8 49 3 active sync /dev/sdd1
> 4 4 8 33 4 spare /dev/sdc1
> /dev/sde1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 5
> Preferred Minor : 0
>=20
> Update Time : Wed Mar 16 07:43:45 2011
> State : clean
> Active Devices : 4
> Working Devices : 5
> Failed Devices : 0
> Spare Devices : 1
> Checksum : ec14f740 - correct
> Events : 102
>=20
> Layout : near=2
> Chunk Size : 64K
>=20
> Number Major Minor RaidDevice State
> this 0 8 33 0 active sync /dev/sdc1
>=20
> 0 0 8 33 0 active sync /dev/sdc1
> 1 1 8 49 1 active sync /dev/sdd1
> 2 2 8 65 2 active sync /dev/sde1
> 3 3 8 81 3 active sync /dev/sdf1
> 4 4 8 97 4 spare
> /dev/sdf1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
> Creation Time : Sun Jan 2 16:41:45 2011
> Raid Level : raid10
> Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
> Array Size : 976767872 (931.52 GiB 1000.21 GB)
> Raid Devices : 4
> Total Devices : 3
> Preferred Minor : 0
>=20
> Update Time : Fri Mar 18 16:37:45 2011
> State : clean
> Active Devices : 2
> Working Devices : 3
> Failed Devices : 1
> Spare Devices : 1
> Checksum : ec181682 - correct
> Events : 107
>=20
> Layout : near=2
> Chunk Size : 64K
>=20
> Number Major Minor RaidDevice State
> this 4 8 33 4 spare /dev/sdc1
>=20
> 0 0 0 0 0 removed
> 1 1 8 17 1 active sync /dev/sdb1
> 2 2 0 0 2 faulty removed
> 3 3 8 49 3 active sync /dev/sdd1
> 4 4 8 33 4 spare /dev/sdc1
> ====
>=20
>=20
>=20
> Xavier
> xavier@alternatif.org - 09 54 06 16 26


Re: disk order problem in a raid 10 array

am 19.03.2011 13:01:29 von Xavier Brochard

On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > is that normal?
> > > > >
> > > > > If nothing is changing and the order is swapping really every boot,
> > > > > then IMO that is odd.
> > > >
> > > > nothing has changed, except kernel minor version
> > >
> > > Yet you don't tell us what the kernel minor version changed from or to.
> >
> > Previously it was ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > is ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> >
> > > That may not be important, but it might and you obviously don't know
> > > which. It is always better to give too much information rather than
> > > not enough.

> > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > As you can see, disks sdc, sdd and sde claim to be different, is it a
> > problem?
>
> Were all of these outputs collected at the same time?  They seem
> inconsistent.
>
> In particular, sdc1 has a higher 'events' number than the others (154 vs
> 102) yet an earlier Update Time.  It also thinks that the array is
> completely failed.
> So I suspect that device is badly confused and you probably want to zero
> its metadata ... but don't do that too hastily.
>
> All the other devices think the array is working correctly with a full
> complement of devices.  However there is no device which claims to
> be "RaidDevice 2" - except sdc1, and it is obviously confused.
>
> The device name listed in the table at the end of --examine output is
> the name that the device had when the metadata was last written.  And
> device names can change on reboot.
> The fact that the names don't line up suggests that the metadata hasn't been
> written since the last reboot - so presumably you aren't really using the
> array. (???)

The array was in use 24/7.
But the last reboot using it was after the first error (I described it
extensively in Wednesday's email). As I first thought it was a file system
error, I launched fsck to check the /tmp FS with fsck /dev/mapper/tout-tmp
(it is a RAID10 + LVM setup).
Could that be the reason the metadata was not written?

> [the newer 1.x metadata format doesn't try to record the names of devices
> in the superblock, so it doesn't result in some of this confusion.]

Yes it's really confusing:
the SAS/SATA controller card gives "numbers" for the hard drives,
which don't correspond to the /dev/sd? names,
which don't correspond to the drive numbers in the array,
etc.

> Based on your earlier email, it would appear that the device discovery for
> some of your devices is happening in parallel at boot time, so the ordering
> could be random - each time you boot you get a different order.  This will
> not confuse md or mdadm - they look at the content of the devices rather
> than the name.

ok, thanks for making it very clear

> If you want a definitive name for each device, it might be a good idea to
> look in /dev/disk/by-path or /dev/disk/by-id and use names from there.

I think I can't:
With System Rescue CD (2.6.35-std163-amd64 kernel) I have only one path
available:
pci-0000:00:14.1-scsi-0:0:0:0
which, according to lspci, is not the LSI SAS/SATA controller, but the IDE
interface:
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
While the LSI controller is at
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008
PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

This makes me a bit anxious about starting the raid recovery!
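One way to double-check the mapping before starting, assuming the rescue
system has populated the usual udev symlinks, is to match serial numbers and
controller paths against the kernel names:

   ls -l /dev/disk/by-id/  | grep -- '-part1'    # serial-number symlinks -> sdX1
   ls -l /dev/disk/by-path/                      # controller/port symlinks -> sdX
   udevadm info --query=symlink --name=/dev/sdb  # all names udev knows for one disk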


Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

am 19.03.2011 14:44:40 von Xavier Brochard

On Saturday 19 March 2011 02:42:47, NeilBrown wrote:
> I suggest you:
>=20
> mdadm --zero /dev/sdb1
>=20
> having first double-checked that sdb1 is the device with Events of 154,
>=20
> then
>=20
> mdadm -S /dev/md0
> mdadm -As /dev/md0
>=20
>=20
> and let the array rebuild the spare.
> Then check the data and make sure it is all good.
> Then add /dev/sdb1 back in as the spare
> mdadm /dev/md0 --add /dev/sdb1
>=20
> and everything should be fine - providing you don't hit any hardware errors
> etc.

It didn't work until I stopped the raid array:
mdadm --zero /dev/sdg1
mdadm: Couldn't open /dev/sdg1 for write - not zeroing

is that normal, can I continue?

Xavier
xavier@alternatif.org

Re: disk order problem in a raid 10 array

am 19.03.2011 16:14:09 von Xavier Brochard

On Saturday 19 March 2011 14:44:40, Xavier Brochard wrote:
> On Saturday 19 March 2011 02:42:47, NeilBrown wrote:
> > I suggest you:
> > mdadm --zero /dev/sdb1
> >=20
> > having first double-checked that sdb1 is the device with Events of 154,
> >=20
> > then
> >=20
> > mdadm -S /dev/md0
> > mdadm -As /dev/md0
> >=20
> > and let the array rebuild the spare.
> > Then check the data and make sure it is all good.
> > Then add /dev/sdb1 back in as the spare
> >=20
> > mdadm /dev/md0 --add /dev/sdb1
> >=20
> > and everything should be fine - providing you don't hit any hardware
> > errors etc.
>
> It didn't work until I stopped the raid array:
> mdadm --zero /dev/sdg1
> mdadm: Couldn't open /dev/sdg1 for write - not zeroing
>=20
> is that normal, can I continue?

so far I've done:
mdadm -S /dev/md0
mdadm --zero /dev/sdg1
mdadm -As /dev/md0 --config=/path/to/config

mdadm: /dev/md0 has been started with 2 drives (out of 4) and 1 spare.

It started to recover:
md0 : active raid10 sdc1[1] sdf1[4] sde1[3]
      976767872 blocks 64K chunks 2 near-copies [4/2] [_U_U]
      [>....................]  recovery =  0.3% (1468160/488383936)
      finish=66.3min speed=122346K/sec

but why did it start md0 with 2 drives and not with 3 drives?
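The per-device roles behind that count can be read directly, assuming the
array is assembled as above:

   mdadm -D /dev/md0              # per-device table: active, removed, spare (rebuilding)
   watch -n 60 cat /proc/mdstat   # follow the recovery percentage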

Xavier
xavier@alternatif.org - 09 54 06 16 26

Re: disk order problem in a raid 10 array

am 20.03.2011 04:53:23 von NeilBrown

On Sat, 19 Mar 2011 14:44:40 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> On Saturday 19 March 2011 02:42:47, NeilBrown wrote:
> > I suggest you:
> >=20
> > mdadm --zero /dev/sdb1
> >=20
> > having first double-checked that sdb1 is the device with Events of 154,
> >=20
> > then
> >=20
> > mdadm -S /dev/md0
> > mdadm -As /dev/md0
> >=20
> >=20
> > and let the array rebuild the spare.
> > Then check the data and make sure it is all good.
> > Then add /dev/sdb1 back in as the spare
> > mdadm /dev/md0 --add /dev/sdb1
> >=20
> > and everything should be fine - providing you don't hit any hardware errors
> > etc.
>
> It didn't work until I stopped the raid array:
> mdadm --zero /dev/sdg1
> mdadm: Couldn't open /dev/sdg1 for write - not zeroing
>=20
> is that normal, can I continue?
>

Yes, you are right. You need to stop the array before you zero things.

So:
mdadm -S /dev/md0
mdadm --zero /dev/the-device-which-thinks-most-of-the-other-devices-have-failed
mdadm -As /dev/md0

That last command might need to be
mdadm -As /dev/md0 /dev/sdc1 /dev/sdd1 ..... list of all member devices.
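Once it is assembled, the rebuild and the data can be checked with the usual
tools (a minimal sketch; the LVM volume name is the one mentioned earlier in
this thread, and fsck -n only reads):

   cat /proc/mdstat                 # the spare should be rebuilding
   mdadm -D /dev/md0                # array state and per-device roles
   fsck -n /dev/mapper/tout-tmp     # read-only check of one of the filesystems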

NeilBrown

Re: disk order problem in a raid 10 array

am 20.03.2011 11:40:20 von Xavier Brochard

On Sunday 20 March 2011 04:53:23, NeilBrown wrote:
> On Sat, 19 Mar 2011 14:44:40 +0100 Xavier Brochard <xavier@alternatif.org>
>
> wrote:
> > On Saturday 19 March 2011 02:42:47, NeilBrown wrote:
> > > I suggest you:
> > > mdadm --zero /dev/sdb1
> > >
> > > having first double-checked that sdb1 is the device with Events of
> > > 154,
> > >
> > > then
> > >
> > > mdadm -S /dev/md0
> > > mdadm -As /dev/md0
> > >
> > > and let the array rebuild the spare.
> > > Then check the data and make sure it is all good.
> > > Then add /dev/sdb1 back in as the spare
> > >
> > > mdadm /dev/md0 --add /dev/sdb1
> > >
> > > and everything should be fine - providing you don't hit any hardware
> > > errors etc.
> >
> > It didn't work until I stopped the raid array:
> > mdadm --zero /dev/sdg1
> > mdadm: Couldn't open /dev/sdg1 for write - not zeroing
> >
> > is that normal, can I continue?
>
> Yes, you are right.  You need to stop the array before you zero things.
>
> So:
>
> mdadm -S /dev/md0
> mdadm --zero /dev/the-device-which-thinks-most-of-the-other-devices-have-failed
> mdadm -As /dev/md0
>
> That last command might need to be
> mdadm -As /dev/md0 /dev/sdc1 /dev/sdd1 ..... list of all member devices.

I would like to thank you very much for your help.
You helped me to understand mdadm's messages and to keep only the relevant part
of the various errors I could see in the logs. Your explanation and instructions
were a great help with the repair.

Everything is working now.
Thanks again.

Xavier
xavier@alternatif.org=20

RE: Adaptive throttling for RAID1 background resync

am 21.03.2011 22:02:37 von Hari Subramanian

Hi Neil,

There are an equal number of BIOs as there are biovec-64s. But I understand why that is the case now. It turns out that some changes made by one of our performance engineers, in the interest of increasing the performance of background resyncs when there are foreground I/Os, get in the way of (or effectively neuter) the RESYNC_DEPTH throttle that exists today.

The gist of these changes is that:
- We hold the barrier across the resync window to disallow foreground IOs from interrupting background resyncs, and raise_barrier is not being invoked in raid1 sync_request
- We increase the resync window to 8M and resync chunk size to 256K

The combination of these factors caused us to have a huge number of IOs outstanding and as much as 256M of resync data pages. We are working on a fix for this. I can share a patch to MD that implements these changes if someone is interested.

Thanks again for your help!
~ Hari

-----Original Message-----
From: NeilBrown [mailto:neilb@suse.de]
Sent: Friday, March 18, 2011 6:12 PM
To: Hari Subramanian
Cc: linux-raid@vger.kernel.org
Subject: Re: Adaptive throttling for RAID1 background resync

On Fri, 18 Mar 2011 13:26:52 -0700 Hari Subramanian wrote:

> I am hitting an issue when performing RAID1 resync from a replica hosted on a fast disk to one on a slow disk. When resync throughput is set at 20Mbps min and 200Mbps max and we have enough data to resync, I see the kernel running out of memory quickly (within a minute). From the crash dumps, I see a whole lot (12,000+) of biovec-64s active in the slab cache.
>
> Our guess is that MD is allowing data to be read from the fast disk much faster than the slow disk is able to write it out. This continues for a long time (> 1 minute) in an unbounded fashion, resulting in a buildup of IOs that are waiting to be written to the disk. This eventually causes the machine to panic (we have panic on OOM selected)
>
> From reading the MD and RAID1 resync code, I don't see anything that would prevent something like this from happening. So, we would like to implement something to this effect that adaptively throttles the background resync.
>
> Can someone confirm or deny these claims, and also the need for a new solution. Maybe I'm missing something that already exists that would give me the adaptive throttling. We cannot make do with the static throttling (sync_speed_max and min) since that would be too difficult to get right for the varying IO throughputs from the different RAID1 replicas.

The thing you are missing that already exists is

#define RESYNC_DEPTH 32

which is a limit placed on conf->barrier, where conf->barrier is incremented
before submitting a resync IO, and decremented after completing a resync IO.

So there can never be more than 32 bios per device in use for resync.


12,000 active biovec-64s sounds a lot like a memory leak - something isn't
freeing them.
Is there some 'bio-XXX' slab with a similar count?  If there isn't, then the
bio was released without releasing the biovec, which would be bad.
If there is - that information would help.
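A quick way to compare those counts, assuming slab statistics are enabled on
the kernel in question:

   grep -E '^bio|^biovec' /proc/slabinfo   # active objects per bio/biovec cache
   slabtop -o | head -20                   # one-shot snapshot of the busiest caches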

NeilBrown