reshape failure
From Tobias McNulty on 16.02.2011 16:46:32
Hi,
I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
dismayed to see that it was going to take roughly 2 weeks to complete:
md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  reshape =  0.0% (245760/1953514496) finish=21189.7min speed=1536K/sec
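(For reference, that two-week estimate follows directly from the numbers on the status line: the reshape position and total are in 1K blocks and the speed is in K/sec. A quick sanity check in plain Python, with the small difference from the kernel's figure down to speed rounding:)

```python
# Rough sanity check of the kernel's ETA: in /proc/mdstat the reshape
# position and total are in 1K blocks and the speed is in K/sec, so the
# remaining time is simply (total - position) / speed.
def reshape_eta_minutes(position, total, speed_k):
    """Minutes left to finish the reshape at the current speed."""
    return (total - position) / speed_k / 60.0

minutes = reshape_eta_minutes(245760, 1953514496, 1536)
print("finish=%.1fmin (about %.1f days)" % (minutes, minutes / 60 / 24))
```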
The disk that contained the backup file began experiencing SATA errors
several days into the reshape, due to what turned out to be a faulty
SATA card. The card has since been replaced and the RAID1 device that
contains the backup file successfully resync'ed.
However, when I try to re-start the reshape now, I get the following error:
nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
/dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: Failed to restore critical section for reshape, sorry.
Is my data lost for good? Is there anything else I can do?
Thanks,
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reshape failure
From NeilBrown on 16.02.2011 21:32:47
On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty wrote:
> Hi,
>
> I tried to start a reshape over the weekend (RAID6 -> RAID5) and was
> dismayed to see that it was going to take roughly 2 weeks to complete:
>
> md0 : active raid6 sdc[0] sdh[5](S) sdg[4] sdf[3] sde[2] sdd[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  0.0% (245760/1953514496) finish=21189.7min speed=1536K/sec
>
> The disk that contained the backup file began experiencing SATA errors
> several days into the reshape, due to what turned out to be a faulty
> SATA card. The card has since been replaced and the RAID1 device that
> contains the backup file successfully resync'ed.
>
> However, when I try to re-start the reshape now, I get the following error:
>
> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> mdadm: Failed to restore critical section for reshape, sorry.
>
> Is my data lost for good? Is there anything else I can do?
Try the above command with --verbose.
If a message about "too-old timestamp" appears, run
export MDADM_GROW_ALLOW_OLD=1
and run the command again.
In either case, post the output.
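For anyone scripting this recovery later, the sequence described here can be wrapped up as below. This is an illustrative sketch only (plain Python building the command line and environment; the device names are the ones from this thread), not anything mdadm ships:

```python
import os

def assemble_with_backup(md, backup_file, devices, allow_old=False):
    """Build the mdadm --assemble command and environment described above.

    With allow_old=True, MDADM_GROW_ALLOW_OLD=1 is added to the environment
    so mdadm will accept a backup file whose timestamp is older than the
    array's (the "too-old timestamp" case).
    """
    env = dict(os.environ)
    if allow_old:
        env["MDADM_GROW_ALLOW_OLD"] = "1"
    cmd = (["mdadm", "--verbose", "--assemble", md,
            "--backup-file=" + backup_file] + list(devices))
    return cmd, env

cmd, env = assemble_with_backup(
    "/dev/md0", "md0.backup",
    ["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg", "/dev/sdh"],
    allow_old=True)
# Pass cmd and env to subprocess.run(cmd, env=env) to actually run it.
print(" ".join(cmd))
```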
NeilBrown
Re: reshape failure
From Tobias McNulty on 16.02.2011 21:41:46
On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown wrote:
> On Wed, 16 Feb 2011 10:46:32 -0500 Tobias McNulty wrote:
>> nas:~# mdadm --assemble /dev/md0 --backup-file=md0.backup /dev/sdc
>> /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
>> mdadm: Failed to restore critical section for reshape, sorry.
>>
>> Is my data lost for good?  Is there anything else I can do?
>
> Try the above command with --verbose.
> If a message about "too-old timestamp" appears, run
>
>   export MDADM_GROW_ALLOW_OLD=1
>
> and run the command again.
>
> In either case, post the output.
Wow - it looks like that might have done the trick:
nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: too-old timestamp on backup-metadata on md0.backup
mdadm: Failed to find backup of critical section
mdadm: Failed to restore critical section for reshape, sorry.
nas:~# export MDADM_GROW_ALLOW_OLD=1
nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
/dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdf is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdg is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdh is identified as a member of /dev/md0, slot 4.
mdadm:/dev/md0 has an active reshape - checking if critical section
needs to be restored
mdadm: accepting backup with timestamp 1297624561 for array with
timestamp 1297692473
mdadm: restoring critical section
mdadm: added /dev/sde to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sdc to /dev/md0 as 3
mdadm: added /dev/sdh to /dev/md0 as 4
mdadm: added /dev/sdg to /dev/md0 as 5
mdadm: added /dev/sdf to /dev/md0 as 0
mdadm: /dev/md0 has been started with 5 drives and 1 spare.
Now I see this in /proc/mdstat:
md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  reshape =  9.9% (193691648/1953514496) finish=97156886.4min speed=0K/sec
Is the 0K/sec something I need to worry about?
Thanks!
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: reshape failure
From NeilBrown on 16.02.2011 22:06:34
On Wed, 16 Feb 2011 15:41:46 -0500 Tobias McNulty wrote:
> On Wed, Feb 16, 2011 at 3:32 PM, NeilBrown wrote:
> > Try the above command with --verbose.
> > If a message about "too-old timestamp" appears, run
> >
> >   export MDADM_GROW_ALLOW_OLD=1
> >
> > and run the command again.
>
> Wow - it looks like that might have done the trick:
>
> nas:~# export MDADM_GROW_ALLOW_OLD=1
> nas:~# mdadm --verbose --assemble /dev/md0 --backup-file=md0.backup
> /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
> [...]
> mdadm: restoring critical section
> [...]
> mdadm: /dev/md0 has been started with 5 drives and 1 spare.
That is what I expected.
>
> Now I see this in /proc/mdstat:
>
> md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [=>...................]  reshape =  9.9% (193691648/1953514496) finish=97156886.4min speed=0K/sec
>
> Is the 0K/sec something I need to worry about?
Maybe.  If it stays at 0K/sec and the 9.9% stays at 9.9%, then yes, it is
something to worry about.
Is there an 'mdadm' running in the background?  Can you 'strace' it for a few
seconds?
What does
grep . /sys/block/md0/md/*
show? Maybe do it twice, 1 minute apart.
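The two-samples-a-minute-apart check can also be done against /proc/mdstat directly. A small illustrative sketch (the parsing below is an assumption written to match the mdstat lines quoted in this thread, not an mdadm feature): if the position does not advance between two reads, the reshape is stalled.

```python
import re

def reshape_position(mdstat_text, device="md0"):
    """Extract the reshape position (in 1K blocks) for one array from
    /proc/mdstat text; returns None if no reshape is in progress."""
    block = re.search(rf"^{device} :.*?(?=^\S|\Z)", mdstat_text, re.M | re.S)
    if not block:
        return None
    m = re.search(r"reshape\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)", block.group(0))
    return int(m.group(1)) if m else None

sample = """md0 : active raid6 sdf[0] sdg[5](S) sdh[4] sdc[3] sdd[2] sde[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [=>...................]  reshape =  9.9% (193691648/1953514496) finish=97156886.4min speed=0K/sec
"""
# Read /proc/mdstat twice, a minute apart, and compare the two positions.
print(reshape_position(sample))
```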
NeilBrown
Re: reshape failure
From Tobias McNulty on 17.02.2011 22:39:37
On Wed, Feb 16, 2011 at 4:06 PM, NeilBrown wrote:
> > Is the 0K/sec something I need to worry about?
>
> Maybe.  If it stays at 0K/sec and the 9.9% stays at 9.9%, then yes, it is
> something to worry about.
It seems like it was another buggy SATA HBA?? I moved everything back
to the on-board SATA ports (1 of the 2 drives in the OS RAID1 device
and the 5 non-spare devices in the RAID6 -> RAID5 device) and it's
happily reshaping again (even without the MDADM_GROW_ALLOW_OLD magic
this time):
md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
      5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [==>..................]  reshape = 10.0% (196960192/1953514496) finish=11376.9min speed=2572K/sec
Is it really possible that I had two buggy SATA cards, from different
manufacturers? Perhaps the motherboard is at fault? Or am I missing
something very basic about connecting SATA drives to something other
than the on-board ports?
Currently I'm using a SuperMicro X7SPA-HF [1] motherboard with a
AOC-SASLP-MV8 [2] HBA, and the machine is running Debian squeeze
(2.6.32-5-amd64).
Tobias
[1] http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y
[2] http://www.supermicro.com/products/accessories/addon/AOC-SASLP-MV8.cfm
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: reshape failure
From Tobias McNulty on 11.05.2011 20:06:23
On Thu, Feb 17, 2011 at 4:39 PM, Tobias McNulty wrote:
> It seems like it was another buggy SATA HBA?? I moved everything back
> to the on-board SATA ports and it's happily reshaping again (even
> without the MDADM_GROW_ALLOW_OLD magic this time):
>
> md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>       5860543488 blocks super 0.91 level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [==>..................]  reshape = 10.0% (196960192/1953514496) finish=11376.9min speed=2572K/sec
So, after figuring out the hardware issues, the reshape appears to
have completed successfully (hurray!), but /proc/mdstat still says
that the array is level 6. Is there another command I have to run to
put the finishing touches on the conversion?
md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
Thank you!
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: reshape failure
From NeilBrown on 11.05.2011 23:12:57
On Wed, 11 May 2011 14:06:23 -0400 Tobias McNulty wrote:
> So, after figuring out the hardware issues, the reshape appears to
> have completed successfully (hurray!), but /proc/mdstat still says
> that the array is level 6. Is there another command I have to run to
> put the finishing touches on the conversion?
>
> md0 : active raid6 sda[0] sde[4] sdd[3] sdc[2] sdb[1]
>       5860543488 blocks level 6, 64k chunk, algorithm 18 [5/5] [UUUUU]
Just
   mdadm --grow /dev/md0 --level=5
should complete instantly. (assuming I'm correct in thinking that you want
this to be a raid5 array - I don't really remember the details anymore :-)
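The reason this can complete instantly: the "algorithm 18" shown in the mdstat output above is (going by the layout constants in the kernel's drivers/md/raid5.h; verify against your kernel version) ALGORITHM_LEFT_SYMMETRIC_6, a RAID6 whose data and P blocks already sit in a standard left-symmetric RAID5 layout with all Q blocks parked on one device. Dropping to RAID5 then only rewrites metadata and releases the Q device as a spare. A sketch of that mapping, with the exact numbers treated as an assumption to check:

```python
# RAID6 layout ("algorithm") numbers, per drivers/md/raid5.h, whose data+P
# arrangement matches a plain RAID5 layout with all Q blocks on the last
# device. An array reshaped into one of these can change level to RAID5
# without moving any data.
RAID5_COMPATIBLE_RAID6_LAYOUTS = {
    16: "ALGORITHM_LEFT_ASYMMETRIC_6",
    17: "ALGORITHM_RIGHT_ASYMMETRIC_6",
    18: "ALGORITHM_LEFT_SYMMETRIC_6",   # the "algorithm 18" in the mdstat above
    19: "ALGORITHM_RIGHT_SYMMETRIC_6",
    20: "ALGORITHM_PARITY_0_6",
}

def level_change_is_instant(algorithm):
    """True if 'mdadm --grow --level=5' would need only a metadata update."""
    return algorithm in RAID5_COMPATIBLE_RAID6_LAYOUTS

print(level_change_is_instant(18))
```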
NeilBrown
Re: reshape failure
From Tobias McNulty on 11.05.2011 23:19:48
On Wed, May 11, 2011 at 5:12 PM, NeilBrown wrote:
> Just
>    mdadm --grow /dev/md0 --level=5
> should complete instantly. (assuming I'm correct in thinking that you want
> this to be a raid5 array - I don't really remember the details anymore :-)
Bingo! Thanks.

md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1]
      5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
And I even ended up with a spare disk (wasn't sure how that part was
going to work).
Do you always have to run that command twice, or only if the reshape
is interrupted? At least, I thought that was the same command I ran
originally to kick it off.
Thanks again.
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com
Re: reshape failure
From NeilBrown on 11.05.2011 23:34:46
On Wed, 11 May 2011 17:18:14 -0400 Tobias McNulty wrote:
> Bingo! Thanks.
>
> md0 : active raid5 sda[0] sde[4](S) sdd[3] sdc[2] sdb[1]
>       5860543488 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
>
> And I even ended up with a spare disk (wasn't sure how that part was going
> to work).
>
> Do you always have to run that command twice, or only if the reshape is
> interrupted? At least, I thought that was the same command I ran originally
> to kick it off.
Only if it is interrupted. The array doesn't know that a level change is
needed after the layout change is completed, only the mdadm process knows
that. And it has died.
I could probably get the array itself to 'know' this... one day.
NeilBrown
>
> Thanks again.
>
> Tobias
Re: reshape failure
From Tobias McNulty on 12.05.2011 02:46:48
On Wed, May 11, 2011 at 5:34 PM, NeilBrown wrote:
> Only if it is interrupted. The array doesn't know that a level change is
> needed after the layout change is completed, only the mdadm process knows
> that. And it has died.
>
> I could probably get the array itself to 'know' this... one day.
>
> NeilBrown
Hey, it makes perfect sense to me now that I know it's the expected
behavior. I might have even tried it myself if I wasn't worried about
screwing up the array again. :-)
Thanks
Tobias
--
Tobias McNulty, Managing Partner
Caktus Consulting Group, LLC
http://www.caktusgroup.com