New RAID causing system lockups
on 11.09.2010 20:20:40 by Mike Hartman
PART 3:
Update:
I'm even more concerned about this now, because I just started the
newest reshaping to add a new drive with:
mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
And the system output:
mdadm: Need to backup 768K of critical section..
cat /proc/mdstat shows the reshaping is proceeding,
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
[>....................] reshape = 0.0% (56576/1464845568)
finish=2156.9min speed=11315K/sec
md1 : active raid0 sdg1[0] sdk1[1]
1465141760 blocks super 1.2 128k chunks
unused devices: <none>
but I've checked for /grow_md0.bak and it's not there. So it looks
like for some reason it ignored my backup file option.
This scares me, because if I experience the lockup again and am forced
to reboot, without a backup file I'm afraid my array will be hosed.
I'm also afraid to stop it cleanly right now for the same reason.
So in addition to fixing the lockup itself, does anyone know if
there's a way to either cancel this reshaping or belatedly add the
backup file in a different way so it will be recoverable? It's only at
1% and says it will take another 2193 minutes.
Mike
Re: New RAID causing system lockups
on 11.09.2010 20:45:04 by Mike Hartman
Every time I try to send the list a message with the relevant
attachments, the message never gets there, so I've given up and posted
them here:
http://www.hartmanipulation.com/raid/
Includes lspci -v, kernel config, dmesg outputs from both lockups.
Mike
Re: New RAID causing system lockups
on 11.09.2010 22:43:08 by NeilBrown
On Sat, 11 Sep 2010 14:20:40 -0400
Mike Hartman wrote:
> PART 3:
>
> Update:
>
> I'm even more concerned about this now, because I just started the
> newest reshaping to add a new drive with:
>
> mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
>
> And the system output:
>
> mdadm: Need to backup 768K of critical section..
>
> cat /proc/mdstat shows the reshaping is proceeding,
>
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
> 2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
> [>....................] reshape = 0.0% (56576/1464845568)
> finish=2156.9min speed=11315K/sec
>
> md1 : active raid0 sdg1[0] sdk1[1]
> 1465141760 blocks super 1.2 128k chunks
>
> unused devices: <none>
>
> but I've checked for /grow_md0.bak and it's not there. So it looks
> like for some reason it ignored my backup file option.
It didn't.

When you are making an array larger, you only need the backup file for a small
'critical region' at the beginning of the reshape - 768K worth in your case.
Once that is complete the backup file is not needed and so is removed.

So your current situation is no worse than before.

[When making an array smaller, the critical section happens at the very end,
so mdadm keeps the backup file around - unused - until then, then uses it
quickly and completes. When reshaping an array without changing the size, the
'critical section' lasts for the entire time, so a backup file is needed and
is very heavily used.]
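So for a grow, the sequence below is the expected behaviour (commands
illustrative, matching the one you ran):

mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
ls -l /grow_md0.bak   # present only while the initial critical section is saved
cat /proc/mdstat      # the reshape keeps running long after the file is gone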
I don't know yet what is causing the lock-up. A quick look at your logs
suggests that it could be related to the barrier handling. Maybe trying to
handle a barrier during a reshape is prone to races of some sort - I wouldn't
be very surprised by that.
I'll have a look at the code and see what I can find.
Thanks for the report,
NeilBrown
Re: New RAID causing system lockups
on 11.09.2010 22:56:56 by Mike Hartman
On Sat, Sep 11, 2010 at 4:43 PM, Neil Brown wrote:
> On Sat, 11 Sep 2010 14:20:40 -0400
> Mike Hartman wrote:
>
>> [...]
>>
>> but I've checked for /grow_md0.bak and it's not there. So it looks
>> like for some reason it ignored my backup file option.
>
> It didn't.
>
> When you are making an array larger, you only need the backup file for a
> small 'critical region' at the beginning of the reshape - 768K worth in
> your case.
>
> Once that is complete the backup file is not needed and so is removed.
>
> So your current situation is no worse than before.
Ok. When I did the reshape from RAID 5 to RAID 6 (moving from 3 disks
to 4) it kept the backup file around until at least 13% (since that's
when it locked up and I had to restart it with the backup), but I imagine
that's a less common case than just growing an array. Your comments
give me renewed confidence.
>
> [When making an array smaller, the critical section happens at the very
> end, so mdadm keeps the backup file around - unused - until then, then
> uses it quickly and completes. When reshaping an array without changing
> the size, the 'critical section' lasts for the entire time, so a backup
> file is needed and is very heavily used.]
>
> I don't know yet what is causing the lock-up. A quick look at your logs
> suggests that it could be related to the barrier handling. Maybe trying
> to handle a barrier during a reshape is prone to races of some sort - I
> wouldn't be very surprised by that.
Just note that during the second lockup no reshape or resync was going
on. The array state was stable, I was just writing to it.
>
> I'll have a look at the code and see what I can find.
Thanks a lot. If it was only a risk when I was growing/reshaping the
array, and covered by the backup file, it would just be an
inconvenience. But since it can seemingly happen at any time it's a
problem.
Re: New RAID causing system lockups
on 13.09.2010 08:28:13 by Mike Hartman
>> I don't know yet what is causing the lock-up. A quick look at your logs
>> suggests that it could be related to the barrier handling. Maybe trying
>> to handle a barrier during a reshape is prone to races of some sort - I
>> wouldn't be very surprised by that.
>
> Just note that during the second lockup no reshape or resync was going
> on. The array state was stable, I was just writing to it.
>
>> I'll have a look at the code and see what I can find.
>
> Thanks a lot. If it was only a risk when I was growing/reshaping the
> array, and covered by the backup file, it would just be an
> inconvenience. But since it can seemingly happen at any time it's a
> problem.
>
The lockup just happened again. I wasn't doing any
growing/reshaping/anything like that. Just copying some data into the
partition that lives on md0. dmesg_3.txt has been uploaded alongside
the other files at http://www.hartmanipulation.com/raid/. The trace
looks pretty similar to me.
Mike
Re: New RAID causing system lockups
on 13.09.2010 17:57:03 by Mike Hartman
>>> I don't know yet what is causing the lock-up. A quick look at your logs
>>> suggests that it could be related to the barrier handling. [...]
>>
>> Just note that during the second lockup no reshape or resync was going
>> on. The array state was stable, I was just writing to it.
>>
>>> I'll have a look at the code and see what I can find.
>>
>> Thanks a lot. [...]
>
> The lockup just happened again. I wasn't doing any
> growing/reshaping/anything like that. Just copying some data into the
> partition that lives on md0. dmesg_3.txt has been uploaded alongside
> the other files at http://www.hartmanipulation.com/raid/. The trace
> looks pretty similar to me.
>
The lockup just happened for the fourth time, less than an hour after
I rebooted to clear the previous lockup from last night. All I did was
boot the system, start the RAID, and start copying some files onto it.
The problem seems to be getting worse - up until now I got at least a
full day of fairly heavy usage out of the system before it happened.
dmesg_4.txt has been uploaded alongside the other files. Let me know
if there's any other system information that would be useful.
Mike
Re: New RAID causing system lockups
on 14.09.2010 01:51:11 by NeilBrown
On Mon, 13 Sep 2010 11:57:03 -0400
Mike Hartman wrote:
> [...]
>
> The lockup just happened for the fourth time, less than an hour after
> I rebooted to clear the previous lockup from last night. All I did was
> boot the system, start the RAID, and start copying some files onto it.
> The problem seems to be getting worse - up until now I got at least a
> full day of fairly heavy usage out of the system before it happened.
> dmesg_4.txt has been uploaded alongside the other files. Let me know
> if there's any other system information that would be useful.
>
> Mike
Hi Mike,
 thanks for the updates.

I'm not entirely clear what is happening (in fact, due to a cold that I am
still fighting off, nothing is entirely clear at the moment), but it looks
very likely that the problem is due to an interplay between barrier handling
and the multi-level structure of your array (a raid0 being a member of a
raid5).
When a barrier request is processed, both arrays will schedule 'work' to be
done by the 'event' thread, and I'm guessing that you can get into a
situation where one work item is waiting for the other, but the other is
behind the one on the single queue (I wonder if that makes sense...)
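In userspace terms the hang would look something like this little sketch
(illustration only - the real queue is the kernel's single-threaded event
workqueue, not pthreads):

/* sketch.c - deadlock on a single-threaded work queue; build: cc -pthread sketch.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t b_done;

static void work_a(void)
{
	printf("A: waiting for B, but B is queued behind me\n");
	sem_wait(&b_done);	/* never posted - B cannot run until A returns */
}

static void work_b(void)
{
	sem_post(&b_done);	/* never reached */
}

static void *worker(void *unused)
{
	/* one thread runs the queued items strictly in order, like a
	 * single events/N thread running scheduled work items */
	work_a();
	work_b();
	return NULL;
}

int main(void)
{
	pthread_t t;

	sem_init(&b_done, 0, 0);
	pthread_create(&t, NULL, worker, NULL);
	pthread_join(t, NULL);	/* hangs forever - the "lockup" */
	return 0;
}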
Anyway, this patch might make a difference. It reduces the number of work
items scheduled, in a way that could conceivably fix the problem.
If you can test this, please report the results.  I cannot easily reproduce
the problem so there is limited testing that I can do.
Thanks,
NeilBrown
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f20d13e..7f2785c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~REQ_HARDBARRIER;
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~REQ_HARDBARRIER;
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);
 
RE: New RAID causing system lockups
on 14.09.2010 03:11:30 by Mike Hartman
Forgot to include the mailing list on this.
> Hi Mike,
>  thanks for the updates.
>
> I'm not entirely clear what is happening (in fact, due to a cold that I am
> still fighting off, nothing is entirely clear at the moment), but it looks
> very likely that the problem is due to an interplay between barrier handling
> and the multi-level structure of your array (a raid0 being a member of a
> raid5).
>
> [...]
>
> Anyway, this patch might make a difference. It reduces the number of work
> items scheduled, in a way that could conceivably fix the problem.
>
> If you can test this, please report the results. I cannot easily reproduce
> the problem so there is limited testing that I can do.
>
> Thanks,
> NeilBrown
>
> [...]
Neil, thanks for the patch. I experienced the lockup for the 5th time
an hour ago (about 3 hours after the last hard reboot) so I thought it
would be a good time to try your patch. Unfortunately I'm getting an
error:
patching file drivers/md/md.c
Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
Hunk #2 FAILED at 324.
Hunk #3 FAILED at 364.
Hunk #4 FAILED at 391.
3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej
"uname -r" gives "2.6.35-gentoo-r4", so I suspect that's why. I guess
the standard gentoo patchset does something with that file. I'm
skimming through md.c to see if I can understand it well enough to
apply the patch functionality manually. I've also uploaded my
2.6.35-gentoo-r4 md.c to www.hartmanipulation.com/raid/ with the other
files in case you or someone else wants to take a look at it.
Mike
Re: New RAID causing system lockups
on 14.09.2010 03:35:16 by NeilBrown
On Mon, 13 Sep 2010 21:11:30 -0400
Mike Hartman wrote:
> Forgot to include the mailing list on this.
>
> > [...]
>
> Neil, thanks for the patch. I experienced the lockup for the 5th time
> an hour ago (about 3 hours after the last hard reboot) so I thought it
> would be a good time to try your patch. Unfortunately I'm getting an
> error:
>
> patching file drivers/md/md.c
> Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
> Hunk #2 FAILED at 324.
> Hunk #3 FAILED at 364.
> Hunk #4 FAILED at 391.
> 3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej
That is odd.
I took the md.c that you posted on the web site, used "patch" to apply my
patch to it, and only Hunk #3 failed.

I used 'wiggle' to apply the patch and it applied perfectly, properly
replacing (1<<BIO_RW_BARRIER) with REQ_HARDBARRIER (or the other way around).
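(For reference, the wiggle invocation would be something along the lines of
"wiggle --replace drivers/md/md.c this-patch" - though plain patch should
cope with the version below.)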
Try this version. You will need to be in drivers/md/, or use
patch drivers/md/md.c < this-patch
NeilBrown
--- md.c.orig	2010-09-14 11:29:15.000000000 +1000
+++ md.c	2010-09-14 11:29:50.000000000 +1000
@@ -291,6 +291,23 @@
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -307,7 +324,7 @@
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -347,18 +364,12 @@
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -380,7 +391,7 @@
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);
 
Re: New RAID causing system lockups
on 14.09.2010 04:50:10 by Mike Hartman
>> [...]
>>
>> Neil, thanks for the patch. I experienced the lockup for the 5th time
>> an hour ago (about 3 hours after the last hard reboot) so I thought it
>> would be a good time to try your patch. Unfortunately I'm getting an
>> error:
>>
>> patching file drivers/md/md.c
>> Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
>> Hunk #2 FAILED at 324.
>> Hunk #3 FAILED at 364.
>> Hunk #4 FAILED at 391.
>> 3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej
>
> That is odd.
> I took the md.c that you posted on the web site, used "patch" to apply my
> patch to it, and only Hunk #3 failed.
>
> I used 'wiggle' to apply the patch and it applied perfectly, properly
> replacing (1<<BIO_RW_BARRIER) with REQ_HARDBARRIER (or the other way around).
>
> Try this version.  You will need to be in drivers/md/, or use
>
>   patch drivers/md/md.c < this-patch
>
> NeilBrown
>
> [...]
Sorry about that Neil, it was my fault. I copied your patch out of the
email and I think it picked up some unintended characters. I tried
copying it from the mailing list archive website instead and it
patched in fine. The kernel compiled with no trouble and I'm booted
into it now. No unexpected side-effects yet. I'll continue with my
copying and we'll see if it locks up again. Thanks for all your help!
Mike
Re: New RAID causing system lockups
on 14.09.2010 05:35:50 by Mike Hartman
> [...]
>
> Sorry about that Neil, it was my fault. I copied your patch out of the
> email and I think it picked up some unintended characters. I tried
> copying it from the mailing list archive website instead and it
> patched in fine. The kernel compiled with no trouble and I'm booted
> into it now. No unexpected side-effects yet. I'll continue with my
> copying and we'll see if it locks up again. Thanks for all your help!
>
> Mike
>
Sorry Neil, locked up again in less than an hour. I've uploaded
dmesg_5.txt in case the trace shows something different/useful in
light of your patch.
Mike
Re: New RAID causing system lockups
on 14.09.2010 05:48:37 by NeilBrown
On Mon, 13 Sep 2010 23:35:50 -0400
Mike Hartman wrote:
> Sorry Neil, locked up again in less than an hour. I've uploaded
> dmesg_5.txt in case the trace shows something different/useful in
> light of your patch.
>
> Mike
Hmmm..
Can you try mounting with
-o barrier=0
just to see if my theory is at all correct?
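(For ext4 that should be something like "mount -o remount,barrier=0
/your/mount/point" - adjust to your filesystem and mount point.)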
Thanks,
NeilBrown
Re: New RAID causing system lockups
on 15.09.2010 23:49:44 by Mike Hartman
>> Hmmm..
>>  Can you try mounting with
>>    -o barrier=0
>>
>> just to see if my theory is at all correct?
>>
>> Thanks,
>> NeilBrown
>>
>
Progress report:
I made the barrier change shortly after sending my last message (about
40 hours ago). With that in place, I was able to finish emptying one
of the non-assimilated drives onto the array, after which I added that
drive as a hot spare and started the process to grow the array onto it
- the same procedure I've been applying since I created the RAID the
other week. No problems so far, and the reshape is at 46%.
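(Roughly, with the device name illustrative: "mdadm /dev/md0 --add /dev/sdX1"
to add the hot spare, then "mdadm --grow --raid-devices=6
--backup-file=/grow_md0.bak /dev/md0" to grow onto it.)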
It's hard to be positive that the barrier deactivation is responsible
yet though - while the last few lockups have only been 1-16 hours
apart, I believe the first two had at least 2 or 3 days between them.
I'll keep the array busy to enhance the chances of a lockup though -
each one so far has been during a reshape or a large batch of writing
to the array's partition. If I make it another couple days (meaning
time for this reshape to complete, another drive to be emptied onto
the array, and another reshape at least started) I'll be pretty
confident the problem has been identified.
Assuming the barrier is the culprit (and I'm pretty sure you're right),
what are the consequences of just leaving it off? I gather the idea of
the barrier is to prevent journal corruption in the event of a power
failure or other sudden shutdown, which seems pretty important, but it
also doesn't seem like it was enabled by default in ext3/4 until 2008,
which makes it seem less critical.
Even if the ultimate solution for me is to just leave it disabled I'm
happy to keep trying patches if you want to get it properly fixed in
md. We may have to come up with an alternate way to work the array
hard enough to trigger the lockups though - my last 1.5TB drive is
what's being merged in now. After that completes I only have one more
pair of 750GBs (that will have to be shoehorned in using RAID0 again).
I do have a single 750GB left over, so I'll probably find a mate for
it and get it added too. After that we're maxed out on hardware for a
while.
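(Same trick as md1: something like "mdadm --create /dev/md2 --level=0
--raid-devices=2 /dev/sdX1 /dev/sdY1" - device names illustrative - then
grow md0 onto a partition of it as before.)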
Mike
Re: New RAID causing system lockups
on 21.09.2010 04:26:44 by NeilBrown
On Wed, 15 Sep 2010 17:49:44 -0400
Mike Hartman wrote:
> >> Hmmm..
> >>  Can you try mounting with
> >>    -o barrier=0
> >>
> >> just to see if my theory is at all correct?
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >
>
> Progress report:
>
> I made the barrier change shortly after sending my last message (about
> 40 hours ago). With that in place, I was able to finish emptying one
> of the non-assimilated drives onto the array, after which I added that
> drive as a hot spare and started the process to grow the array onto it
> - the same procedure I've been applying since I created the RAID the
> other week. No problems so far, and the reshape is at 46%.
>
> [...]
Thanks for the update.
>
> Assuming the barrier is the culprit (and I'm pretty sure you're right),
> what are the consequences of just leaving it off? I gather the idea of
> the barrier is to prevent journal corruption in the event of a power
> failure or other sudden shutdown, which seems pretty important, but it
> also doesn't seem like it was enabled by default in ext3/4 until 2008,
> which makes it seem less critical.
Correct.  Without the barriers the chance of corruption during powerfail is
higher.  I don't really know how much higher, it depends a lot on the
filesystem design and the particular implementation.  I think ext4 tends to
be fairly safe - after all some devices don't support barriers and it has to
do best-effort on those too.
>
> Even if the ultimate solution for me is to just leave it disabled I'm
> happy to keep trying patches if you want to get it properly fixed in
> md. [...]
>
> Mike
I'll stare at the code a bit more and see if anything jumps out at me.
Thanks,
NeilBrown
Re: New RAID causing system lockups
on 21.09.2010 13:28:39 by Mike Hartman
On Mon, Sep 20, 2010 at 10:26 PM, Neil Brown wrote:
> On Wed, 15 Sep 2010 17:49:44 -0400
> Mike Hartman wrote:
>
>> [...]
>>
>> Assuming the barrier is the culprit (and I'm pretty sure you're right),
>> what are the consequences of just leaving it off? [...]
>
> Correct.  Without the barriers the chance of corruption during powerfail is
> higher.  I don't really know how much higher, it depends a lot on the
> filesystem design and the particular implementation.  I think ext4 tends to
> be fairly safe - after all some devices don't support barriers and it has to
> do best-effort on those too.
>
> [...]
>
> I'll stare at the code a bit more and see if anything jumps out at me.
>
> Thanks,
> NeilBrown
>
I've just finished my last grow-and-copy with no problems. The only
drive that's not part of the array now is the leftover 750GB, which is
now empty. I haven't experienced any further lockups so your barrier
diagnosis seems to be spot on. I'm planning to just leave that option
turned off, but as I said, I'm happy to test any patches you come up
with. Thanks for all your help.
Mike