Problem regarding RAID10 on kernel 2.6.31

on 06.08.2010 11:41:58 by ravichandra

Hi everyone,
I used two 1 TB disks, each with 3 partitions (sda[1-3] and sdb[1-3]). Using
sda[1-2] and sdb[1-2] I created a RAID10 array, say md2. Then I was reading from and
writing to the array while simultaneously removing a disk and re-adding it to the same
array. In the process I got a hang that caused the recovery process to halt, and the
array was not operational afterwards. This was done on kernel 2.6.31.

I am working with RAID10 for the first time. Can someone
help with this so that I can proceed further?

Thanks in advance.


Re: Problem regarding RAID10 on kernel 2.6.31

on 06.08.2010 12:14:35 by NeilBrown

On Fri, 06 Aug 2010 15:11:58 +0530
ravichandra wrote:

> Hi everyone,
> I used two 1 TB disks, each with 3 partitions (sda[1-3] and sdb[1-3]). Using
> sda[1-2] and sdb[1-2] I created a RAID10 array, say md2. Then I was reading from and
> writing to the array while simultaneously removing a disk and re-adding it to the same
> array. In the process I got a hang that caused the recovery process to halt, and the
> array was not operational afterwards. This was done on kernel 2.6.31.
>
> I am working with RAID10 for the first time. Can someone
> help with this so that I can proceed further?
>
> Thanks in advance.

Known problem. I'll be submitting the fix upstream shortly. I include it
below.
Thanks for the report
NeilBrown





diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
 
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map:

Re: Problem regarding RAID10 on kernel 2.6.31

on 09.08.2010 09:39:56 by ravichandra

Hi,
Thanks. The patch you sent is working; there is no hang after the patch is
applied. Can you elaborate on the problem that was there earlier?

Thanks and Regards.

On Fri, 2010-08-06 at 20:14 +1000, Neil Brown wrote:
> On Fri, 06 Aug 2010 15:11:58 +0530
> ravichandra wrote:
>
> > Hi everyone,
> > I used two 1 TB disks, each with 3 partitions (sda[1-3] and sdb[1-3]). Using
> > sda[1-2] and sdb[1-2] I created a RAID10 array, say md2. Then I was reading from and
> > writing to the array while simultaneously removing a disk and re-adding it to the same
> > array. In the process I got a hang that caused the recovery process to halt, and the
> > array was not operational afterwards. This was done on kernel 2.6.31.
> >
> > I am working with RAID10 for the first time. Can someone
> > help with this so that I can proceed further?
> >
> > Thanks in advance.
>
> Known problem. I'll be submitting the fix upstream shortly. I include it
> below.
> Thanks for the report
> NeilBrown
>
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 42e64e4..d1d6891 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> */
> bp = bio_split(bio,
> chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
> +
> + /* Each of these 'make_request' calls will call 'wait_barrier'.
> + * If the first succeeds but the second blocks due to the resync
> + * thread raising the barrier, we will deadlock because the
> + * IO to the underlying device will be queued in generic_make_request
> + * and will never complete, so will never reduce nr_pending.
> + * So increment nr_waiting here so no new raise_barriers will
> + * succeed, and so the second wait_barrier cannot block.
> + */
> + spin_lock_irq(&conf->resync_lock);
> + conf->nr_waiting++;
> + spin_unlock_irq(&conf->resync_lock);
> +
> if (make_request(mddev, &bp->bio1))
> generic_make_request(&bp->bio1);
> if (make_request(mddev, &bp->bio2))
> generic_make_request(&bp->bio2);
>
> + spin_lock_irq(&conf->resync_lock);
> + conf->nr_waiting--;
> + wake_up(&conf->wait_barrier);
> + spin_unlock_irq(&conf->resync_lock);
> +
> bio_pair_release(bp);
> return 0;
> bad_map:



Re: Problem regarding RAID10 on kernel 2.6.31

on 09.08.2010 10:10:42 by NeilBrown

On Mon, 09 Aug 2010 13:09:56 +0530
ravichandra wrote:

> Hi,
> Thanks. The patch you sent is working; there is no hang after the patch is
> applied. Can you elaborate on the problem that was there earlier?
>

It's .... complicated.

An important fact is that generic_make_request queues recursive requests
rather than issuing them immediately. This avoids excessive stack usage with
stacked block devices.
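
To make that concrete, here is a minimal sketch of the anti-recursion scheme. Treat it
as an assumption: it is abridged and borrows the bio_list helper names for brevity, so
it is not the exact 2.6.31 source, but the behaviour it shows is the one that matters
here: a bio submitted while make_request is already active on the current task is
merely queued, and is only issued after the outer call returns.

void generic_make_request(struct bio *bio)
{
	struct bio_list list;

	if (current->bio_list) {
		/* A make_request call is already active on this task:
		 * queue the bio for the outer loop instead of issuing it now. */
		bio_list_add(current->bio_list, bio);
		return;
	}

	/* Top-level call: issue this bio, then keep issuing whatever was
	 * queued while the device's make_request function was running. */
	bio_list_init(&list);
	current->bio_list = &list;
	do {
		__generic_make_request(bio);	/* calls q->make_request_fn */
		bio = bio_list_pop(&list);
	} while (bio);
	current->bio_list = NULL;
}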

So in the case where a read crosses a chunk boundary, raid10:make_request
issues two separate generic_make_request calls to two different devices, each
preceded by a wait_barrier call (which is cancelled with allow_barrier() when
the request completes).
The first is queued and will not be issued until the second is also queued and
the raid10:make_request call completes.

The wait_barrier call increments nr_pending.
If the resync/recovery thread tries to 'raise_barrier' between these calls,
it will find nr_pending set and will wait with ->barrier incremented so when
the next wait_barrier is attempted, it will block - forever.

If generic_make_request didn't queue things, the first request would
complete, nr_pending would decrement, resync would proceed with a single
request, then the second wait_barrier would complete and the second request
could be submitted.
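
To see why that is fatal, here is a small user-space model of the barrier scheme. It is
an assumption-laden simplification: pthread locking stands in for conf->resync_lock and
the wait_barrier wait queue, but the counters and the conditions they guard behave the
same way as in drivers/md/raid10.c.

#include <pthread.h>

static pthread_mutex_t resync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wait_barrier_cv = PTHREAD_COND_INITIALIZER;
static int barrier, nr_pending, nr_waiting;

/* Resync/recovery thread: stop normal IO before touching a region. */
static void raise_barrier(void)
{
	pthread_mutex_lock(&resync_lock);
	while (nr_waiting)		/* wait until no IO is queued waiting */
		pthread_cond_wait(&wait_barrier_cv, &resync_lock);
	barrier++;			/* block any new IO from starting */
	while (nr_pending)		/* wait for in-flight IO to drain */
		pthread_cond_wait(&wait_barrier_cv, &resync_lock);
	pthread_mutex_unlock(&resync_lock);
}

/* Normal IO path: called once before each request is submitted. */
static void wait_barrier(void)
{
	pthread_mutex_lock(&resync_lock);
	if (barrier) {
		nr_waiting++;
		while (barrier)
			pthread_cond_wait(&wait_barrier_cv, &resync_lock);
		nr_waiting--;
	}
	nr_pending++;			/* this request is now in flight */
	pthread_mutex_unlock(&resync_lock);
}

/* Normal IO path: called when a request completes. */
static void allow_barrier(void)
{
	pthread_mutex_lock(&resync_lock);
	nr_pending--;
	pthread_cond_broadcast(&wait_barrier_cv);
	pthread_mutex_unlock(&resync_lock);
}

In this model the deadlock is: the first wait_barrier returns with nr_pending at 1, but
that request is still queued inside generic_make_request, so allow_barrier is never
reached and nr_pending never drops; raise_barrier then increments barrier and waits for
nr_pending to reach zero, while the second wait_barrier waits for barrier to drop.
Neither side can make progress.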

The fix was to elevate conf->nr_waiting for the duration of both submissions
so raise_barrier holds off setting ->barrier until both submissions are
complete.

Hope that makes sense.

NeilBrown


RE: Problem regarding RAID10 on kernel 2.6.31

on 18.10.2010 23:23:51 by Hari Subramanian

Is this bug found in the raid1 personality driver as well?

Thanks
~ Hari


-----Original Message-----
From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Neil Brown
Sent: Monday, August 09, 2010 4:11 AM
To: ravichandra
Cc: linux-raid@vger.kernel.org
Subject: Re: Problem regarding RAID10 on kernel 2.6.31

On Mon, 09 Aug 2010 13:09:56 +0530
ravichandra wrote:

> Hi,
> Thanks. The patch you sent is working; there is no hang after the patch is
> applied. Can you elaborate on the problem that was there earlier?
>

It's .... complicated.

An important fact is that generic_make_request queues recursive requests
rather than issuing them immediately. This avoids excessive stack usage with
stacked block devices.

So in the case where a read crosses a chunk boundary, raid10:make_request
issues two separate generic_make_request calls to two different devices, each
preceded by a wait_barrier call (which is cancelled with allow_barrier() when
the request completes).
The first is queued and will not be issued until the second is also queued and
the raid10:make_request call completes.

The wait_barrier call increments nr_pending.
If the resync/recovery thread tries to 'raise_barrier' between these calls,
it will find nr_pending set and will wait with ->barrier incremented so when
the next wait_barrier is attempted, it will block - forever.

If generic_make_request didn't queue things, the first request would
complete, nr_pending would decrement, resync would proceed with a single
request, then the second wait_barrier would complete and the second request
could be submitted.

The fix was to elevate conf->nr_waiting for the duration of both submissions
so raise_barrier holds off setting ->barrier until both submissions are
complete.

Hope that makes sense.

NeilBrown


Re: Problem regarding RAID10 on kernel 2.6.31

on 19.10.2010 00:38:44 by NeilBrown

On Mon, 18 Oct 2010 14:23:51 -0700
Hari Subramanian wrote:

> Is this bug found in the raid1 personality driver as well?

No.
It is possible there are other problems in raid1 which I am investigating at
the moment. But this bug isn't in raid1.

NeilBrown


>
> Thanks
> ~ Hari
>
>
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Neil Brown
> Sent: Monday, August 09, 2010 4:11 AM
> To: ravichandra
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Problem regarding RAID10 on kernel 2.6.31
>
> On Mon, 09 Aug 2010 13:09:56 +0530
> ravichandra wrote:
>
> > Hi,
> > Thanks. The patch you sent is working; there is no hang after the patch is
> > applied. Can you elaborate on the problem that was there earlier?
> >
>
> It's .... complicated.
>
> An important fact is that generic_make_request queues recursive requests
> rather than issuing them immediately. This avoids excessive stack usage with
> stacked block devices.
>
> So in the case where a read crosses a chunk boundary, raid10:make_request
> issues two separate generic_make_request calls to two different devices, each
> preceded by a wait_barrier call (which is cancelled with allow_barrier() when
> the request completes).
> The first is queued and will not be issued until the second is also queued and
> the raid10:make_request call completes.
>
> The wait_barrier call increments nr_pending.
> If the resync/recovery thread tries to 'raise_barrier' between these calls,
> it will find nr_pending set and will wait with ->barrier incremented so when
> the next wait_barrier is attempted, it will block - forever.
>
> If generic_make_request didn't queue things, the first request would
> complete, nr_pending would decrement, resync would proceed with a single
> request, then the second wait_barrier would complete and the second request
> could be submitted.
>
> The fix was to elevate conf->nr_waiting for the duration of both submissions
> so raise_barrier holds off setting ->barrier until both submissions are
> complete.
>
> Hope that makes sense.
>
> NeilBrown
>
