Re: [PATCH 05/10] block: remove per-queue plugging

Re: [PATCH 05/10] block: remove per-queue plugging

am 11.04.2011 13:55:21 von NeilBrown

On Mon, 11 Apr 2011 20:59:28 +1000 NeilBrown wrote:

> On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe wrote:
>
> > On 2011-04-11 06:50, NeilBrown wrote:
>
> > > The only explanation I can come up with is that very occasionally schedule on
> > > 2 separate cpus calls blk_flush_plug for the same task. I don't understand
> > > the scheduler nearly well enough to know if or how that can happen.
> > > However with this patch in place I can write to a RAID1 constantly for half
> > > an hour, and without it, the write rarely lasts for 3 minutes.
> >
> > Or perhaps if the request_fn blocks, that would be problematic. So the
> > patch is likely a good idea even for that case.
> >
> > I'll merge it, changing it to list_splice_init() as I think that would
> > be more clear.
>
> OK - though I'm not 100% the patch fixes the problem - just that it hides the
> symptom for me.
> I might try instrumenting the code a bit more and see if I can find exactly
> where it is re-entering flush_plug_list - as that seems to be what is
> happening.

OK, I found how it re-enters.

The request_fn doesn't exactly block, but when scsi_request_fn calls
spin_unlock_irq, this calls preempt_enable which can call schedule, which is
a recursive call.

The patch I provided will stop that from recursing again as the blk_plug.list
will be empty.

So it is almost what you suggested, however the request_fn doesn't block, it
just enabled preempt.


So the comment I would put at the top of that patch would be something like:


From: NeilBrown

As request_fn called by __blk_run_queue is allowed to 'schedule()' (after
dropping the queue lock of course), it is possible to get a recursive call:

schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
-> __blk_run_queue -> request_fn -> schedule

We must make sure that the second schedule does not call into blk_flush_plug
again. So instead of leaving the list of requests on blk_plug->list, move
them to a separate list leaving blk_plug->list empty.

Signed-off-by: NeilBrown

Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/10] block: remove per-queue plugging

am 11.04.2011 14:12:29 von Jens Axboe

On 2011-04-11 13:55, NeilBrown wrote:
> On Mon, 11 Apr 2011 20:59:28 +1000 NeilBrown wrote:
>
>> On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe wrote:
>>
>>> On 2011-04-11 06:50, NeilBrown wrote:
>>
>>>> The only explanation I can come up with is that very occasionally schedule on
>>>> 2 separate cpus calls blk_flush_plug for the same task. I don't understand
>>>> the scheduler nearly well enough to know if or how that can happen.
>>>> However with this patch in place I can write to a RAID1 constantly for half
>>>> an hour, and without it, the write rarely lasts for 3 minutes.
>>>
>>> Or perhaps if the request_fn blocks, that would be problematic. So the
>>> patch is likely a good idea even for that case.
>>>
>>> I'll merge it, changing it to list_splice_init() as I think that would
>>> be more clear.
>>
>> OK - though I'm not 100% the patch fixes the problem - just that it hides the
>> symptom for me.
>> I might try instrumenting the code a bit more and see if I can find exactly
>> where it is re-entering flush_plug_list - as that seems to be what is
>> happening.
>
> OK, I found how it re-enters.
>
> The request_fn doesn't exactly block, but when scsi_request_fn calls
> spin_unlock_irq, this calls preempt_enable which can call schedule, which is
> a recursive call.
>
> The patch I provided will stop that from recursing again as the blk_plug.list
> will be empty.
>
> So it is almost what you suggested, however the request_fn doesn't block, it
> just enabled preempt.
>
>
> So the comment I would put at the top of that patch would be something like:

Ah, so it was pretty close. That does explain it. I've already queued up
the patch, I'll ammend the commit message.

--
Jens Axboe

Re: [PATCH 05/10] block: remove per-queue plugging

am 12.04.2011 00:58:12 von Christoph Hellwig

Looking at the patch
(http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h= 761e433f3de6fb8e369af9e5c08beb86286d023f)

I'm not sure it's an optimal design. The flush callback really
is a per-queue thing. Why isn't it a function pointer in the request
queue when doing the blk_run_queue call once we're done with a given
queue before moving on to the next one?

Re: [PATCH 05/10] block: remove per-queue plugging

am 12.04.2011 08:20:43 von Jens Axboe

On 2011-04-12 00:58, hch@infradead.org wrote:
> Looking at the patch
> (http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h= 761e433f3de6fb8e369af9e5c08beb86286d023f)
>
> I'm not sure it's an optimal design. The flush callback really
> is a per-queue thing. Why isn't it a function pointer in the request
> queue when doing the blk_run_queue call once we're done with a given
> queue before moving on to the next one?

I was thinking about this yesterday as well, the design didn't quite
feel just right. Additionally the user now must track this state too,
and whether he's plugged on that task or not.

I'll rewrite this.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html