Re: [PATCH 05/10] block: remove per-queue plugging
on 11.04.2011 12:59:28 by NeilBrown
On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe wrote:
> On 2011-04-11 06:50, NeilBrown wrote:
> > The only explanation I can come up with is that very occasionally schedule on
> > 2 separate cpus calls blk_flush_plug for the same task. I don't understand
> > the scheduler nearly well enough to know if or how that can happen.
> > However with this patch in place I can write to a RAID1 constantly for half
> > an hour, and without it, the write rarely lasts for 3 minutes.
>
> Or perhaps if the request_fn blocks, that would be problematic. So the
> patch is likely a good idea even for that case.
>
> I'll merge it, changing it to list_splice_init() as I think that would
> be more clear.
OK - though I'm not 100% sure the patch fixes the problem - just that it hides
the symptom for me.
I might try instrumenting the code a bit more and see if I can find exactly
where it is re-entering flush_plug_list - as that seems to be what is
happening.
And yeah - list_split_init is probably better. I just never remember exactly
what list_split means and have to look it up every time, whereas
list_add/list_del are very clear to me.
>
> > From 687b189c02276887dd7d5b87a817da9f67ed3c2c Mon Sep 17 00:00:00 2001
> > From: NeilBrown
> > Date: Thu, 7 Apr 2011 13:16:59 +1000
> > Subject: [PATCH] Enhance new plugging support to support general callbacks.
> >
> > md/raid requires an unplug callback, but as it does not use
> > requests the current code cannot provide one.
> >
> > So allow arbitrary callbacks to be attached to the blk_plug.
> >
> > Cc: Jens Axboe
> > Signed-off-by: NeilBrown
> > ---
> > block/blk-core.c | 13 +++++++++++++
> > include/linux/blkdev.h | 7 ++++++-
> > 2 files changed, 19 insertions(+), 1 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 725091d..273d60b 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -2644,6 +2644,7 @@ void blk_start_plug(struct blk_plug *plug)
> >
> >  	plug->magic = PLUG_MAGIC;
> >  	INIT_LIST_HEAD(&plug->list);
> > +	INIT_LIST_HEAD(&plug->cb_list);
> >  	plug->should_sort = 0;
> >
> >  	/*
> > @@ -2717,9 +2718,21 @@ static void flush_plug_list(struct blk_plug *plug)
> >  	local_irq_restore(flags);
> >  }
> >
> > +static void flush_plug_callbacks(struct blk_plug *plug)
> > +{
> > +	while (!list_empty(&plug->cb_list)) {
> > +		struct blk_plug_cb *cb = list_first_entry(&plug->cb_list,
> > +							  struct blk_plug_cb,
> > +							  list);
> > +		list_del(&cb->list);
> > +		cb->callback(cb);
> > +	}
> > +}
> > +
> >  static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug)
> >  {
> >  	flush_plug_list(plug);
> > +	flush_plug_callbacks(plug);
> >
> >  	if (plug == tsk->plug)
> >  		tsk->plug = NULL;
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 32176cc..3e5e604 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -857,8 +857,13 @@ extern void blk_put_queue(struct request_queue *);
> >  struct blk_plug {
> >  	unsigned long magic;
> >  	struct list_head list;
> > +	struct list_head cb_list;
> >  	unsigned int should_sort;
> >  };
> > +struct blk_plug_cb {
> > +	struct list_head list;
> > +	void (*callback)(struct blk_plug_cb *);
> > +};
> >
> >  extern void blk_start_plug(struct blk_plug *);
> >  extern void blk_finish_plug(struct blk_plug *);
> > @@ -876,7 +881,7 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
> >  {
> >  	struct blk_plug *plug = tsk->plug;
> >
> > -	return plug && !list_empty(&plug->list);
> > +	return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
> >  }
> >
> >  /*
>
> Maybe I'm missing something, but why do you need those callbacks? If
> it's to use plugging yourself, perhaps we can just ensure that those
> don't get assigned in the task - so it would have to be used with care.
>
> It's not that I disagree with these callbacks, I just want to ensure I
> understand why you need them.
>
I'm sure one of us is missing something (probably both) but I'm not sure what.
The callback is central.
It is simply to use plugging in md.
Just like blk-core, md will notice that a blk_plug is active and will put
requests aside. I then need something to call in to md when blk_finish_plug
is called so that put-aside requests can be released.
As md can be built as a module, that call must be a call-back of some sort.
blk-core doesn't need to register blk_plug_flush because that is never in a
module, so it can be called directly. But the md equivalent could be in a
module, so I need to be able to register a call back.
Does that help?
Thanks,
NeilBrown
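The registration pattern described in this message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct blk_plug` is reduced to its `cb_list`, the list helpers are simplified stand-ins for `<linux/list.h>`, and `struct md_plug_cb`, `md_submit()`, `md_unplug()` and `md_plug_demo()` are hypothetical names that do not come from the patch or from md.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced versions of the structures in the patch (request list omitted). */
struct blk_plug { struct list_head cb_list; };
struct blk_plug_cb {
	struct list_head list;
	void (*callback)(struct blk_plug_cb *);
};

/* Same loop as the patch: detach each callback, then run it. */
static void flush_plug_callbacks(struct blk_plug *plug)
{
	while (!list_empty(&plug->cb_list)) {
		struct blk_plug_cb *cb = container_of(plug->cb_list.next,
						      struct blk_plug_cb, list);
		list_del(&cb->list);
		cb->callback(cb);
	}
}

/* Hypothetical md-side state that embeds the generic callback. */
struct md_plug_cb {
	struct blk_plug_cb cb;
	int pending;	/* requests set aside while a plug is active */
	int released;	/* requests issued so far */
	int registered;	/* callback already on the plug's cb_list? */
};

/* Runs from flush_plug_callbacks(): release everything that was set aside. */
static void md_unplug(struct blk_plug_cb *cb)
{
	struct md_plug_cb *md = container_of(cb, struct md_plug_cb, cb);

	md->released += md->pending;
	md->pending = 0;
	md->registered = 0;
}

/* Hypothetical submit path: defer while plugged, registering only once. */
static void md_submit(struct blk_plug *plug, struct md_plug_cb *md)
{
	if (!plug) {
		md->released++;		/* no plug active: issue immediately */
		return;
	}
	md->pending++;
	if (!md->registered) {
		md->cb.callback = md_unplug;
		list_add_tail(&md->cb.list, &plug->cb_list);
		md->registered = 1;
	}
}

/* Submits three requests under a plug, then finishes the plug. */
int md_plug_demo(int *released_before_finish)
{
	struct blk_plug plug;
	struct md_plug_cb md = { .pending = 0 };
	int i;

	init_list_head(&plug.cb_list);		/* stands in for blk_start_plug() */
	for (i = 0; i < 3; i++)
		md_submit(&plug, &md);
	*released_before_finish = md.released;
	flush_plug_callbacks(&plug);		/* stands in for blk_finish_plug() */
	assert(md.pending == 0);
	return md.released;
}
```

Calling `md_plug_demo()` shows that nothing is released while the plug is held and that all three requests are released by the callback at finish time. Because the block layer only holds a function pointer, the code behind it can live in a module, which is the point made above.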
Re: [PATCH 05/10] block: remove per-queue plugging
on 11.04.2011 13:04:26 by Jens Axboe
On 2011-04-11 12:59, NeilBrown wrote:
> On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe wrote:
>
>> On 2011-04-11 06:50, NeilBrown wrote:
>
>>> The only explanation I can come up with is that very occasionally schedule on
>>> 2 separate cpus calls blk_flush_plug for the same task. I don't understand
>>> the scheduler nearly well enough to know if or how that can happen.
>>> However with this patch in place I can write to a RAID1 constantly for half
>>> an hour, and without it, the write rarely lasts for 3 minutes.
>>
>> Or perhaps if the request_fn blocks, that would be problematic. So the
>> patch is likely a good idea even for that case.
>>
>> I'll merge it, changing it to list_splice_init() as I think that would
>> be more clear.
>
> OK - though I'm not 100% sure the patch fixes the problem - just that it
> hides the symptom for me.
> I might try instrumenting the code a bit more and see if I can find exactly
> where it is re-entering flush_plug_list - as that seems to be what is
> happening.
It's definitely a good thing to add, to avoid the list fudging on
schedule. Whether it's your exact problem, I can't tell.
> And yeah - list_split_init is probably better. I just never remember exactly
> what list_split means and have to look it up every time, whereas
> list_add/list_del are very clear to me.
splice, no split :-)
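For readers who also have to look it up: a rough userspace sketch of what splicing onto a local list buys when the flush can be re-entered. The helpers are simplified stand-ins for `<linux/list.h>` (the real `list_del()` poisons the pointers rather than reinitialising them), and `struct item`, `flush()` and `run_flush_demo()` are hypothetical names, not code from the patch under discussion.

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

/* Move all entries of 'from' to the front of 'to' and reinitialise 'from'. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	struct list_head *first = from->next, *last = from->prev;

	if (list_empty(from))
		return;
	first->prev = to;
	last->next = to->next;
	to->next->prev = last;
	to->next = first;
	init_list_head(from);
}

/* Hypothetical queued item; 'list' is the first member so a cast recovers it. */
struct item { struct list_head list; int id; };

struct flush_stats { int total; int nested; };

static struct list_head plug_list;
static struct flush_stats stats;
static int depth;

static void flush(void)
{
	struct list_head local;

	init_list_head(&local);
	list_splice_init(&plug_list, &local);	/* shared list is empty from here on */
	depth++;
	while (!list_empty(&local)) {
		struct item *it = (struct item *)local.next;

		list_del(&it->list);
		stats.total++;
		if (depth > 1)
			stats.nested++;
		if (it->id == 1)
			flush();	/* simulate a handler that re-enters the flush */
	}
	depth--;
}

/* Queues three items, flushes once, and reports who handled what. */
struct flush_stats run_flush_demo(void)
{
	struct item items[3];
	int i;

	init_list_head(&plug_list);
	stats.total = stats.nested = 0;
	depth = 0;
	for (i = 0; i < 3; i++) {
		items[i].id = i + 1;
		list_add_tail(&items[i].list, &plug_list);
	}
	flush();
	assert(list_empty(&plug_list));
	return stats;
}
```

In this sketch the nested call finds the shared list already empty, so each item is handled exactly once by the outer flush. Whether that matches the failure seen on RAID1 is not established by the thread.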
>>> From 687b189c02276887dd7d5b87a817da9f67ed3c2c Mon Sep 17 00:00:00 2001
>>> From: NeilBrown
>>> Date: Thu, 7 Apr 2011 13:16:59 +1000
>>> Subject: [PATCH] Enhance new plugging support to support general callbacks.
>>>
>>> md/raid requires an unplug callback, but as it does not use
>>> requests the current code cannot provide one.
>>>
>>> So allow arbitrary callbacks to be attached to the blk_plug.
>>>
>>> Cc: Jens Axboe
>>> Signed-off-by: NeilBrown
>>> ---
>>> block/blk-core.c | 13 +++++++++++++
>>> include/linux/blkdev.h | 7 ++++++-
>>> 2 files changed, 19 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 725091d..273d60b 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -2644,6 +2644,7 @@ void blk_start_plug(struct blk_plug *plug)
>>>
>>>  	plug->magic = PLUG_MAGIC;
>>>  	INIT_LIST_HEAD(&plug->list);
>>> +	INIT_LIST_HEAD(&plug->cb_list);
>>>  	plug->should_sort = 0;
>>>
>>>  	/*
>>> @@ -2717,9 +2718,21 @@ static void flush_plug_list(struct blk_plug *plug)
>>>  	local_irq_restore(flags);
>>>  }
>>>
>>> +static void flush_plug_callbacks(struct blk_plug *plug)
>>> +{
>>> +	while (!list_empty(&plug->cb_list)) {
>>> +		struct blk_plug_cb *cb = list_first_entry(&plug->cb_list,
>>> +							  struct blk_plug_cb,
>>> +							  list);
>>> +		list_del(&cb->list);
>>> +		cb->callback(cb);
>>> +	}
>>> +}
>>> +
>>>  static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug)
>>>  {
>>>  	flush_plug_list(plug);
>>> +	flush_plug_callbacks(plug);
>>>
>>>  	if (plug == tsk->plug)
>>>  		tsk->plug = NULL;
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index 32176cc..3e5e604 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -857,8 +857,13 @@ extern void blk_put_queue(struct request_queue *);
>>>  struct blk_plug {
>>>  	unsigned long magic;
>>>  	struct list_head list;
>>> +	struct list_head cb_list;
>>>  	unsigned int should_sort;
>>>  };
>>> +struct blk_plug_cb {
>>> +	struct list_head list;
>>> +	void (*callback)(struct blk_plug_cb *);
>>> +};
>>>
>>>  extern void blk_start_plug(struct blk_plug *);
>>>  extern void blk_finish_plug(struct blk_plug *);
>>> @@ -876,7 +881,7 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
>>>  {
>>>  	struct blk_plug *plug = tsk->plug;
>>>
>>> -	return plug && !list_empty(&plug->list);
>>> +	return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
>>>  }
>>>
>>>  /*
>>
>> Maybe I'm missing something, but why do you need those callbacks? If
>> it's to use plugging yourself, perhaps we can just ensure that those
>> don't get assigned in the task - so it would have to be used with care.
>>
>> It's not that I disagree with these callbacks, I just want to ensure I
>> understand why you need them.
>>
>
> I'm sure one of us is missing something (probably both) but I'm not
> sure what.
>
> The callback is central.
>
> It is simply to use plugging in md.
> Just like blk-core, md will notice that a blk_plug is active and will put
> requests aside. I then need something to call in to md when blk_finish_plug
But this is done in __make_request(), so md devices should not be
affected at all. This is the part of your explanation that I do not
connect with the code.
If md itself is putting things on the plug list, why is it doing that?
> is called so that put-aside requests can be released.
> As md can be built as a module, that call must be a call-back of some sort.
> blk-core doesn't need to register blk_plug_flush because that is never in a
> module, so it can be called directly. But the md equivalent could be in a
> module, so I need to be able to register a call back.
>
> Does that help?
Not really. Is the problem that _you_ would like to stash things aside,
not the fact that __make_request() puts things on a task plug list?
--
Jens Axboe
Re: [PATCH 05/10] block: remove per-queue plugging
on 19.04.2011 00:38:13 by NeilBrown
On Mon, 18 Apr 2011 17:30:48 -0400 "hch@infradead.org" wrote:
> > md: provide generic support for handling unplug callbacks.
>
> This looks like some horribly ugly code to me. The real fix is to do
> the plugging in the block layers for bios instead of requests. The
> effect should be about the same, except that merging will become a
> little easier as all bios will be on the list now when calling into
> __make_request or its equivalent, and even better if we extend the
> list sort callback to also sort by the start block it will actually
> simplify the merge algorithm a lot as it only needs to do front merges
> and no back merges for the on-stack merging.
>
> In addition it should also allow for much more optimal queue_lock
> roundtrips - we can keep it locked at the end of what's currently
> __make_request to have it available for the next bio that's been
> on the list. If it either can be merged now that we have the lock
> and/or we optimize get_request_wait not to sleep in the fast path
> we could get down to a single queue_lock roundtrip for each unplug.
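The claim that sorting by start block simplifies merging can be illustrated with a small userspace sketch. It is not block-layer code: `struct extent` and `sort_and_merge()` are hypothetical stand-ins for bios and the on-stack merge, the sort uses the C library's `qsort()`, and overlapping extents are deliberately not handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio: a run of sectors on one device. */
struct extent {
	unsigned long start;	/* first sector */
	unsigned long len;	/* length in sectors */
};

static int cmp_start(const void *a, const void *b)
{
	const struct extent *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/*
 * Sort by start sector, then coalesce contiguous extents in one pass.
 * Once sorted, each extent only has to be compared with the one built
 * so far, so merging in a single direction is enough.
 * Returns the number of extents left at the front of the array.
 */
size_t sort_and_merge(struct extent *e, size_t n)
{
	size_t out = 0, i;

	assert(e != NULL || n == 0);
	if (n == 0)
		return 0;
	qsort(e, n, sizeof(*e), cmp_start);
	for (i = 1; i < n; i++) {
		if (e[out].start + e[out].len == e[i].start)
			e[out].len += e[i].len;	/* contiguous: grow current extent */
		else
			e[++out] = e[i];	/* gap: start a new extent */
	}
	return out + 1;
}
```

For example, the extents {16,8}, {0,8}, {8,8}, {40,8} collapse to {0,24} and {40,8}. A real implementation would also have to respect per-queue limits on request size, which this sketch ignores.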
Does the following match with your thinking? I'm trying to make for a more
concrete understanding...
 - We change the ->make_request_fn interface so that it takes a list of
   bios rather than a single bio - linked on ->bi_next.
   These bios must all have the same ->bi_bdev. They *might* be sorted
   by bi_sector (that needs to be decided).

 - generic_make_request currently queues bios if there is already an active
   request (this limits recursion). We enhance this to also queue requests
   when code calls blk_start_plug.
   In effect, generic_make_request becomes:

	if (current->plug)
		blk_add_to_plug(current->plug, bio);
	else {
		struct blk_plug plug;

		blk_start_plug(&plug);
		__generic_make_request(bio);
		blk_finish_plug(&plug);
	}

 - __generic_make_request would sort the list of bios by bi_bdev (and maybe
   bi_sector) and pass them along to the different ->make_request_fn
   functions.

   As there are likely to be only a few different bi_bdev values (often 1) but
   hopefully lots and lots of bios it might be more efficient to do a linear
   bucket sort based on bi_bdev, and only sort those buckets on bi_sector if
   required.
Then make_request_fn handlers can expect to get lots of bios at once, can
optimise their handling as seems appropriate, and not require any further
plugging.
Is that at all close to what you are thinking?
NeilBrown
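The linear bucket sort on bi_bdev proposed above might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct bio` is cut down to three fields, `bi_bdev` is an int rather than a `struct block_device *`, and `dispatch_plugged_bios()`, `make_request_list_fn`, `MAX_BDEVS` and the demo helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BDEVS 8	/* assumed small: "only a few different bi_bdev values" */

/* Cut-down stand-in for the kernel's struct bio. */
struct bio {
	struct bio *bi_next;
	int bi_bdev;		/* stand-in for struct block_device * */
	unsigned long bi_sector;
};

struct bucket {
	int bdev;
	struct bio *head;
	struct bio **tail;	/* where the next bio for this device is linked */
};

/* Hypothetical list-taking variant of ->make_request_fn. */
typedef void (*make_request_list_fn)(int bdev, struct bio *list, void *ctx);

/*
 * Partition a ->bi_next chain into per-device chains with a linear
 * search over the buckets, preserving submission order within each
 * device, then hand each chain to 'fn'. Returns the number of chains.
 */
int dispatch_plugged_bios(struct bio *list, make_request_list_fn fn, void *ctx)
{
	struct bucket b[MAX_BDEVS];
	int n = 0, i;

	while (list) {
		struct bio *bio = list;

		list = bio->bi_next;
		bio->bi_next = NULL;
		for (i = 0; i < n; i++)
			if (b[i].bdev == bio->bi_bdev)
				break;
		if (i == n) {			/* first bio seen for this device */
			assert(n < MAX_BDEVS);
			b[n].bdev = bio->bi_bdev;
			b[n].head = NULL;
			b[n].tail = &b[n].head;
			n++;
		}
		*b[i].tail = bio;
		b[i].tail = &bio->bi_next;
	}
	for (i = 0; i < n; i++)
		fn(b[i].bdev, b[i].head, ctx);
	return n;
}

/* Demo handler: records how many bios each call received. */
struct dispatch_log { int calls; int bios[MAX_BDEVS]; };

static void count_bios(int bdev, struct bio *list, void *ctx)
{
	struct dispatch_log *log = ctx;
	int n = 0;

	(void)bdev;
	for (; list; list = list->bi_next)
		n++;
	log->bios[log->calls++] = n;
}

/* Five bios interleaved across two devices end up as two chains. */
struct dispatch_log dispatch_demo(void)
{
	static const int bdevs[5] = { 1, 2, 1, 1, 2 };
	struct bio bios[5];
	struct dispatch_log log = { 0 };
	int i;

	for (i = 0; i < 5; i++) {
		bios[i].bi_bdev = bdevs[i];
		bios[i].bi_sector = 8UL * i;
		bios[i].bi_next = (i + 1 < 5) ? &bios[i + 1] : NULL;
	}
	dispatch_plugged_bios(&bios[0], count_bios, &log);
	return log;
}
```

With only a handful of devices the linear search per bio is cheap, which is the trade-off suggested above. Sorting each chain on bi_sector is left out, as the proposal treats it as optional.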
Re: [PATCH 05/10] block: remove per-queue plugging
on 20.04.2011 12:55:09 by Christoph Hellwig
On Tue, Apr 19, 2011 at 08:38:13AM +1000, NeilBrown wrote:
> Is that at all close to what you are thinking?
Yes, pretty much like that.