Re: [PATCH 05/10] block: remove per-queue plugging

Re: [PATCH 05/10] block: remove per-queue plugging

am 11.04.2011 14:05:05 von NeilBrown

On Mon, 11 Apr 2011 13:37:20 +0200 Jens Axboe wrote:

> On 2011-04-11 13:26, NeilBrown wrote:
> > On Mon, 11 Apr 2011 13:04:26 +0200 Jens Axboe wrote:
> >
> >>>
> >>> I'm sure one of us is missing something (probably both) but I'm not
> >>> sure what.
> >>>
> >>> The callback is central.
> >>>
> >>> It is simply to use plugging in md.
> >>> Just like blk-core, md will notice that a blk_plug is active and will put
> >>> requests aside. I then need something to call in to md when blk_finish_plug
> >>
> >> But this is done in __make_request(), so md devices should not be
> >> affected at all. This is the part of your explanation that I do not
> >> connect with the code.
> >>
> >> If md itself is putting things on the plug list, why is it doing that?
> >
> > Yes. Exactly. md itself want to put things aside on some list.
> > e.g. in RAID1 when using a write-intent bitmap I want to gather as many write
> > requests as possible so I can update the bits for all of them at once.
> > So when a plug is in effect I just queue the bios somewhere and record the
> > bits that need to be set.
> > Then when the unplug happens I write out the bitmap updates in a single write
> > and when that completes, I write out the data (to all devices).
> >
> > Also in RAID5 it is good if I can wait for lots of write request to arrive
> > before committing any of them to increase the possibility of getting a
> > full-stripe write.
> >
> > Previously I used ->unplug_fn to release the queued requests. Now that has
> > gone I need a different way to register a callback when an unplug happens.
>
> Ah, so this is what I was hinting at. But why use the task->plug for
> that? Seems a bit counter intuitive. Why can't you just store these
> internally?
>
> >
> >>
> >>> is called so that put-aside requests can be released.
> >>> As md can be built as a module, that call must be a call-back of some sort.
> >>> blk-core doesn't need to register blk_plug_flush because that is never in a
> >>> module, so it can be called directly. But the md equivalent could be in a
> >>> module, so I need to be able to register a call back.
> >>>
> >>> Does that help?
> >>
> >> Not really. Is the problem that _you_ would like to stash things aside,
> >> not the fact that __make_request() puts things on a task plug list?
> >>
> >
> > Yes, exactly. I (in md) want to stash things aside.
> >
> > (I don't actually put the stashed things on the blk_plug, though it might
> > make sense to do that later in some cases - I'm not sure. Currently I stash
> > things in my own internal lists and just need a call back to say "ok, flush
> > those lists now").
>
> So we are making some progress... The thing I then don't understand is
> why you want to make it associated with the plug? Seems you don't have
> any scheduling restrictions, and in which case just storing them in md
> seems like a much better option.
>

Yes. But I need to know when to release the requests that I have stored.
I need to know when ->write_pages or ->read_pages or whatever has finished
submitting a pile of pages so that I can start processing the request that I
have put aside. So I need a callback from blk_finish_plug.

(and I also need to know if a thread that was plugging schedules for the same
reason that you do).

NeilBrown


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/10] block: remove per-queue plugging

am 11.04.2011 14:11:58 von Jens Axboe

On 2011-04-11 14:05, NeilBrown wrote:
> On Mon, 11 Apr 2011 13:37:20 +0200 Jens Axboe wrote:
>
>> On 2011-04-11 13:26, NeilBrown wrote:
>>> On Mon, 11 Apr 2011 13:04:26 +0200 Jens Axboe wrote:
>>>
>>>>>
>>>>> I'm sure one of us is missing something (probably both) but I'm not
>>>>> sure what.
>>>>>
>>>>> The callback is central.
>>>>>
>>>>> It is simply to use plugging in md.
>>>>> Just like blk-core, md will notice that a blk_plug is active and will put
>>>>> requests aside. I then need something to call in to md when blk_finish_plug
>>>>
>>>> But this is done in __make_request(), so md devices should not be
>>>> affected at all. This is the part of your explanation that I do not
>>>> connect with the code.
>>>>
>>>> If md itself is putting things on the plug list, why is it doing that?
>>>
>>> Yes. Exactly. md itself want to put things aside on some list.
>>> e.g. in RAID1 when using a write-intent bitmap I want to gather as many write
>>> requests as possible so I can update the bits for all of them at once.
>>> So when a plug is in effect I just queue the bios somewhere and record the
>>> bits that need to be set.
>>> Then when the unplug happens I write out the bitmap updates in a single write
>>> and when that completes, I write out the data (to all devices).
>>>
>>> Also in RAID5 it is good if I can wait for lots of write request to arrive
>>> before committing any of them to increase the possibility of getting a
>>> full-stripe write.
>>>
>>> Previously I used ->unplug_fn to release the queued requests. Now that has
>>> gone I need a different way to register a callback when an unplug happens.
>>
>> Ah, so this is what I was hinting at. But why use the task->plug for
>> that? Seems a bit counter intuitive. Why can't you just store these
>> internally?
>>
>>>
>>>>
>>>>> is called so that put-aside requests can be released.
>>>>> As md can be built as a module, that call must be a call-back of some sort.
>>>>> blk-core doesn't need to register blk_plug_flush because that is never in a
>>>>> module, so it can be called directly. But the md equivalent could be in a
>>>>> module, so I need to be able to register a call back.
>>>>>
>>>>> Does that help?
>>>>
>>>> Not really. Is the problem that _you_ would like to stash things aside,
>>>> not the fact that __make_request() puts things on a task plug list?
>>>>
>>>
>>> Yes, exactly. I (in md) want to stash things aside.
>>>
>>> (I don't actually put the stashed things on the blk_plug, though it might
>>> make sense to do that later in some cases - I'm not sure. Currently I stash
>>> things in my own internal lists and just need a call back to say "ok, flush
>>> those lists now").
>>
>> So we are making some progress... The thing I then don't understand is
>> why you want to make it associated with the plug? Seems you don't have
>> any scheduling restrictions, and in which case just storing them in md
>> seems like a much better option.
>>
>
> Yes. But I need to know when to release the requests that I have stored.
> I need to know when ->write_pages or ->read_pages or whatever has finished
> submitting a pile of pages so that I can start processing the request that I
> have put aside. So I need a callback from blk_finish_plug.

OK fair enough, I'll add your callback patch.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 05/10] block: remove per-queue plugging

am 15.04.2011 06:26:11 von Christoph Hellwig

Btw, "block: move queue run on unplug to kblockd" currently moves
the __blk_run_queue call to kblockd unconditionally currently. But
I'm not sure that's correct - if we do an explicit blk_finish_plug
there's no point in forcing the context switch.

Re: [PATCH 05/10] block: remove per-queue plugging

am 15.04.2011 08:34:10 von Jens Axboe

On 2011-04-15 06:26, hch@infradead.org wrote:
> Btw, "block: move queue run on unplug to kblockd" currently moves
> the __blk_run_queue call to kblockd unconditionally currently. But
> I'm not sure that's correct - if we do an explicit blk_finish_plug
> there's no point in forcing the context switch.

It's correct, but yes it's not optimal for the explicit unplug. Well I
think it really depends - for the single sync case, it's not ideal to
punt to kblockd. But if you have a bunch of threads doing IO, you
probably DO want to punt it to kblockd to avoid too many threads
hammering on the queue lock at the same time. Would need testing to be
sure, the below would a way to accomplish that.


diff --git a/block/blk-core.c b/block/blk-core.c
index b598fa7..995e995 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2662,16 +2662,16 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
return !(rqa->q <= rqb->q);
}

-static void queue_unplugged(struct request_queue *q, unsigned int depth)
+static void queue_unplugged(struct request_queue *q, unsigned int depth, bool run_from_wq)
{
trace_block_unplug_io(q, depth);
- __blk_run_queue(q, true);
+ __blk_run_queue(q, run_from_wq);

if (q->unplugged_fn)
q->unplugged_fn(q);
}

-void blk_flush_plug_list(struct blk_plug *plug)
+void blk_flush_plug_list(struct blk_plug *plug, bool run_from_wq)
{
struct request_queue *q;
unsigned long flags;
@@ -2706,7 +2706,7 @@ void blk_flush_plug_list(struct blk_plug *plug)
BUG_ON(!rq->q);
if (rq->q != q) {
if (q) {
- queue_unplugged(q, depth);
+ queue_unplugged(q, depth, run_from_wq);
spin_unlock(q->queue_lock);
}
q = rq->q;
@@ -2727,7 +2727,7 @@ void blk_flush_plug_list(struct blk_plug *plug)
}

if (q) {
- queue_unplugged(q, depth);
+ queue_unplugged(q, depth, run_from_wq);
spin_unlock(q->queue_lock);
}

@@ -2737,7 +2737,7 @@ EXPORT_SYMBOL(blk_flush_plug_list);

void blk_finish_plug(struct blk_plug *plug)
{
- blk_flush_plug_list(plug);
+ blk_flush_plug_list(plug, false);

if (plug == current->plug)
current->plug = NULL;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ffe48ff..1c76506 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -865,14 +865,14 @@ struct blk_plug {

extern void blk_start_plug(struct blk_plug *);
extern void blk_finish_plug(struct blk_plug *);
-extern void blk_flush_plug_list(struct blk_plug *);
+extern void blk_flush_plug_list(struct blk_plug *, bool);

static inline void blk_flush_plug(struct task_struct *tsk)
{
struct blk_plug *plug = tsk->plug;

if (plug)
- blk_flush_plug_list(plug);
+ blk_flush_plug_list(plug, true);
}

static inline bool blk_needs_flush_plug(struct task_struct *tsk)


--
Jens Axboe