Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 15:28:36 by Mike Snitzer

On Mon, Aug 30 2010 at 5:58am -0400,
Tejun Heo wrote:

> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
....
> This patch rips out the special flush code path and handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use block address 0
> when finding the target, which is enough for now.

Looks very comparable to the patch I prepared, but I have two observations
below (based on my findings from testing my patch).

> @@ -1562,22 +1483,15 @@ static int setup_clone(struct request *clone, struct request *rq,
> {
> int r;
>
> - if (dm_rq_is_flush_request(rq)) {
> - blk_rq_init(NULL, clone);
> - clone->cmd_type = REQ_TYPE_FS;
> - clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
> - } else {
> - r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
> - dm_rq_bio_constructor, tio);
> - if (r)
> - return r;
> -
> - clone->cmd = rq->cmd;
> - clone->cmd_len = rq->cmd_len;
> - clone->sense = rq->sense;
> - clone->buffer = rq->buffer;
> - }
> + r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
> + dm_rq_bio_constructor, tio);
> + if (r)
> + return r;
>
> + clone->cmd = rq->cmd;
> + clone->cmd_len = rq->cmd_len;
> + clone->sense = rq->sense;
> + clone->buffer = rq->buffer;
> clone->end_io = end_clone_request;
> clone->end_io_data = tio;

blk_rq_prep_clone() of a REQ_FLUSH request will result in a
rq_data_dir(clone) of read.

I still had the following:

if (rq->cmd_flags & REQ_FLUSH) {
blk_rq_init(NULL, clone);
clone->cmd_type = REQ_TYPE_FS;
/* without this the clone has a rq_data_dir of 0 */
clone->cmd_flags |= WRITE_FLUSH;
} else {
r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
dm_rq_bio_constructor, tio);
...

Request-based DM's REQ_FLUSH still works without this special casing, but
I figured I'd raise this to ask: what is the proper rq_data_dir() for
a REQ_FLUSH?

> @@ -1709,15 +1621,12 @@ static void dm_request_fn(struct request_queue *q)
> if (!rq)
> goto plug_and_out;
>
> - if (unlikely(dm_rq_is_flush_request(rq))) {
> - BUG_ON(md->flush_request);
> - md->flush_request = rq;
> - blk_start_request(rq);
> - queue_work(md->wq, &md->barrier_work);
> - goto out;
> - }
> + /* always use block 0 to find the target for flushes for now */
> + pos = 0;
> + if (!(rq->cmd_flags & REQ_FLUSH))
> + pos = blk_rq_pos(rq);
>
> - ti = dm_table_find_target(map, blk_rq_pos(rq));
> + ti = dm_table_find_target(map, pos);

I added the following here: BUG_ON(!dm_target_is_valid(ti));

> if (ti->type->busy && ti->type->busy(ti))
> goto plug_and_out;

I also needed to avoid the ->busy call for REQ_FLUSH:

if (!(rq->cmd_flags & REQ_FLUSH)) {
ti = dm_table_find_target(map, blk_rq_pos(rq));
BUG_ON(!dm_target_is_valid(ti));
if (ti->type->busy && ti->type->busy(ti))
goto plug_and_out;
} else {
/* rq-based only ever has one target! leverage this for FLUSH */
ti = dm_table_get_target(map, 0);
}

If I allowed ->busy to be called for REQ_FLUSH it would result in a
deadlock. I haven't identified where/why yet.

Other than these remaining issues this patch looks good.

Thanks,
Mike

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 15:59:14 by Tejun Heo

Hello,

On 08/30/2010 03:28 PM, Mike Snitzer wrote:
>> + clone->cmd = rq->cmd;
>> + clone->cmd_len = rq->cmd_len;
>> + clone->sense = rq->sense;
>> + clone->buffer = rq->buffer;
>> clone->end_io = end_clone_request;
>> clone->end_io_data = tio;
>
> blk_rq_prep_clone() of a REQ_FLUSH request will result in a
> rq_data_dir(clone) of read.

Hmmm... why? blk_rq_prep_clone() copies all REQ_* flags in
REQ_CLONE_MASK and REQ_WRITE is definitely there. I'll check.

> I still had the following:
>
> if (rq->cmd_flags & REQ_FLUSH) {
> blk_rq_init(NULL, clone);
> clone->cmd_type = REQ_TYPE_FS;
> /* without this the clone has a rq_data_dir of 0 */
> clone->cmd_flags |= WRITE_FLUSH;
> } else {
> r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
> dm_rq_bio_constructor, tio);
> ...
>
> Request-based DM's REQ_FLUSH still works without this special casing but
> I figured I'd raise this to ask: what is the proper rq_data_dir() is for
> a REQ_FLUSH?

Technically the block layer doesn't care one way or the other, but it
should definitely be WRITE. Maybe it would be a good idea to enforce
that from the block layer.

>> @@ -1709,15 +1621,12 @@ static void dm_request_fn(struct request_queue *q)
>> if (!rq)
>> goto plug_and_out;
>>
>> - if (unlikely(dm_rq_is_flush_request(rq))) {
>> - BUG_ON(md->flush_request);
>> - md->flush_request = rq;
>> - blk_start_request(rq);
>> - queue_work(md->wq, &md->barrier_work);
>> - goto out;
>> - }
>> + /* always use block 0 to find the target for flushes for now */
>> + pos = 0;
>> + if (!(rq->cmd_flags & REQ_FLUSH))
>> + pos = blk_rq_pos(rq);
>>
>> - ti = dm_table_find_target(map, blk_rq_pos(rq));
>> + ti = dm_table_find_target(map, pos);
>
> I added the following here: BUG_ON(!dm_target_is_valid(ti));

I'll add it.

>> if (ti->type->busy && ti->type->busy(ti))
>> goto plug_and_out;
>
> I also needed to avoid the ->busy call for REQ_FLUSH:
>
> if (!(rq->cmd_flags & REQ_FLUSH)) {
> ti = dm_table_find_target(map, blk_rq_pos(rq));
> BUG_ON(!dm_target_is_valid(ti));
> if (ti->type->busy && ti->type->busy(ti))
> goto plug_and_out;
> } else {
> /* rq-based only ever has one target! leverage this for FLUSH */
> ti = dm_table_get_target(map, 0);
> }
>
> If I allowed ->busy to be called for REQ_FLUSH it would result in a
> deadlock. I haven't identified where/why yet.

Ah... that's probably from "if (!elv_queue_empty(q))" check below,
flushes are on a separate queue but I forgot to update
elv_queue_empty() to check the flush queue. elv_queue_empty() can
return %true spuriously in which case the queue won't be plugged and
restarted later leading to queue hang. I'll fix elv_queue_empty().

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 17:07:46 by Tejun Heo

On 08/30/2010 03:59 PM, Tejun Heo wrote:
> Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> flushes are on a separate queue but I forgot to update
> elv_queue_empty() to check the flush queue. elv_queue_empty() can
> return %true spuriously in which case the queue won't be plugged and
> restarted later leading to queue hang. I'll fix elv_queue_empty().

I think I was too quick to blame elv_queue_empty(). Can you please
test whether the following patch fixes the hang?

Thanks.

---
block/blk-flush.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -28,7 +28,8 @@ unsigned blk_flush_cur_seq(struct reques
}

static struct request *blk_flush_complete_seq(struct request_queue *q,
- unsigned seq, int error)
+ unsigned seq, int error,
+ bool from_end_io)
{
struct request *next_rq = NULL;

@@ -51,6 +52,13 @@ static struct request *blk_flush_complet
if (!list_empty(&q->pending_flushes)) {
next_rq = list_entry_rq(q->pending_flushes.next);
list_move(&next_rq->queuelist, &q->queue_head);
+ /*
+ * Moving a request silently to queue_head may
+ * stall the queue, kick the queue if we
+ * aren't in the issue path already.
+ */
+ if (from_end_io)
+ __blk_run_queue(q);
}
}
return next_rq;
@@ -59,19 +67,19 @@ static struct request *blk_flush_complet
static void pre_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error, true);
}

static void flush_data_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error, true);
}

static void post_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error, true);
}

static void init_flush_request(struct request *rq, struct gendisk *disk)
@@ -165,7 +173,7 @@ struct request *blk_do_flush(struct requ
skip |= QUEUE_FSEQ_DATA;
if (!do_postflush)
skip |= QUEUE_FSEQ_POSTFLUSH;
- return blk_flush_complete_seq(q, skip, 0);
+ return blk_flush_complete_seq(q, skip, 0, false);
}

static void bio_end_flush(struct bio *bio, int err)

[PATCH] block: initialize flush request with WRITE_FLUSH instead of REQ_FLUSH

on 30.08.2010 17:42:59 by Tejun Heo

init_flush_request() set only REQ_FLUSH when initializing flush
requests, making them READ requests. Use WRITE_FLUSH instead.

Signed-off-by: Tejun Heo
Reported-by: Mike Snitzer
---
So, this was the culprit for the incorrect data direction for flush
requests.

Thanks.

block/blk-flush.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -77,7 +77,7 @@ static void post_flush_end_io(struct req
static void init_flush_request(struct request *rq, struct gendisk *disk)
{
rq->cmd_type = REQ_TYPE_FS;
- rq->cmd_flags = REQ_FLUSH;
+ rq->cmd_flags = WRITE_FLUSH;
rq->rq_disk = disk;
}


[PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 17:45:06 by Tejun Heo

This patch converts request-based dm to support the new REQ_FLUSH/FUA.

The original request-based flush implementation depended on
request_queue blocking other requests while a barrier sequence is in
progress, which is no longer true for the new REQ_FLUSH/FUA.

In general, request-based dm doesn't have infrastructure for cloning
one source request to multiple targets, but the original flush
implementation had a special mostly independent path which can issue
flushes to multiple targets and sequence them. However, the
capability isn't currently in use and adds a lot of complexity.
Moreover, it's unlikely to be useful in its current form, as it
doesn't make sense to be able to send out flushes to multiple targets
when write requests can't be.

This patch rips out the special flush code path and handles
REQ_FLUSH/FUA requests the same way as other requests. The only
special treatment is that REQ_FLUSH requests use block address 0
when finding the target, which is enough for now.

* added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
suggested by Mike Snitzer

Signed-off-by: Tejun Heo
Cc: Mike Snitzer
---
Here's a version w/ BUG_ON() added. Once the queue hang issue is
tracked down, I'll refresh the whole series and repost.

Thanks.

drivers/md/dm.c | 206 +++++---------------------------------------------------
1 file changed, 22 insertions(+), 184 deletions(-)

Index: block/drivers/md/dm.c
===================================================================
--- block.orig/drivers/md/dm.c
+++ block/drivers/md/dm.c
@@ -143,20 +143,9 @@ struct mapped_device {
spinlock_t deferred_lock;

/*
- * Protect barrier_error from concurrent endio processing
- * in request-based dm.
- */
- spinlock_t barrier_error_lock;
- int barrier_error;
-
- /*
- * Processing queue (flush/barriers)
+ * Processing queue (flush)
*/
struct workqueue_struct *wq;
- struct work_struct barrier_work;
-
- /* A pointer to the currently processing pre/post flush request */
- struct request *flush_request;

/*
* The current mapping.
@@ -732,23 +721,6 @@ static void end_clone_bio(struct bio *cl
blk_update_request(tio->orig, 0, nr_bytes);
}

-static void store_barrier_error(struct mapped_device *md, int error)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&md->barrier_error_lock, flags);
- /*
- * Basically, the first error is taken, but:
- * -EOPNOTSUPP supersedes any I/O error.
- * Requeue request supersedes any I/O error but -EOPNOTSUPP.
- */
- if (!md->barrier_error || error == -EOPNOTSUPP ||
- (md->barrier_error != -EOPNOTSUPP &&
- error == DM_ENDIO_REQUEUE))
- md->barrier_error = error;
- spin_unlock_irqrestore(&md->barrier_error_lock, flags);
-}
-
/*
* Don't touch any member of the md after calling this function because
* the md may be freed in dm_put() at the end of this function.
@@ -786,13 +758,11 @@ static void free_rq_clone(struct request
static void dm_end_request(struct request *clone, int error)
{
int rw = rq_data_dir(clone);
- int run_queue = 1;
- bool is_barrier = clone->cmd_flags & REQ_HARDBARRIER;
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
struct request *rq = tio->orig;

- if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_barrier) {
+ if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
rq->errors = clone->errors;
rq->resid_len = clone->resid_len;

@@ -806,15 +776,8 @@ static void dm_end_request(struct reques
}

free_rq_clone(clone);
-
- if (unlikely(is_barrier)) {
- if (unlikely(error))
- store_barrier_error(md, error);
- run_queue = 0;
- } else
- blk_end_request_all(rq, error);
-
- rq_completed(md, rw, run_queue);
+ blk_end_request_all(rq, error);
+ rq_completed(md, rw, true);
}

static void dm_unprep_request(struct request *rq)
@@ -839,16 +802,6 @@ void dm_requeue_unmapped_request(struct
struct request_queue *q = rq->q;
unsigned long flags;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request.
- * Leave it to dm_end_request(), which handles this special
- * case.
- */
- dm_end_request(clone, DM_ENDIO_REQUEUE);
- return;
- }
-
dm_unprep_request(rq);

spin_lock_irqsave(q->queue_lock, flags);
@@ -938,19 +891,6 @@ static void dm_complete_request(struct r
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request. So can't use
- * softirq_done with the original.
- * Pass the clone to dm_done() directly in this special case.
- * It is safe (even if clone->q->queue_lock is held here)
- * because there is no I/O dispatching during the completion
- * of barrier clone.
- */
- dm_done(clone, error, true);
- return;
- }
-
tio->error = error;
rq->completion_data = clone;
blk_complete_request(rq);
@@ -967,17 +907,6 @@ void dm_kill_unmapped_request(struct req
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request.
- * Leave it to dm_end_request(), which handles this special
- * case.
- */
- BUG_ON(error > 0);
- dm_end_request(clone, error);
- return;
- }
-
rq->cmd_flags |= REQ_FAILED;
dm_complete_request(clone, error);
}
@@ -1507,14 +1436,6 @@ static int dm_request(struct request_que
return _dm_request(q, bio);
}

-static bool dm_rq_is_flush_request(struct request *rq)
-{
- if (rq->cmd_flags & REQ_FLUSH)
- return true;
- else
- return false;
-}
-
void dm_dispatch_request(struct request *rq)
{
int r;
@@ -1562,22 +1483,15 @@ static int setup_clone(struct request *c
{
int r;

- if (dm_rq_is_flush_request(rq)) {
- blk_rq_init(NULL, clone);
- clone->cmd_type = REQ_TYPE_FS;
- clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
- } else {
- r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
- dm_rq_bio_constructor, tio);
- if (r)
- return r;
-
- clone->cmd = rq->cmd;
- clone->cmd_len = rq->cmd_len;
- clone->sense = rq->sense;
- clone->buffer = rq->buffer;
- }
+ r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+ dm_rq_bio_constructor, tio);
+ if (r)
+ return r;

+ clone->cmd = rq->cmd;
+ clone->cmd_len = rq->cmd_len;
+ clone->sense = rq->sense;
+ clone->buffer = rq->buffer;
clone->end_io = end_clone_request;
clone->end_io_data = tio;

@@ -1618,9 +1532,6 @@ static int dm_prep_fn(struct request_que
struct mapped_device *md = q->queuedata;
struct request *clone;

- if (unlikely(dm_rq_is_flush_request(rq)))
- return BLKPREP_OK;
-
if (unlikely(rq->special)) {
DMWARN("Already has something in rq->special.");
return BLKPREP_KILL;
@@ -1697,6 +1608,7 @@ static void dm_request_fn(struct request
struct dm_table *map = dm_get_live_table(md);
struct dm_target *ti;
struct request *rq, *clone;
+ sector_t pos;

/*
* For suspend, check blk_queue_stopped() and increment
@@ -1709,15 +1621,14 @@ static void dm_request_fn(struct request
if (!rq)
goto plug_and_out;

- if (unlikely(dm_rq_is_flush_request(rq))) {
- BUG_ON(md->flush_request);
- md->flush_request = rq;
- blk_start_request(rq);
- queue_work(md->wq, &md->barrier_work);
- goto out;
- }
+ /* always use block 0 to find the target for flushes for now */
+ pos = 0;
+ if (!(rq->cmd_flags & REQ_FLUSH))
+ pos = blk_rq_pos(rq);
+
+ ti = dm_table_find_target(map, pos);
+ BUG_ON(!dm_target_is_valid(ti));

- ti = dm_table_find_target(map, blk_rq_pos(rq));
if (ti->type->busy && ti->type->busy(ti))
goto plug_and_out;

@@ -1888,7 +1799,6 @@ out:
static const struct block_device_operations dm_blk_dops;

static void dm_wq_work(struct work_struct *work);
-static void dm_rq_barrier_work(struct work_struct *work);

static void dm_init_md_queue(struct mapped_device *md)
{
@@ -1943,7 +1853,6 @@ static struct mapped_device *alloc_dev(i
mutex_init(&md->suspend_lock);
mutex_init(&md->type_lock);
spin_lock_init(&md->deferred_lock);
- spin_lock_init(&md->barrier_error_lock);
rwlock_init(&md->map_lock);
atomic_set(&md->holders, 1);
atomic_set(&md->open_count, 0);
@@ -1966,7 +1875,6 @@ static struct mapped_device *alloc_dev(i
atomic_set(&md->pending[1], 0);
init_waitqueue_head(&md->wait);
INIT_WORK(&md->work, dm_wq_work);
- INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
init_waitqueue_head(&md->eventq);

md->disk->major = _major;
@@ -2220,8 +2128,6 @@ static int dm_init_request_based_queue(s
blk_queue_softirq_done(md->queue, dm_softirq_done);
blk_queue_prep_rq(md->queue, dm_prep_fn);
blk_queue_lld_busy(md->queue, dm_lld_busy);
- /* no flush support for request based dm yet */
- blk_queue_flush(md->queue, 0);

elv_register_queue(md->queue);

@@ -2421,73 +2327,6 @@ static void dm_queue_flush(struct mapped
queue_work(md->wq, &md->work);
}

-static void dm_rq_set_target_request_nr(struct request *clone, unsigned request_nr)
-{
- struct dm_rq_target_io *tio = clone->end_io_data;
-
- tio->info.target_request_nr = request_nr;
-}
-
-/* Issue barrier requests to targets and wait for their completion. */
-static int dm_rq_barrier(struct mapped_device *md)
-{
- int i, j;
- struct dm_table *map = dm_get_live_table(md);
- unsigned num_targets = dm_table_get_num_targets(map);
- struct dm_target *ti;
- struct request *clone;
-
- md->barrier_error = 0;
-
- for (i = 0; i < num_targets; i++) {
- ti = dm_table_get_target(map, i);
- for (j = 0; j < ti->num_flush_requests; j++) {
- clone = clone_rq(md->flush_request, md, GFP_NOIO);
- dm_rq_set_target_request_nr(clone, j);
- atomic_inc(&md->pending[rq_data_dir(clone)]);
- map_request(ti, clone, md);
- }
- }
-
- dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
- dm_table_put(map);
-
- return md->barrier_error;
-}
-
-static void dm_rq_barrier_work(struct work_struct *work)
-{
- int error;
- struct mapped_device *md = container_of(work, struct mapped_device,
- barrier_work);
- struct request_queue *q = md->queue;
- struct request *rq;
- unsigned long flags;
-
- /*
- * Hold the md reference here and leave it at the last part so that
- * the md can't be deleted by device opener when the barrier request
- * completes.
- */
- dm_get(md);
-
- error = dm_rq_barrier(md);
-
- rq = md->flush_request;
- md->flush_request = NULL;
-
- if (error == DM_ENDIO_REQUEUE) {
- spin_lock_irqsave(q->queue_lock, flags);
- blk_requeue_request(q, rq);
- spin_unlock_irqrestore(q->queue_lock, flags);
- } else
- blk_end_request_all(rq, error);
-
- blk_run_queue(q);
-
- dm_put(md);
-}
-
/*
* Swap in a new table, returning the old one for the caller to destroy.
*/
@@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
up_write(&md->io_lock);

/*
- * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
- * can be kicked until md->queue is stopped. So stop md->queue before
- * flushing md->wq.
+ * Stop md->queue before flushing md->wq in case request-based
+ * dm defers requests to md->wq from md->queue.
*/
if (dm_request_based(md))
stop_queue(md->queue);
--

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 21:08:35 by Mike Snitzer

On Mon, Aug 30 2010 at 11:07am -0400,
Tejun Heo wrote:

> On 08/30/2010 03:59 PM, Tejun Heo wrote:
> > Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> > flushes are on a separate queue but I forgot to update
> > elv_queue_empty() to check the flush queue. elv_queue_empty() can
> > return %true spuriously in which case the queue won't be plugged and
> > restarted later leading to queue hang. I'll fix elv_queue_empty().
>
> I think I was too quick to blame elv_queue_empty(). Can you please
> test whether the following patch fixes the hang?

It does, thanks!

Tested-by: Mike Snitzer

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 21:18:30 by Mike Snitzer

On Mon, Aug 30 2010 at 11:45am -0400,
Tejun Heo wrote:

> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
>
> The original request-based flush implementation depended on
> request_queue blocking other requests while a barrier sequence is in
> progress, which is no longer true for the new REQ_FLUSH/FUA.
>
> In general, request-based dm doesn't have infrastructure for cloning
> one source request to multiple targets, but the original flush
> implementation had a special mostly independent path which can issue
> flushes to multiple targets and sequence them. However, the
> capability isn't currently in use and adds a lot of complexity.
> Moreover, it's unlikely to be useful in its current form, as it
> doesn't make sense to be able to send out flushes to multiple targets
> when write requests can't be.
>
> This patch rips out the special flush code path and handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use block address 0
> when finding the target, which is enough for now.
>
> * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
> suggested by Mike Snitzer
>
> Signed-off-by: Tejun Heo

Looks good.

Acked-by: Mike Snitzer

Junichi and/or Kiyoshi,
Could you please review this patch and add your Acked-by if it is OK?
(Alasdair will want to see NEC's Ack to accept this patch).

Thanks,
Mike

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 30.08.2010 23:28:20 by Mike Snitzer

On Mon, Aug 30 2010 at 3:08pm -0400,
Mike Snitzer wrote:

> On Mon, Aug 30 2010 at 11:07am -0400,
> Tejun Heo wrote:
>
> > On 08/30/2010 03:59 PM, Tejun Heo wrote:
> > > Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> > > flushes are on a separate queue but I forgot to update
> > > elv_queue_empty() to check the flush queue. elv_queue_empty() can
> > > return %true spuriously in which case the queue won't be plugged and
> > > restarted later leading to queue hang. I'll fix elv_queue_empty().
> >
> > I think I was too quick to blame elv_queue_empty(). Can you please
> > test whether the following patch fixes the hang?
>
> It does, thanks!

Hmm, but unfortunately I was too quick to say the patch fixed the hang.

It is much more rare, but I can still get a hang. I just got the
following running vgcreate against a DM mpath (rq-based) device:

INFO: task vgcreate:3517 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgcreate D ffff88003d677a00 5168 3517 3361 0x00000080
ffff88003d677998 0000000000000046 ffff880000000000 ffff88003d677fd8
ffff880039c84860 ffff88003d677fd8 00000000001d3880 ffff880039c84c30
ffff880039c84c28 00000000001d3880 00000000001d3880 ffff88003d677fd8
Call Trace:
[] io_schedule+0x73/0xb5
[] get_request_wait+0xef/0x17d
[] ? autoremove_wake_function+0x0/0x39
[] __make_request+0x333/0x467
[] ? pvclock_clocksource_read+0x50/0xb9
[] generic_make_request+0x342/0x3bf
[] ? trace_hardirqs_off+0xd/0xf
[] ? local_clock+0x41/0x5a
[] submit_bio+0xdb/0xf8
[] ? trace_hardirqs_on+0xd/0xf
[] dio_bio_submit+0x7b/0x9c
[] __blockdev_direct_IO+0x7f3/0x97d
[] ? pvclock_clocksource_read+0x50/0xb9
[] blkdev_direct_IO+0x57/0x59
[] ? blkdev_get_blocks+0x0/0x90
[] generic_file_aio_read+0xed/0x5b4
[] ? lock_release_non_nested+0xd5/0x23b
[] ? might_fault+0x5c/0xac
[] ? pvclock_clocksource_read+0x50/0xb9
[] do_sync_read+0xcb/0x108
[] ? trace_hardirqs_off_caller+0x1f/0x9e
[] ? __mutex_unlock_slowpath+0x120/0x132
[] ? fsnotify_perm+0x4a/0x50
[] ? security_file_permission+0x2e/0x33
[] vfs_read+0xab/0x107
[] ? trace_hardirqs_on_caller+0x11d/0x141
[] sys_read+0x4d/0x74
[] system_call_fastpath+0x16/0x1b
no locks held by vgcreate/3517.

I was then able to reproduce it after reboot and another ~5 attempts
(all against 2.6.36-rc2 + your latest FLUSH+FUA patchset and DM
patches).

crash> bt -l 3893
PID: 3893 TASK: ffff88003e65a430 CPU: 0 COMMAND: "vgcreate"
#0 [ffff88003a5298d8] schedule at ffffffff813891d3
/root/git/linux-2.6/kernel/sched.c: 2873
#1 [ffff88003a5299a0] io_schedule at ffffffff81389308
/root/git/linux-2.6/kernel/sched.c: 5128
#2 [ffff88003a5299c0] get_request_wait at ffffffff811c7304
/root/git/linux-2.6/block/blk-core.c: 879
#3 [ffff88003a529a50] __make_request at ffffffff811c7890
/root/git/linux-2.6/block/blk-core.c: 1301
#4 [ffff88003a529ac0] generic_make_request at ffffffff811c5e91
/root/git/linux-2.6/block/blk-core.c: 1536
#5 [ffff88003a529b70] submit_bio at ffffffff811c5fe9
/root/git/linux-2.6/block/blk-core.c: 1632
#6 [ffff88003a529bc0] dio_bio_submit at ffffffff811381a6
/root/git/linux-2.6/fs/direct-io.c: 375
#7 [ffff88003a529bf0] __blockdev_direct_IO at ffffffff81138dbe
/root/git/linux-2.6/fs/direct-io.c: 1087
#8 [ffff88003a529cd0] blkdev_direct_IO at ffffffff81136d7a
/root/git/linux-2.6/fs/block_dev.c: 177
#9 [ffff88003a529d10] generic_file_aio_read at ffffffff810ce301
/root/git/linux-2.6/mm/filemap.c: 1303
#10 [ffff88003a529df0] do_sync_read at ffffffff8110e131
/root/git/linux-2.6/fs/read_write.c: 282
#11 [ffff88003a529f00] vfs_read at ffffffff8110e7a3
/root/git/linux-2.6/fs/read_write.c: 310
#12 [ffff88003a529f40] sys_read at ffffffff8110e8c2
/root/git/linux-2.6/fs/read_write.c: 388
#13 [ffff88003a529f80] system_call_fastpath at ffffffff81002c32
/root/git/linux-2.6/arch/x86/kernel/entry_64.S: 488
RIP: 0000003b602d41a0 RSP: 00007fff55d5b928 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff81002c32 RCX: 00007fff55d5b960
RDX: 0000000000001000 RSI: 00007fff55d5a000 RDI: 0000000000000005
RBP: 0000000000000000 R8: 0000000000494ecd R9: 0000000000001000
R10: 000000315c41c160 R11: 0000000000000246 R12: 00007fff55d5a000
R13: 00007fff55d5b0a0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 31.08.2010 12:29:59 by Tejun Heo

On 08/30/2010 11:28 PM, Mike Snitzer wrote:
> On Mon, Aug 30 2010 at 3:08pm -0400,
> Mike Snitzer wrote:
>
>> On Mon, Aug 30 2010 at 11:07am -0400,
>> Tejun Heo wrote:
>>
>>> On 08/30/2010 03:59 PM, Tejun Heo wrote:
>>>> Ah... that's probably from "if (!elv_queue_empty(q))" check below,
>>>> flushes are on a separate queue but I forgot to update
>>>> elv_queue_empty() to check the flush queue. elv_queue_empty() can
>>>> return %true spuriously in which case the queue won't be plugged and
>>>> restarted later leading to queue hang. I'll fix elv_queue_empty().
>>>
>>> I think I was too quick to blame elv_queue_empty(). Can you please
>>> test whether the following patch fixes the hang?
>>
>> It does, thanks!
>
> Hmm, but unfortunately I was too quick to say the patch fixed the hang.
>
> It is much more rare, but I can still get a hang. I just got the
> following running vgcreate against a DM mpath (rq-based) device:

Can you please try this one instead?

Thanks.

---
block/blk-flush.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -56,22 +56,38 @@ static struct request *blk_flush_complet
return next_rq;
}

+static void blk_flush_complete_seq_end_io(struct request_queue *q,
+ unsigned seq, int error)
+{
+ bool was_empty = elv_queue_empty(q);
+ struct request *next_rq;
+
+ next_rq = blk_flush_complete_seq(q, seq, error);
+
+ /*
+ * Moving a request silently to empty queue_head may stall the
+ * queue. Kick the queue in those cases.
+ */
+ if (next_rq && was_empty)
+ __blk_run_queue(q);
+}
+
static void pre_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_PREFLUSH, error);
}

static void flush_data_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_DATA, error);
}

static void post_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
}

static void init_flush_request(struct request *rq, struct gendisk *disk)


--
tejun

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

on 31.08.2010 15:02:42 by Mike Snitzer

On Tue, Aug 31 2010 at 6:29am -0400,
Tejun Heo wrote:

> On 08/30/2010 11:28 PM, Mike Snitzer wrote:
> > Hmm, but unfortunately I was too quick to say the patch fixed the hang.
> >
> > It is much more rare, but I can still get a hang. I just got the
>>> following running vgcreate against a DM mpath (rq-based) device:
>
> Can you please try this one instead?

Still hit the hang on the 5th iteration of my test:
while true ; do ./test_dm_discard_mpath.sh && sleep 1 ; done

Would you like me to (re)send my test script offlist?

INFO: task vgcreate:2617 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgcreate D ffff88007bf7ba00 4688 2617 2479 0x00000080
ffff88007bf7b998 0000000000000046 ffff880000000000 ffff88007bf7bfd8
ffff88005542a430 ffff88007bf7bfd8 00000000001d3880 ffff88005542a800
ffff88005542a7f8 00000000001d3880 00000000001d3880 ffff88007bf7bfd8
Call Trace:
[] io_schedule+0x73/0xb5
[] get_request_wait+0xef/0x17d
[] ? autoremove_wake_function+0x0/0x39
[] __make_request+0x333/0x467
[] ? pvclock_clocksource_read+0x50/0xb9
[] generic_make_request+0x342/0x3bf
[] ? trace_hardirqs_off+0xd/0xf
[] ? local_clock+0x41/0x5a
[] submit_bio+0xdb/0xf8
[] ? trace_hardirqs_on+0xd/0xf
[] dio_bio_submit+0x7b/0x9c
[] __blockdev_direct_IO+0x7f3/0x97d
[] ? pvclock_clocksource_read+0x50/0xb9
[] blkdev_direct_IO+0x57/0x59
[] ? blkdev_get_blocks+0x0/0x90
[] generic_file_aio_read+0xed/0x5b4
[] ? lock_release_non_nested+0xd5/0x23b
[] ? might_fault+0x5c/0xac
[] ? pvclock_clocksource_read+0x50/0xb9
[] do_sync_read+0xcb/0x108
[] ? trace_hardirqs_off_caller+0x1f/0x9e
[] ? __mutex_unlock_slowpath+0x120/0x132
[] ? fsnotify_perm+0x4a/0x50
[] ? security_file_permission+0x2e/0x33
[] vfs_read+0xab/0x107
[] ? trace_hardirqs_on_caller+0x11d/0x141
[] sys_read+0x4d/0x74
[] system_call_fastpath+0x16/0x1b
no locks held by vgcreate/2617.

Re: [PATCH 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 31.08.2010 15:14:42 von Tejun Heo

Hello,

On 08/31/2010 03:02 PM, Mike Snitzer wrote:
> On Tue, Aug 31 2010 at 6:29am -0400,
> Tejun Heo wrote:
>
>> On 08/30/2010 11:28 PM, Mike Snitzer wrote:
>>> Hmm, but unfortunately I was too quick to say the patch fixed the hang.
>>>
>>> It is much more rare, but I can still get a hang. I just got the
>>> following running vgcreate against a DM mpath (rq-based) device:
>>
>> Can you please try this one instead?
>
> Still hit the hang on the 5th iteration of my test:
> while true ; do ./test_dm_discard_mpath.sh && sleep 1 ; done
>
> Would you like me to (re)send my test script offlist?

Yes, please. Thanks.

--
tejun

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 01.09.2010 09:15:27 von Kiyoshi Ueda

Hi Tejun,

On 08/31/2010 12:45 AM +0900, Tejun Heo wrote:
> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
>
> The original request-based flush implementation depended on
> request_queue blocking other requests while a barrier sequence is in
> progress, which is no longer true for the new REQ_FLUSH/FUA.
>
> In general, request-based dm doesn't have infrastructure for cloning
> one source request to multiple targets, but the original flush
> implementation had a special mostly independent path which can issue
> flushes to multiple targets and sequence them. However, the
> capability isn't currently in use and adds a lot of complexity.
> Moreover, it's unlikely to be useful in its current form as it
> doesn't make sense to be able to send out flushes to multiple targets
> when write requests can't be.
>
> This patch rips out the special flush code path and handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use the block address 0
> when finding the target, which is enough for now.
>
> * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
> suggested by Mike Snitzer

Thank you for your work.

I don't see any obvious problem with this patch.
However, I hit a NULL pointer dereference below when I use a mpath
device with the barrier option of ext3. I'm investigating the cause now.
(Also, I'm not yet sure of the cause of the hang which Mike is hitting.)

I tried on the commit 28dd53b26d362c16234249bad61db8cbd9222d0b of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua.

# mke2fs -j /dev/mapper/mpatha
# mount -o barrier=1 /dev/mapper/mpatha /mnt/0
# dd if=/dev/zero of=/mnt/0/a bs=512 count=1
# sync

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [] scsi_finish_command+0xa3/0x120 [scsi_mod]
PGD 29fd9a067 PUD 2a21ff067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Modules linked in: ext4 jbd2 crc16 ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables autofs4 lockd sunrpc cpufreq_ondemand acpi_cpufreq bridge stp llc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac kvm_intel kvm e1000e sg sr_mod cdrom lpfc scsi_transport_fc piix rtc_cmos rtc_core ioatdma ata_piix button serio_raw rtc_lib libata dca megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-rc2+ #1 MS-9196/Express5800/120Lj [N8100-1417]
RIP: 0010:[] [] scsi_finish_command+0xa3/0x120 [scsi_mod]
RSP: 0018:ffff880002c83e50 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000007d7c RSI: ffffffff81389c55 RDI: 0000000000000286
RBP: ffff880002c83e70 R08: 0000000000000002 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802a2acf750
R13: ffff8802a25686c8 R14: ffff8802791f7eb8 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff880002c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 00000002a2ab6000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 0, threadinfo ffff8802a4576000, task ffff8802a4574050)
Stack:
ffff8802791f7eb8 0000000000002002 000000000000ea60 0000000000000005
<0> ffff880002c83ea0 ffffffffa0079ec8 ffff880002c83eb0 0000000000000020
<0> ffffffff815220a0 0000000000000004 ffff880002c83ed0 ffffffff811c6636
Call Trace:

[] scsi_softirq_done+0x138/0x170 [scsi_mod]
[] blk_done_softirq+0x86/0xa0
[] __do_softirq+0xd6/0x210
[] call_softirq+0x1c/0x50
[] do_softirq+0x95/0xd0
[] irq_exit+0x4d/0x60
[] do_IRQ+0x78/0xf0
[] ret_from_intr+0x0/0x16

[] ? mwait_idle+0x70/0xe0
[] ? trace_hardirqs_on+0xd/0x10
[] ? mwait_idle+0x79/0xe0
[] ? mwait_idle+0x70/0xe0
[] cpu_idle+0x66/0xe0
[] ? start_secondary+0x181/0x1f0
[] start_secondary+0x18f/0x1f0
Code: 0c 83 e0 07 83 f8 04 77 6c 49 8b 86 80 00 00 00 41 8b 5e 68 83 78 44 02 74 27 48 8b 80 b0 00 00 00 48 8b 80 70 02 00 00 48 8b 00 <48> 8b 50 78 89 d8 48 85 d2 74 05 4c 89 f7 ff d2 39 c3 74 21 89
RIP [] scsi_finish_command+0xa3/0x120 [scsi_mod]
RSP
CR2: 0000000000000078



Also, I have one comment below on this patch.

> @@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
> up_write(&md->io_lock);
>
> /*
> - * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
> - * can be kicked until md->queue is stopped. So stop md->queue before
> - * flushing md->wq.
> + * Stop md->queue before flushing md->wq in case request-based
> + * dm defers requests to md->wq from md->queue.
> */
> if (dm_request_based(md))
> stop_queue(md->queue);

Request-based dm doesn't use md->wq now, so you can just remove
the comment above.

Thanks,
Kiyoshi Ueda
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 01.09.2010 14:25:27 von Mike Snitzer

On Wed, Sep 01 2010 at 3:15am -0400,
Kiyoshi Ueda wrote:

> Hi Tejun,
>
> On 08/31/2010 12:45 AM +0900, Tejun Heo wrote:
> > This patch converts request-based dm to support the new REQ_FLUSH/FUA.
> >
> > The original request-based flush implementation depended on
> > request_queue blocking other requests while a barrier sequence is in
> > progress, which is no longer true for the new REQ_FLUSH/FUA.
> >
> > In general, request-based dm doesn't have infrastructure for cloning
> > one source request to multiple targets, but the original flush
> > implementation had a special mostly independent path which can issue
> > flushes to multiple targets and sequence them. However, the
> > capability isn't currently in use and adds a lot of complexity.
> > Moreover, it's unlikely to be useful in its current form as it
> > doesn't make sense to be able to send out flushes to multiple targets
> > when write requests can't be.
> >
> > This patch rips out the special flush code path and handles
> > REQ_FLUSH/FUA requests the same way as other requests. The only
> > special treatment is that REQ_FLUSH requests use the block address 0
> > when finding the target, which is enough for now.
> >
> > * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
> > suggested by Mike Snitzer
>
> Thank you for your work.
>
> I don't see any obvious problem with this patch.
> However, I hit a NULL pointer dereference below when I use a mpath
> device with the barrier option of ext3. I'm investigating the cause now.
> (Also, I'm not yet sure of the cause of the hang which Mike is hitting.)
>
> I tried on the commit 28dd53b26d362c16234249bad61db8cbd9222d0b of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua.
>
> # mke2fs -j /dev/mapper/mpatha
> # mount -o barrier=1 /dev/mapper/mpatha /mnt/0
> # dd if=/dev/zero of=/mnt/0/a bs=512 count=1
> # sync
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078

FYI, I can't reproduce this using all of Tejun's latest patches (not yet
in the flush-fua git tree). But I haven't tried the specific flush-fua
commit that you referenced.

Mike

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 02.09.2010 15:22:57 von Tejun Heo

Hello,

On 09/01/2010 09:15 AM, Kiyoshi Ueda wrote:
> I don't see any obvious problem with this patch.
> However, I hit a NULL pointer dereference below when I use a mpath
> device with the barrier option of ext3. I'm investigating the cause now.
> (Also, I'm not yet sure of the cause of the hang which Mike is hitting.)
>
> I tried on the commit 28dd53b26d362c16234249bad61db8cbd9222d0b of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua.
>
> # mke2fs -j /dev/mapper/mpatha
> # mount -o barrier=1 /dev/mapper/mpatha /mnt/0
> # dd if=/dev/zero of=/mnt/0/a bs=512 count=1
> # sync

Hmm... I'm trying to reproduce this problem but haven't been successful
yet.

> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
> IP: [] scsi_finish_command+0xa3/0x120 [scsi_mod]

Can you please ask gdb which source line this is?

> Also, I have one comment below on this patch.
>
>> @@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
>> up_write(&md->io_lock);
>>
>> /*
>> - * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
>> - * can be kicked until md->queue is stopped. So stop md->queue before
>> - * flushing md->wq.
>> + * Stop md->queue before flushing md->wq in case request-based
>> + * dm defers requests to md->wq from md->queue.
>> */
>> if (dm_request_based(md))
>> stop_queue(md->queue);
>
> Request-based dm doesn't use md->wq now, so you can just remove
> the comment above.

I sure can remove it but md->wq already has most stuff necessary to
process deferred requests and when someone starts using it, having the
comment there about the rather delicate ordering would definitely be
helpful, so I suggest keeping the comment.

Thanks.

--
tejun

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 02.09.2010 15:32:32 von Tejun Heo

On 09/02/2010 03:22 PM, Tejun Heo wrote:
> Hello,
>
> On 09/01/2010 09:15 AM, Kiyoshi Ueda wrote:
>> I don't see any obvious problem with this patch.
>> However, I hit a NULL pointer dereference below when I use a mpath
>> device with the barrier option of ext3. I'm investigating the cause now.
>> (Also, I'm not yet sure of the cause of the hang which Mike is hitting.)
>>
>> I tried on the commit 28dd53b26d362c16234249bad61db8cbd9222d0b of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua.
>>
>> # mke2fs -j /dev/mapper/mpatha
>> # mount -o barrier=1 /dev/mapper/mpatha /mnt/0
>> # dd if=/dev/zero of=/mnt/0/a bs=512 count=1
>> # sync
>
> Hmm... I'm trying to reproduce this problem but haven't been successful
> yet.
>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
>> IP: [] scsi_finish_command+0xa3/0x120 [scsi_mod]
>
> Can you please ask gdb which source line this is?

Ooh, never mind. Reproduced it.

Thanks.

--
tejun

[PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 02.09.2010 19:43:45 von Tejun Heo

rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has
passed through remapping drivers. The FSEQ_DATA request incorrectly
followed bio->bi_bdev->bd_disk, ending up issued with a mismatching
rq_disk. Make it follow orig_rq->rq_disk.

Signed-off-by: Tejun Heo
Reported-by: Kiyoshi Ueda
---
Kiyoshi, can you please apply this patch on top and verify that the
problem goes away? The reason I and Mike didn't see the problem was
that we were using a REQ_FUA-capable underlying device, which makes
the block layer bypass sequencing for the FSEQ_DATA request.

Thanks.

block/blk-flush.c | 7 +++++++
1 file changed, 7 insertions(+)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -111,6 +111,13 @@ static struct request *queue_next_fseq(s
break;
case QUEUE_FSEQ_DATA:
init_request_from_bio(rq, orig_rq->bio);
+ /*
+ * orig_rq->rq_disk may be different from
+ * bio->bi_bdev->bd_disk if orig_rq got here through
+ * remapping drivers. Make sure rq->rq_disk points
+ * to the same one as orig_rq.
+ */
+ rq->rq_disk = orig_rq->rq_disk;
rq->cmd_flags &= ~(REQ_FLUSH | REQ_FUA);
rq->cmd_flags |= orig_rq->cmd_flags & (REQ_FLUSH | REQ_FUA);
rq->end_io = flush_data_end_io;

Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-based dm

am 03.09.2010 07:46:32 von Kiyoshi Ueda

Hi Tejun,

On 09/02/2010 10:22 PM +0900, Tejun Heo wrote:
> On 09/01/2010 09:15 AM, Kiyoshi Ueda wrote:
>>> @@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
>>> up_write(&md->io_lock);
>>>
>>> /*
>>> - * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
>>> - * can be kicked until md->queue is stopped. So stop md->queue before
>>> - * flushing md->wq.
>>> + * Stop md->queue before flushing md->wq in case request-based
>>> + * dm defers requests to md->wq from md->queue.
>>> */
>>> if (dm_request_based(md))
>>> stop_queue(md->queue);
>>
>> Request-based dm doesn't use md->wq now, so you can just remove
>> the comment above.
>
> I sure can remove it but md->wq already has most stuff necessary to
> process deferred requests and when someone starts using it, having the
> comment there about the rather delicate ordering would definitely be
> helpful, so I suggest keeping the comment.

OK, makes sense.

Thanks,
Kiyoshi Ueda

Re: [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 03.09.2010 07:47:36 von Kiyoshi Ueda

Hi Tejun,

On 09/03/2010 02:43 AM +0900, Tejun Heo wrote:
> rq->rq_disk and bio->bi_bdev->bd_disk may differ if a request has
> passed through remapping drivers. The FSEQ_DATA request incorrectly
> followed bio->bi_bdev->bd_disk, ending up issued with a mismatching
> rq_disk. Make it follow orig_rq->rq_disk.
>
> Signed-off-by: Tejun Heo
> Reported-by: Kiyoshi Ueda
> ---
> Kiyoshi, can you please apply this patch on top and verify that the
> problem goes away? The reason I and Mike didn't see the problem was
> that we were using a REQ_FUA-capable underlying device, which makes
> the block layer bypass sequencing for the FSEQ_DATA request.
>
> Thanks.
>
> block/blk-flush.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Index: block/block/blk-flush.c
> ===================================================================
> --- block.orig/block/blk-flush.c
> +++ block/block/blk-flush.c
> @@ -111,6 +111,13 @@ static struct request *queue_next_fseq(s
> break;
> case QUEUE_FSEQ_DATA:
> init_request_from_bio(rq, orig_rq->bio);
> + /*
> + * orig_rq->rq_disk may be different from
> + * bio->bi_bdev->bd_disk if orig_rq got here through
> + * remapping drivers. Make sure rq->rq_disk points
> + * to the same one as orig_rq.
> + */
> + rq->rq_disk = orig_rq->rq_disk;
> rq->cmd_flags &= ~(REQ_FLUSH | REQ_FUA);
> rq->cmd_flags |= orig_rq->cmd_flags & (REQ_FLUSH | REQ_FUA);
> rq->end_io = flush_data_end_io;

Ah, I see, thank you for the quick fix!
I confirmed no panic occurs with this patch.

Tested-by: Kiyoshi Ueda


By the way, I had been considering a block-layer interface which remaps
struct request and its bios to a block device such as:
void blk_remap_request(struct request *rq, struct block_device *bdev)
{
	struct bio *bio;

	rq->rq_disk = bdev->bd_disk;

	__rq_for_each_bio(bio, rq) {
		bio->bi_bdev = bdev;
	}
}

If there is such an interface and remapping drivers use it, then these
kinds of issues may be avoided in the future.

Thanks,
Kiyoshi Ueda

Re: [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 03.09.2010 11:33:10 von Tejun Heo

Hello,

On 09/03/2010 07:47 AM, Kiyoshi Ueda wrote:
> Ah, I see, thank you for the quick fix!
> I confirmed no panic occurs with this patch.
>
> Tested-by: Kiyoshi Ueda

Great, thanks for testing.

> By the way, I had been considering a block-layer interface which remaps
> struct request and its bios to a block device such as:
> void blk_remap_request(struct request *rq, struct block_device *bdev)
> {
> 	struct bio *bio;
>
> 	rq->rq_disk = bdev->bd_disk;
>
> 	__rq_for_each_bio(bio, rq) {
> 		bio->bi_bdev = bdev;
> 	}
> }
>
> If there is such an interface and remapping drivers use it, then these
> kinds of issues may be avoided in the future.

I think the problem is more with request initialization. After all,
once bios are packed into a request, they are (or at least should be)
just data containers. We now have multiple request init paths in the
block layer, and different ones initialize different subsets, and it's
not very clear which fields are supposed to be initialized to what by
whom.

But yeah, I agree that removing the discrepancy between request and bio
would be nice to have too. It's not really remapping, though. Maybe
just blk_set_rq_q() or something like that (it should also set rq->q)?

Thanks.

--
tejun

Re: [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 03.09.2010 12:28:22 von Kiyoshi Ueda

Hi Tejun,

On 09/03/2010 06:33 PM +0900, Tejun Heo wrote:
> On 09/03/2010 07:47 AM, Kiyoshi Ueda wrote:
>> By the way, I had been considering a block-layer interface which remaps
>> struct request and its bios to a block device such as:
>> void blk_remap_request(struct request *rq, struct block_device *bdev)
>> {
>> 	struct bio *bio;
>>
>> 	rq->rq_disk = bdev->bd_disk;
>>
>> 	__rq_for_each_bio(bio, rq) {
>> 		bio->bi_bdev = bdev;
>> 	}
>> }
>>
>> If there is such an interface and remapping drivers use it, then these
>> kinds of issues may be avoided in the future.
>
> I think the problem is more with request initialization. After all,
> once bios are packed into a request, they are (or at least should be)
> just data containers. We now have multiple request init paths in the
> block layer, and different ones initialize different subsets, and it's
> not very clear which fields are supposed to be initialized to what by
> whom.
>
> But yeah, I agree that removing the discrepancy between request and bio
> would be nice to have too. It's not really remapping, though. Maybe
> just blk_set_rq_q() or something like that (it should also set rq->q)?

Thank you for pointing that out.
Yes, the interface should also set rq->q.

About the naming of the interface, blk_set_ sounds
reasonable to me.
But does blk_set_rq_q() take a request and a queue as arguments?
Then I'm afraid we can't find the bdev for the bios from the given queue.

Thanks,
Kiyoshi Ueda

Re: [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 03.09.2010 13:42:13 von Tejun Heo

Hello,

On 09/03/2010 12:28 PM, Kiyoshi Ueda wrote:
> Thank you for pointing that out.
> Yes, the interface should also set rq->q.
>
> About the naming of the interface, blk_set_ sounds
> reasonable to me.
> But does blk_set_rq_q() take a request and a queue as arguments?
> Then I'm afraid we can't find the bdev for the bios from the given queue.

Oh, right. Maybe blk_set_rq_bdev() then?

Thanks.

--
tejun

Re: [PATCH] block: make sure FSEQ_DATA request has the same rq_disk as the original

am 03.09.2010 13:51:43 von Kiyoshi Ueda

Hi Tejun,

On 09/03/2010 08:42 PM +0900, Tejun Heo wrote:
> Hello,
>
> On 09/03/2010 12:28 PM, Kiyoshi Ueda wrote:
>> Thank you for pointing that out.
>> Yes, the interface should also set rq->q.
>>
>> About the naming of the interface, blk_set_ sounds
>> reasonable to me.
>> But does blk_set_rq_q() take a request and a queue as arguments?
>> Then I'm afraid we can't find the bdev for the bios from the given queue.
>
> Oh, right. Maybe blk_set_rq_bdev() then?

Yeah, although struct request doesn't have a 'bdev' member,
blk_set_rq_bdev() may be better from the viewpoint of the arguments.

Thanks,
Kiyoshi Ueda