Re: [BUG 2.6.32] md/raid1: barrier disabling does not work correctly in all cases

on 02.02.2011 01:32:15 by NeilBrown

On Tue, 1 Feb 2011 15:45:16 -0500 Paul Clements wrote:

> On Wed, Jan 26, 2011 at 8:55 AM, Paul Clements wrote:
>
> > Attached is a modified patch, which does the extra necessary work
> > (bitmap_endwrite, md_write_end) on the bio before failing it.
> >
> > Does this look correct? It seems to work.
>
> Well, not quite...it's more complicated. From my reading of the code,
> it looks like behind writes and barrier retries just do not work
> correctly together. The issue is this:
>
> - With behind writes, we signal the master_bio complete as soon as all
> non-write-behind writes are complete.
>
> - With barrier retries, you don't know if you'll need to retry until
> you've completed all legs of the write (the last leg to complete might
> throw EOPNOTSUPP).
>
> So in the case where the master_bio has been completed, we still try
> to do a retry for the leg that failed the barrier (but it's really too
> late to retry). In any case, raid1d is touching master_bio (looking at
> bi_size and bio_cloning it) during the retry, which causes a panic if
> master_bio is already being reused by someone else.
>
> I can't think of a good way to do behind writes and barrier retries
> together. Seems we've got to disable behind writes for barriers, or
> we've got to disable barrier retries when doing behind writes...
>
> Any thoughts?
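
To make the ordering problem described above concrete, here is a toy
user-space model. It is only a sketch and not kernel code: the names
toy_leg, toy_write and toy_leg_complete are hypothetical, and the rule
"the master completes once all non-write-behind legs are done and no
retry is known to be needed" is a simplification of the behaviour
described in the quoted text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: one leg of a mirrored write. */
struct toy_leg {
	bool write_behind;	/* leg targets a write-behind device */
	int result;		/* 0 on success, -EOPNOTSUPP if the barrier was refused */
};

/* Hypothetical bookkeeping for one mirrored write. */
struct toy_write {
	int pending_sync;	/* non-write-behind legs still in flight */
	bool master_done;	/* master bio already signalled complete */
	bool retry_needed;	/* some leg refused the barrier */
	bool retry_after_done;	/* retry discovered after master completion */
};

static void toy_write_init(struct toy_write *w,
			   const struct toy_leg *legs, int n)
{
	w->pending_sync = 0;
	for (int i = 0; i < n; i++)
		if (!legs[i].write_behind)
			w->pending_sync++;
	w->master_done = false;
	w->retry_needed = false;
	w->retry_after_done = false;
}

/* Record the completion of one leg, in whatever order legs finish. */
static void toy_leg_complete(struct toy_write *w, const struct toy_leg *leg)
{
	if (leg->result == -EOPNOTSUPP) {
		w->retry_needed = true;
		/* Too late: the caller may already be reusing the master. */
		if (w->master_done)
			w->retry_after_done = true;
	}
	if (!leg->write_behind) {
		assert(w->pending_sync > 0);
		w->pending_sync--;
	}
	/* Simplified rule: the master completes once every
	 * non-write-behind leg is done and no retry is known yet. */
	if (w->pending_sync == 0 && !w->retry_needed)
		w->master_done = true;
}
```

In this model, a write-behind leg that returns -EOPNOTSUPP after the
other legs have finished sets retry_after_done, which corresponds to
the case where the retry would touch a master bio that has already
been completed.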

I suspect you are right that barriers and behind writes are deeply
incompatible. I suspect they could be made to work together in some vaguely
sane way, but I suspect it would be a lot of work and not worth the effort.

Disabling behind-writes for all barrier requests would be quite easy, but it
might negate a lot of the value of behind writes.

We could simply ignore the barrier flag on writes to behind-write devices,
but that would risk them being even more inconsistent than they currently can
be, so I doubt that is a good direction - though it is a possibility.

I think the best option is to reject barrier writes if there are any
behind-write devices. That would be reasonably safe and reasonably
consistent.
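
As a rough user-space illustration of that policy (a sketch only, not
the kernel implementation): toy_array, toy_array_setup and
toy_submit_write are hypothetical names, and the assumption that
barriers work whenever write-behind is not configured is made purely
for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-array state; not the kernel's mddev_t. */
struct toy_array {
	unsigned long max_write_behind;	/* nonzero: write-behind is in use */
	bool barriers_work;		/* array accepts barrier writes */
};

/* Decide once, at setup time, whether barrier writes are accepted. */
static void toy_array_setup(struct toy_array *a,
			    unsigned long max_write_behind)
{
	assert(a != NULL);
	a->max_write_behind = max_write_behind;
	/* Assumption for this sketch: barriers are only refused
	 * because of write-behind, never for any other reason. */
	a->barriers_work = (max_write_behind == 0);
}

/* Returns 0 if the write is accepted, -EOPNOTSUPP if it is a barrier
 * write and the array does not accept barriers. */
static int toy_submit_write(const struct toy_array *a, bool is_barrier)
{
	assert(a != NULL);
	if (is_barrier && !a->barriers_work)
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape the barrier is refused up front, before any leg is
issued, so no retry is ever needed after the write has started;
ordinary writes are unaffected.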

So maybe something like this??

NeilBrown

Index: linux-2.6.32.orig/drivers/md/bitmap.c
===================================================================
--- linux-2.6.32.orig.orig/drivers/md/bitmap.c 2009-12-03 14:51:21.000000000 +1100
+++ linux-2.6.32.orig/drivers/md/bitmap.c 2011-02-02 11:31:51.156585883 +1100
@@ -1676,6 +1676,8 @@ int bitmap_create(mddev_t *mddev)
 		pages, bmname(bitmap));

 	mddev->bitmap = bitmap;
+	if (bitmap->max_write_behind)
+		mddev->barriers_work = 0;

 	mddev->thread->timeout = bitmap->daemon_sleep * HZ;
