reproducer for DM on MD flush deadlock? (was: Re: [PULL REQUEST] md bug fixes)
on 17.12.2010 19:13:47 by Mike Snitzer
On Tue, Dec 14, 2010 at 2:22 AM, Neil Brown wrote:
>
>
> Hi Linus,
>  here are a few bug fixes for md.
>  Some of the patches are actually clean-up rather than bug-fix,
>  but I that make the bugfix simpler to review.
>
> Thanks,
> NeilBrown
>
>
> The following changes since commit 6313e3c21743cc88bb5bd8aa72948ee1e83937b6:
>
>  Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (2010-12-08 06:40:59 -0800)
>
> are available in the git repository at:
>
>  git://neil.brown.name/md/ for-linus
>
> NeilBrown (5):
>      md: remove handling of flush_pending in md_submit_flush_data
>      md: move code in to submit_flushes.
>      md: fix possible deadlock in handling flush requests.
Hi Neil,
Thanks for fixing this DM on MD flush issue. But my attempts to
reproduce it have been unsuccessful.
I've tried ext4 w/ barriers to a DM device above a 2 member MD RAID1.
The DM device has a table with 2 linear targets to the same md0
device:
# dmsetup table
multiple_targets: 0 24576 linear 9:0 2048
multiple_targets: 24576 49152 linear 9:0 26624
No amount of IO with flushes has enabled me to hit a deadlock (in
md_flush_request, md_write_start, etc).
Do you have a simple reproducer for this issue?
Mike
Re: reproducer for DM on MD flush deadlock? (was: Re: [PULL REQUEST] md bug fixes)
on 17.12.2010 22:01:58 by NeilBrown
On Fri, 17 Dec 2010 13:13:47 -0500 Mike Snitzer wrote:
>
> Hi Neil,
>
> Thanks for fixing this DM on MD flush issue. But my attempts to
> reproduce it have been unsuccessful.
>
> I've tried ext4 w/ barriers to a DM device above a 2 member MD RAID1.
> The DM device has a table with 2 linear targets to the same md0
> device:
>
> # dmsetup table
> multiple_targets: 0 24576 linear 9:0 2048
> multiple_targets: 24576 49152 linear 9:0 26624
>
> No amount of IO with flushes has enabled me to hit a deadlock (in
> md_flush_request, md_write_start, etc).
>
> Do you have a simple reproducer for this issue?
No. I think the issue is very sensitive to the exact placement of the border
between the two dm targets. You need to be able to produce a flush request
that crosses that border.
So to reproduce it I would:
Create an ext4 filesystem of some known size.
Impose some simple, easily reproducible load and use e.g. blktrace to get a
log of the flush requests.
Choose one such request that is larger than a sector and note its location.
Create a DM device of the same size with two targets on md devices, where the
first target ends in the middle of where the flush request was.
Repeat the above sequence on the dm device. That should result in a flush
request overlapping both targets and thus triggering the issue.
NeilBrown
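
The border-placement arithmetic in the steps above can be sketched as follows. This is only an illustration, not something posted in the thread: the function name `split_table`, the backing device `9:0` starting at offset 0, and the sample sector numbers are all assumptions, and the request location is expected to come from your own blktrace log. It prints two `linear` table lines in the same `start length linear dev offset` form as the `dmsetup table` output quoted earlier, with the first target ending in the middle of the chosen request.

```python
def split_table(total_sectors, req_start, req_len,
                backing="9:0", backing_offset=0):
    """Return two dm 'linear' table lines whose border falls inside a request.

    total_sectors  -- size of the DM device in 512-byte sectors
    req_start      -- first sector of the traced flush request (from blktrace)
    req_len        -- length of that request in sectors (must be > 1)
    backing        -- major:minor of the md device (assumed, e.g. md0 = 9:0)
    backing_offset -- sector offset into the md device (assumed 0 here)
    """
    if req_len < 2:
        raise ValueError("request must be larger than one sector")
    if req_start < 0 or req_start + req_len > total_sectors:
        raise ValueError("request does not fit inside the device")

    # Put the border in the middle of the request so it spans both targets.
    border = req_start + req_len // 2

    first = f"0 {border} linear {backing} {backing_offset}"
    second = (f"{border} {total_sectors - border} linear "
              f"{backing} {backing_offset + border}")
    return [first, second]


if __name__ == "__main__":
    # Hypothetical numbers: a 73728-sector device and an 8-sector request
    # observed at sector 1000.
    for line in split_table(73728, 1000, 8):
        print(line)
```

The two lines could then be fed to `dmsetup create <name>` on standard input to build the device before repeating the load. Whether the resulting request actually triggers the deadlock still depends on the filesystem issuing the same request at the same location on the second run.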