linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 12.01.2011 18:34:33 von Valdis.Kletnieks
--==_Exmh_1294853673_4778P
Content-Type: text/plain; charset=us-ascii
Seen in a boot of yesterday's linux-next. The 'W' flag was due to the
already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder
if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
dracut before and after the warning).
[ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
[ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation.
[ 16.912627] ------------[ cut here ]------------
[ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
[ 16.912638] Hardware name: Latitude E6500
[ 16.912640] Modules linked in:
[ 16.912644] Pid: 1920, comm: lvm Tainted: G W 2.6.37-next-20110111 #1
[ 16.912647] Call Trace:
[ 16.912653] [] ? warn_slowpath_common+0x80/0x98
[ 16.912657] [] ? warn_slowpath_null+0x15/0x17
[ 16.912660] [] ? bd_link_disk_holder+0x92/0x1ac
[ 16.912665] [] ? open_dev+0x76/0x9e
[ 16.912668] [] ? dm_get_device+0x147/0x1dc
[ 16.912672] [] ? linear_ctr+0x98/0xd4
[ 16.912676] [] ? dm_table_add_target+0x149/0x1d4
[ 16.912679] [] ? table_load+0xf0/0x27b
[ 16.912683] [] ? table_load+0x0/0x27b
[ 16.912686] [] ? ctl_ioctl+0x1c6/0x21e
[ 16.912690] [] ? dm_ctl_ioctl+0xe/0x12
[ 16.912695] [] ? do_vfs_ioctl+0x4c4/0x505
[ 16.912699] [] ? sys_ioctl+0x57/0x95
[ 16.912703] [] ? system_call_fastpath+0x16/0x1b
[ 16.912706] ---[ end trace 4eaa2a86a8e2da24 ]---
[ 16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found. Falling back to direct link creation.
--==_Exmh_1294853673_4778P
Content-Type: application/pgp-signature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Exmh version 2.5 07/13/2001
iD8DBQFNLeYpcC3lWbTT17ARAgZTAJ9ANtKMam01zeEQcrSePvARkO1a/gCg 2OQ1
qNJroZRFZurx2Z8n3ZrUEOE=
=OhP/
-----END PGP SIGNATURE-----
--==_Exmh_1294853673_4778P--
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 01:23:45 von Milan Broz
On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote:
> Seen in a boot of yesterday's linux-next. The 'W' flag was due to the
> already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
> Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder
> if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
> dracut before and after the warning).
>
> [ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
> [ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation.
> [ 16.912627] ------------[ cut here ]------------
> [ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
That seems to be
WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
added in patch
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.gi t;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895
"block: simplify holder symlink handling"
dm linear just claims device in table constructor, I don't think it is bug in DM code.
> [ 16.912638] Hardware name: Latitude E6500
> [ 16.912640] Modules linked in:
> [ 16.912644] Pid: 1920, comm: lvm Tainted: G W 2.6.37-next-20110111 #1
> [ 16.912647] Call Trace:
> [ 16.912653] [] ? warn_slowpath_common+0x80/0x98
> [ 16.912657] [] ? warn_slowpath_null+0x15/0x17
> [ 16.912660] [] ? bd_link_disk_holder+0x92/0x1ac
> [ 16.912665] [] ? open_dev+0x76/0x9e
> [ 16.912668] [] ? dm_get_device+0x147/0x1dc
> [ 16.912672] [] ? linear_ctr+0x98/0xd4
> [ 16.912676] [] ? dm_table_add_target+0x149/0x1d4
> [ 16.912679] [] ? table_load+0xf0/0x27b
> [ 16.912683] [] ? table_load+0x0/0x27b
> [ 16.912686] [] ? ctl_ioctl+0x1c6/0x21e
> [ 16.912690] [] ? dm_ctl_ioctl+0xe/0x12
> [ 16.912695] [] ? do_vfs_ioctl+0x4c4/0x505
> [ 16.912699] [] ? sys_ioctl+0x57/0x95
> [ 16.912703] [] ? system_call_fastpath+0x16/0x1b
> [ 16.912706] ---[ end trace 4eaa2a86a8e2da24 ]---
> [ 16.925289] dracut: The link /dev/vg_blackice/src should had been created by udev but it was not found. Falling back to direct link creation.
>
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 03:19:21 von j-nomura
On 01/13/11 09:23, Milan Broz wrote:
> On 01/12/2011 06:34 PM, Valdis.Kletnieks@vt.edu wrote:
>> Seen in a boot of yesterday's linux-next. The 'W' flag was due to the
>> already-reported 'WARNING: at kernel/workqueue.c:1202 worker_enter_idle'.
>> Not sure if it's a block_dev or dm/LVM issue, so I'm cc:'ing both groups. I wonder
>> if the fact I still have 'CONFIG_DEVTMPFS=n' is involved (it's apparently ticking off
>> dracut before and after the warning).
>>
>> [ 16.840333] dracut: Found volume group "vg_blackice" using metadata type lvm2
>> [ 16.892282] dracut: The link /dev/vg_blackice/opt should had been created by udev but it was not found. Falling back to direct link creation.
>> [ 16.912627] ------------[ cut here ]------------
>> [ 16.912635] WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
>
> That seems to be
> WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
> added in patch
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.gi t;a=commitdiff;h=e09b457bdb7e8d23fc54dcef0930ac697d8de895
> "block: simplify holder symlink handling"
>
> dm linear just claims device in table constructor, I don't think it is bug in DM code.
The patch assumes only one holder disk for a claimed dev, which is not true.
E.g. if there are multiple LVs on a PV.
In addition to that, since claiming is done in table constructor,
there can be 2 claim instances for a slave/holder pair at a time
when you load a table while there's already an active one.
E.g. if you do lvresize.
We need consideration for this, too.
Thanks,
--
Jun'ichi Nomura, NEC Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 12:06:40 von Tejun Heo
Hello,
On Thu, Jan 13, 2011 at 11:19:21AM +0900, Jun'ichi Nomura wrote:
> > "block: simplify holder symlink handling"
> >
> > dm linear just claims device in table constructor, I don't think it is bug in DM code.
>
> The patch assumes only one holder disk for a claimed dev, which is not true.
> E.g. if there are multiple LVs on a PV.
>
> In addition to that, since claiming is done in table constructor,
> there can be 2 claim instances for a slave/holder pair at a time
> when you load a table while there's already an active one.
> E.g. if you do lvresize.
> We need consideration for this, too.
Urgh... gees, so there actually was a user using all that cruft.
Sorry about the breakage, I'll see how multiple symlinks can be
restored. I'm curious why this was added at all tho. What was the
rationalization? It's not like two subsystems can share the same
block device so marking the currently owning subsystem should have
been enough at the block layer. There is no reason for block devices
to present information which is of no use to itself. All that's
necessary is "this is taken by dm or md for more information, query
those". dm and md need their own conf/representation layer anyway.
Thanks.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 12:26:14 von Milan Broz
On 01/13/2011 12:06 PM, Tejun Heo wrote:
> Urgh... gees, so there actually was a user using all that cruft.
> Sorry about the breakage, I'll see how multiple symlinks can be
> restored. I'm curious why this was added at all tho. What was the
> rationalization? It's not like two subsystems can share the same
> block device so marking the currently owning subsystem should have
> been enough at the block layer. There is no reason for block devices
> to present information which is of no use to itself. All that's
> necessary is "this is taken by dm or md for more information, query
> those". dm and md need their own conf/representation layer anyway.
I am not sure if I understand it correctly, but multiple holders/slaves
links are very useful in userspace to present device dependences.
We just implemented lsblk command in util-linux which simple prints
tree according to these links, so dependendes of DM/MD/whatever devices
can be printed without using any system specific callbacks, just using
sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html
Whatever changes are needed, please keep this functionality, it can be useful.
Thanks,
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824
am 13.01.2011 13:27:01 von Karel Zak
On Thu, Jan 13, 2011 at 12:26:14PM +0100, Milan Broz wrote:
> On 01/13/2011 12:06 PM, Tejun Heo wrote:
> > Urgh... gees, so there actually was a user using all that cruft.
> > Sorry about the breakage, I'll see how multiple symlinks can be
> > restored. I'm curious why this was added at all tho. What was the
> > rationalization? It's not like two subsystems can share the same
> > block device so marking the currently owning subsystem should have
> > been enough at the block layer. There is no reason for block devices
> > to present information which is of no use to itself. All that's
> > necessary is "this is taken by dm or md for more information, query
> > those". dm and md need their own conf/representation layer anyway.
>
> I am not sure if I understand it correctly, but multiple holders/slaves
> links are very useful in userspace to present device dependences.
>
> We just implemented lsblk command in util-linux which simple prints
> tree according to these links, so dependendes of DM/MD/whatever devices
> can be printed without using any system specific callbacks, just using
> sysfs. See http://karelzak.blogspot.com/2010/12/lsblk8.html
We also use holders/slaves links in libblkid to evaluate dependencies
between devices (since 2008).
> Whatever changes are needed, please keep this functionality, it can be useful.
Definitely.
Karel
--
Karel Zak
http://karelzak.blogspot.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 14:12:16 von Tejun Heo
Hello,
On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote:
> We also use holders/slaves links in libblkid to evaluate dependencies
> between devices (since 2008).
>
> > Whatever changes are needed, please keep this functionality, it
> > can be useful.
>
> Definitely.
Yeah, sure, it's not like I can afford to avoid fixing it at this
point anyway, but I still want to point out it's at the wrong layer
and shouldn't have been added like this, really. If you want blkid to
identify it, the proper thing to do would be querying blk device for
the claimer and then use claimer-specific method to query them. It's
not like the current method would make sense for btrfs or whatnot.
So, yeap, I definitely am fixing it but let's not do things like this
in the future.
Thanks.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824
am 13.01.2011 14:26:37 von Karel Zak
On Thu, Jan 13, 2011 at 02:12:16PM +0100, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 13, 2011 at 01:27:01PM +0100, Karel Zak wrote:
> > We also use holders/slaves links in libblkid to evaluate dependencies
> > between devices (since 2008).
> >
> > > Whatever changes are needed, please keep this functionality, it
> > > can be useful.
> >
> > Definitely.
>
> Yeah, sure, it's not like I can afford to avoid fixing it at this
> point anyway, but I still want to point out it's at the wrong layer
> and shouldn't have been added like this, really. If you want blkid to
> identify it, the proper thing to do would be querying blk device for
> the claimer and then use claimer-specific method to query them. It's
It seems that dependencies (holders/slaves) between devices is pretty
generic thing. Why do you think that we need claimer-specific method?
The /sys filesystem is better that ictls in many ways.
> not like the current method would make sense for btrfs or whatnot.
Could you be more verbose, please?
Karel
--
Karel Zak
http://karelzak.blogspot.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 14:37:22 von Tejun Heo
Hello, Karel.
On Thu, Jan 13, 2011 at 02:26:37PM +0100, Karel Zak wrote:
> > Yeah, sure, it's not like I can afford to avoid fixing it at this
> > point anyway, but I still want to point out it's at the wrong layer
> > and shouldn't have been added like this, really. If you want blkid to
> > identify it, the proper thing to do would be querying blk device for
> > the claimer and then use claimer-specific method to query them. It's
>
> It seems that dependencies (holders/slaves) between devices is pretty
> generic thing. Why do you think that we need claimer-specific method?
> The /sys filesystem is better that ictls in many ways.
Because sysfs is already complex enough without representing
dependency information without strictly defined strcture in it. It's
for exporting the device tree to users. We have developed interesting
ways to exploit it but it generally has proven to be a bad idea to add
symlinks without clearly defined structure beneath it. People end up
using them differently and often they don't understand how the kobject
sysfs things are wired and it ends up with a lot of cruft exporting
something which is designed by small number of people without really
considering what's going on in the rest of the hierarchy.
> > not like the current method would make sense for btrfs or whatnot.
>
> Could you be more verbose, please?
If the symlinkery was something which could properly replace
configuration and query, sure, let's standardize it and share it among
all possible claimers of block devices, but it's not. md, dm, btrfs
need to export and process way more information than can be represnted
in these symlinks and it's there more as a pretty thing than anything
which is actually necessary and useful. And the original
implementation directly played with kobjects (in an unnecessarily
complicated way too) and allowed any kobject to be linked. Things
like that just never end up pretty.
So, just don't do it. Sysfs is for device hierarchy. Don't try to
shove pretty looking things there (unless it's something widely agreed
on and necessary, of course).
Thanks.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 14:52:10 von Tejun Heo
On Thu, Jan 13, 2011 at 02:37:22PM +0100, Tejun Heo wrote:
> > It seems that dependencies (holders/slaves) between devices is pretty
> > generic thing. Why do you think that we need claimer-specific method?
> > The /sys filesystem is better that ictls in many ways.
Also, one more thing. In btrfs, there's no block device for the
master device. You can add the slaves directory somewhere and your
choice would be as good as any other's but it just will not be generic
while being superflous exactly the same way it's superflous for md and
dm devices.
Thanks.
--
tejun
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 14:58:23 von Milan Broz
On 01/13/2011 02:37 PM, Tejun Heo wrote:
> So, just don't do it. Sysfs is for device hierarchy. Don't try to
> shove pretty looking things there (unless it's something widely agreed
> on and necessary, of course).
Hi,
I think that it is exactly what holders/slaves do - displaying device
hierarchy. So application can check which underlying device are related
and ask them for more info if needed (=> with system specific call,
it can be simple sysfs attribute, ioctl, whatever).
So the only request here is to keep these symlinks correct, nothing more.
Or am I missing anything?
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:11:07 von Tejun Heo
Hello,
On Thu, Jan 13, 2011 at 02:58:23PM +0100, Milan Broz wrote:
> On 01/13/2011 02:37 PM, Tejun Heo wrote:
>
> > So, just don't do it. Sysfs is for device hierarchy. Don't try to
> > shove pretty looking things there (unless it's something widely agreed
> > on and necessary, of course).
>
> I think that it is exactly what holders/slaves do - displaying device
> hierarchy. So application can check which underlying device are related
> and ask them for more info if needed (=> with system specific call,
> it can be simple sysfs attribute, ioctl, whatever).
Yeah, sure but in a completely unrestrained and non-standard way.
First of all, it wasn't even necessary to begin with and I don't
really see anyone else other than md/dm using it. I mean, where are
you gonna you put that slaves directory? Sure you can put it
somewhere but really it would be just that - somewhere. All this
doesn't even matter. It wasn't even necessary to begin with.
> So the only request here is to keep these symlinks correct, nothing more.
> Or am I missing anything?
Yeah, I'm fixing that. Don't worry. I just wanna say it wasn't such
a brilliant idea to add it in the first place and hope that people
would restrain from doing similar things in the future.
So, as a general rule, when in doubt, just create an attribute. Let's
refrain from custom symlinkery in sysfs, please. In this case too, a
holder attribute containing strings like ext[3|4], md, dm or whatnot
would have been _much_ simpler and actually more useful.
Thank you.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:25:46 von Milan Broz
On 01/13/2011 03:11 PM, Tejun Heo wrote:
> So, as a general rule, when in doubt, just create an attribute. Let's
> refrain from custom symlinkery in sysfs, please. In this case too, a
> holder attribute containing strings like ext[3|4], md, dm or whatnot
> would have been _much_ simpler and actually more useful.
Maybe, but this was not invented in DM/MD camp:-)
Probably Kay or Greg can answer why it was done this way?
For DM it just added links to be proper user of it, see
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6 .git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1
Anyway, it is /sys/block - so it represents block devices.
If btrfs internally creates some virtual _block_ device for its pool, it should
present it here too with slaves/holders. If not, why it should create any links there?
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:30:52 von Tejun Heo
Hello,
On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz wrote:
> Maybe, but this was not invented in DM/MD camp:-)
> Probably Kay or Greg can answer why it was done this way?
Let's not play the dig the history and blame game if possible. We
(including me, of course) all did a lot of horrible things in the
past. :-)
> For DM it just added links to be proper user of it, see
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6 .git;a=commitdiff;h=f165921df46a977e3561f1bd9f13a348441486d1
>
> Anyway, it is /sys/block - so it represents block devices.
>
> If btrfs internally creates some virtual _block_ device for its pool, it should
> present it here too with slaves/holders. If not, why it should create any links there?
Yeah, that's the most bothering part for me. The biggest customers of
bd_claim are filesystems and all these custom symlinkeries don't do
nothing for them. It just doesn't make a whole lot of sense to me.
Thanks.
--
tejun
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:43:38 von Kay Sievers
On Thu, Jan 13, 2011 at 15:30, Tejun Heo wrote:
> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz wrote:
>> Maybe, but this was not invented in DM/MD camp:-)
>> Probably Kay or Greg can answer why it was done this way?
It's not from Greg or Kay. It just appeared some day in the context of =
dm. :)
And yes, symlinks *look* nice and simple for the outside, but they are
not, and have all sorts of problems like non-atomic updates, make it
impossible to ever rename a device (as long as they copy the device
name), and and and .... we should not add more of this.
>> If btrfs internally creates some virtual _block_ device for its pool=
, it should
>> present it here too with slaves/holders. If not, why it should creat=
e any links there?
>
> Yeah, that's the most bothering part for me. Â The biggest custom=
ers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them. Â It just doesn't make a whole lot of sense to =
me.
Btrfs does not use any blockdev as the master for good reason, and it
can never map its slaves inside of /sys/block. Simple meta-blockdevs
like md/dm just don't fit into modern requirements of a filesystem
(directory snapshots, directory subvolumes, complex raids, hassle-free
resizing, ...) -- hence btrfs is much more like a network-filesystem
mount than a stream of blocks like a disk, and does not fit at all
into this model.
Kay
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:45:54 von tytso
On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote:
>>
>
> Yeah, that's the most bothering part for me. The biggest customers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them. It just doesn't make a whole lot of sense to me.
>
Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel.
The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.)
If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system. So what we want is a very easy way of determining, "is this device being used by the device mapper layer"? If it is, then it's probably not the droids we are looking for. We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it.
Can you suggest a better way doing what it is we need to do?
-- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 15:49:50 von Milan Broz
On 01/13/2011 03:30 PM, Tejun Heo wrote:
Hi,
>> Anyway, it is /sys/block - so it represents block devices.
>>
>> If btrfs internally creates some virtual _block_ device for its pool, it should
>> present it here too with slaves/holders. If not, why it should create any links there?
>
> Yeah, that's the most bothering part for me. The biggest customers of
> bd_claim are filesystems and all these custom symlinkeries don't do
> nothing for them. It just doesn't make a whole lot of sense to me.
Well, then it is not optimal for fs and should it be separated?
There is also /sys/fs now...
I think represent dependences of block devices in some generic (and exactly defined)
way is useful. So if there is something missing, lets define it properly.
The whole idea behind lsblk was practical: we responded many questions why
"device is busy" and what keeps it open in stacked subsystem.
Just try fdisk, mdadm, dmsetup table, losetup -a, kpartx, cryptsetup, etc ...
So listing block device tree using ONE interface (sysfs) to check and display
device tree for ALL subsystems seems to be good idea for me.
In this exactly defined case.
I am not saying that symlinks are perfect, just that generic interface here
is not superfluous.
Thanks,
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 16:03:12 von Milan Broz
On 01/13/2011 03:43 PM, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz wrote:
>>> Maybe, but this was not invented in DM/MD camp:-)
>>> Probably Kay or Greg can answer why it was done this way?
>
> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
ah, then sorry, I am just confused:-)
But DM does not need it for operation at all so it must had some other reason.
(We have dmsetup ls --tree using dm-ioctls for years.)
> And yes, symlinks *look* nice and simple for the outside, but they are
> not, and have all sorts of problems like non-atomic updates, make it
> impossible to ever rename a device (as long as they copy the device
> name), and and and .... we should not add more of this.
Yes, I agree, if there is better way, let's switch to that.
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next - WARNING: at
am 13.01.2011 16:59:55 von Karel Zak
T24gVGh1LCBKYW4gMTMsIDIwMTEgYXQgMDM6NDM6MzhQTSArMDEwMCwgS2F5 IFNpZXZlcnMgd3Jv
dGU6Cj4gT24gVGh1LCBKYW4gMTMsIDIwMTEgYXQgMTU6MzAsIFRlanVuIEhl byA8dGpAa2VybmVs
Lm9yZz4gd3JvdGU6Cj4gPiBPbiBUaHUsIEphbiAxMywgMjAxMSBhdCAzOjI1 IFBNLCBNaWxhbiBC
cm96IDxtYnJvekByZWRoYXQuY29tPiB3cm90ZToKPiA+PiBNYXliZSwgYnV0 IHRoaXMgd2FzIG5v
dCBpbnZlbnRlZCBpbiBETS9NRCBjYW1wOi0pCj4gPj4gUHJvYmFibHkgS2F5 IG9yIEdyZWcgY2Fu
IGFuc3dlciB3aHkgaXQgd2FzIGRvbmUgdGhpcyB3YXk/Cj4gCj4gSXQncyBu b3QgZnJvbSBHcmVn
IG9yIEtheS4gSXQganVzdCBhcHBlYXJlZCBzb21lIGRheSBpbiB0aGUgY29u dGV4dCBvZiBkbS4g
OikKPiAKPiBBbmQgeWVzLCBzeW1saW5rcyAqbG9vayogbmljZSBhbmQgc2lt cGxlIGZvciB0aGUg
b3V0c2lkZSwgYnV0IHRoZXkgYXJlCj4gbm90LCBhbmQgaGF2ZSBhbGwgc29y dHMgb2YgcHJvYmxl
bXMgbGlrZSBub24tYXRvbWljIHVwZGF0ZXMsIG1ha2UgaXQKCiBTb3VuZHMg bGlrZSBzeXNmcyBp
bXBsZW1lbnRhdGlvbiBwcm9ibGVtLCByaWdodD8KCiBJZiB0aGVyZSBpcyBu b3dheSB0byBmaXgg
c3lzZnMgdGhlbiB3ZSBjYW4gYWRkIGEgZ2VuZXJpYyBpb2N0bCBvcgogL3N5 cy9ibG9jay88ZGV2
aWNlPi97c2xhdmUsaG9sZGVyfV9saXN0IGZpbGVzIHdpdGggbGlzdCBvZgog aG9sZGVycy9zbGF2
ZXMuCiAKIEJ1dCBwbGVhc2UsIGRvbid0IGZvcmNlIHVzZXJzcGFjZSB0byB1 c2UgKmNsYWltZXIt
c3BlY2lmaWMqCiBtZXRob2RzIHRvIGFuc3dlciAqZ2VuZXJpYyBxdWVzdGlv bnMqIGxpa2Ugc2xh
dmUvaG9sZGVyIGRlcGVuZGVuY2llcwogYmV0d2VlbiBkZXZpY2VzLgogCj4g aW1wb3NzaWJsZSB0
byBldmVyIHJlbmFtZSBhIGRldmljZSAoYXMgbG9uZyBhcyB0aGV5IGNvcHkg dGhlIGRldmljZQo+
IG5hbWUpLCBhbmQgYW5kIGFuZCAuLi4uIHdlIHNob3VsZCBub3QgYWRkIG1v cmUgb2YgdGhpcy4K
PiAKPiA+PiBJZiBidHJmcyBpbnRlcm5hbGx5IGNyZWF0ZXMgc29tZSB2aXJ0 dWFsIF9ibG9ja18g
ZGV2aWNlIGZvciBpdHMgcG9vbCwgaXQgc2hvdWxkCj4gPj4gcHJlc2VudCBp dCBoZXJlIHRvbyB3
aXRoIHNsYXZlcy9ob2xkZXJzLiBJZiBub3QsIHdoeSBpdCBzaG91bGQgY3Jl YXRlIGFueSBsaW5r
cyB0aGVyZT8KPiA+Cj4gPiBZZWFoLCB0aGF0J3MgdGhlIG1vc3QgYm90aGVy aW5nIHBhcnQgZm9y
IG1lLiDCoFRoZSBiaWdnZXN0IGN1c3RvbWVycyBvZgo+ID4gYmRfY2xhaW0g YXJlIGZpbGVzeXN0
ZW1zIGFuZCBhbGwgdGhlc2UgY3VzdG9tIHN5bWxpbmtlcmllcyBkb24ndCBk bwo+ID4gbm90aGlu
ZyBmb3IgdGhlbS4gwqBJdCBqdXN0IGRvZXNuJ3QgbWFrZSBhIHdob2xlIGxv dCBvZiBzZW5zZSB0
byBtZS4KPiAKPiBCdHJmcyBkb2VzIG5vdCB1c2UgYW55IGJsb2NrZGV2IGFz IHRoZSBtYXN0ZXIg
Zm9yIGdvb2QgcmVhc29uLCBhbmQgaXQKPiBjYW4gbmV2ZXIgbWFwIGl0cyBz bGF2ZXMgaW5zaWRl
IG9mIC9zeXMvYmxvY2suIAoKIFllcCwgZXhwZWN0ZWQgYW5kIGNvcnJlY3Qg cmVzcG9uc2UgOi0p
CgogICAgS2FyZWwKCi0tIAogS2FyZWwgWmFrICA8a3pha0ByZWRoYXQuY29t PgogaHR0cDovL2th
cmVsemFrLmJsb2dzcG90LmNvbQoKLS0KZG0tZGV2ZWwgbWFpbGluZyBsaXN0 CmRtLWRldmVsQHJl
ZGhhdC5jb20KaHR0cHM6Ly93d3cucmVkaGF0LmNvbS9tYWlsbWFuL2xpc3Rp bmZvL2RtLWRldmVs
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 17:10:02 von Kay Sievers
On Thu, Jan 13, 2011 at 16:59, Karel Zak wrote:
> On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 15:30, Tejun Heo wrote:
>> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz wro=
te:
>> >> Maybe, but this was not invented in DM/MD camp:-)
>> >> Probably Kay or Greg can answer why it was done this way?
>>
>> It's not from Greg or Kay. It just appeared some day in the context =
of dm. :)
>>
>> And yes, symlinks *look* nice and simple for the outside, but they a=
re
>> not, and have all sorts of problems like non-atomic updates, make it
>
> Â Sounds like sysfs implementation problem, right?
It's a normal multi-file problem. It can by-definition not be atomic
without doing really weird locking things.
> Â If there is noway to fix sysfs then we can add a generic ioctl =
or
> Â /sys/block//{slave,holder}_list files with list of
> Â holders/slaves.
Yeah, we've been there with the btrfs problem. For btrfs it woud
probably need to be something statfs()-like.
> Â But please, don't force userspace to use *claimer-specific*
> Â methods to answer *generic questions* like slave/holder depende=
ncies
> Â between devices.
The links exist only for dm and md so far, I think. It's the classical
multiple-parents-in-a-tree problem. We have that for bonded network
devices and some IO buses too. There is no nice representation for
these reversed-trees-in-the-tree so far.
Kay
[PATCH] block: restore multiple bd_link_disk_holder() support
am 13.01.2011 18:21:33 von Tejun Heo
Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one holder at maximum. dm may use multiple
holders. Remove the single holder assumption and automatic removal of
the link. Let the callers explicitly remove them. This change makes
it even more alien from the rest of the block layer.
While at it, note that this facility should not be used by anyone else
than the current ones. Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.
Signed-off-by: Tejun Heo
Reported-by: Milan Broz
Cc: Jun'ichi Nomura
Cc: Neil Brown
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers
---
Milan, Jun, can you guys please verify this works correctly for the
multi holder dm case? Thank you.
drivers/md/dm-table.c | 1 +
drivers/md/md.c | 1 +
fs/block_dev.c | 28 +++++++++++++++-------------
include/linux/fs.h | 9 ++++++---
4 files changed, 23 insertions(+), 16 deletions(-)
Index: work/include/linux/fs.h
============================================================ =======
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -663,9 +663,6 @@ struct block_device {
void * bd_holder;
int bd_holders;
bool bd_write_holder;
-#ifdef CONFIG_SYSFS
- struct gendisk * bd_holder_disk; /* for sysfs slave linkng */
-#endif
struct block_device * bd_contains;
unsigned bd_block_size;
struct hd_struct * bd_part;
@@ -2045,12 +2042,18 @@ extern struct block_device *blkdev_get_b
extern int blkdev_put(struct block_device *bdev, fmode_t mode);
#ifdef CONFIG_SYSFS
extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct block_device *bdev,
+ struct gendisk *disk);
#else
static inline int bd_link_disk_holder(struct block_device *bdev,
struct gendisk *disk)
{
return 0;
}
+static inline void bd_unlink_disk_holder(struct block_device *bdev,
+ struct gendisk *disk)
+{
+}
#endif
#endif
Index: work/drivers/md/dm-table.c
============================================================ =======
--- work.orig/drivers/md/dm-table.c
+++ work/drivers/md/dm-table.c
@@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte
if (!d->dm_dev.bdev)
return;
+ bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
d->dm_dev.bdev = NULL;
}
Index: work/drivers/md/md.c
============================================================ =======
--- work.orig/drivers/md/md.c
+++ work/drivers/md/md.c
@@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r
MD_BUG();
return;
}
+ bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
list_del_rcu(&rdev->same_set);
printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
rdev->mddev = NULL;
Index: work/fs/block_dev.c
============================================================ =======
--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -788,6 +788,8 @@ static void del_symlink(struct kobject *
* @bdev: the claimed slave bdev
* @disk: the holding disk
*
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
* This functions creates the following sysfs symlinks.
*
* - from "slaves" directory of the holder @disk to the claimed @bdev
@@ -815,7 +817,7 @@ int bd_link_disk_holder(struct block_dev
mutex_lock(&bdev->bd_mutex);
- WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
+ WARN_ON_ONCE(!bdev->bd_holder);
/* FIXME: remove the following once add_disk() handles errors */
if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
@@ -831,27 +833,28 @@ int bd_link_disk_holder(struct block_dev
goto out_unlock;
}
- bdev->bd_holder_disk = disk;
out_unlock:
mutex_unlock(&bdev->bd_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(bd_link_disk_holder);
-static void bd_unlink_disk_holder(struct block_device *bdev)
+/**
+ * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
+ * @bdev: the calimed slave bdev
+ * @disk: the holding disk
+ *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
{
- struct gendisk *disk = bdev->bd_holder_disk;
-
- bdev->bd_holder_disk = NULL;
- if (!disk)
- return;
-
del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
}
-#else
-static inline void bd_unlink_disk_holder(struct block_device *bdev)
-{ }
+EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
#endif
/**
@@ -1374,7 +1377,6 @@ int blkdev_put(struct block_device *bdev
* unblock evpoll if it was a write holder.
*/
if (bdev_free) {
- bd_unlink_disk_holder(bdev);
if (bdev->bd_write_holder) {
disk_unblock_events(bdev->bd_disk);
bdev->bd_write_holder = false;
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block: restore multiple bd_link_disk_holder() support
am 13.01.2011 19:42:33 von Milan Broz
On 01/13/2011 06:21 PM, Tejun Heo wrote:
> Commit e09b457b (block: simplify holder symlink handling) incorrectly
> assumed that there is only one holder at maximum. dm may use multiple
> holders. Remove the single holder assumption and automatic removal of
> the link. Let the callers explicitly remove them. This change makes
> it even more alien from the rest of the block layer.
>
> While at it, note that this facility should not be used by anyone else
> than the current ones. Sysfs symlinks shouldn't be abused like this
> and the whole thing doesn't belong in the block layer at all.
>
> Signed-off-by: Tejun Heo
> Reported-by: Milan Broz
> Cc: Jun'ichi Nomura
> Cc: Neil Brown
> Cc: linux-raid@vger.kernel.org
> Cc: Kay Sievers
> ---
> Milan, Jun, can you guys please verify this works correctly for the
> multi holder dm case? Thank you.
Hi,
unfortunately not. And the problem is much worse,
it breaks lvm resize operation completely.
I cherry-picked you patch to my tree, it fails, reverting helps.
Following test is over linux-next with your last patch applied:
Test case (lvresize of volume over 3 disks sdb,c,d):
# pvcreate /dev/sd[bcd]
Physical volume "/dev/sdb" successfully created
Physical volume "/dev/sdc" successfully created
Physical volume "/dev/sdd" successfully created
# vgcreate vg_test /dev/sd[bcd]
Volume group "vg_test" successfully created
# lvcreate -l100%FREE -n lv vg_test
Logical volume "lv" created
# dmsetup table
vg_test-lv: 0 409600 linear 8:16 2048
vg_test-lv: 409600 409600 linear 8:32 2048
vg_test-lv: 819200 409600 linear 8:48 2048
so we have active LV mapped to 3 devices now.
Now online resize it. I means that it will create inactive table,
then switch active (so the magic with claiming disks applies):
# lvresize -L-10M vg_test/lv
Rounding up size to full physical extent 8.00 MiB
WARNING: Reducing active logical volume to 592.00 MiB
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lv? [y/n]: y
Reducing logical volume lv to 592.00 MiB
device-mapper: reload ioctl failed: Invalid argument
Failed to suspend lv
and in syslog:
[ 291.221081] ------------[ cut here ]------------
[ 291.221111] WARNING: at fs/sysfs/dir.c:455 sysfs_add_one+0x6b/0x80()
[ 291.221130] Hardware name: VMware Virtual Platform
[ 291.221140] sysfs: cannot create duplicate filename '/devices/virtual/block/dm-0/slaves/sdb'
[ 291.221162] Modules linked in: usbcore dm_mod
[ 291.221265] Pid: 4074, comm: lvresize Tainted: G W 2.6.37-next-20110113+ #2
[ 291.221269] Call Trace:
[ 291.221286] [] ? warn_slowpath_common+0x65/0x7a
[ 291.221290] [] ? sysfs_add_one+0x6b/0x80
[ 291.221295] [] ? warn_slowpath_fmt+0x26/0x2a
[ 291.221299] [] ? sysfs_add_one+0x6b/0x80
[ 291.221304] [] ? sysfs_do_create_link+0xda/0x164
[ 291.221308] [] ? sysfs_create_link+0xa/0xc
[ 291.221314] [] ? bd_link_disk_holder+0x66/0xad
[ 291.221484] [] ? open_dev+0x47/0x67 [dm_mod]
[ 291.221492] [] ? dm_get_device+0xf5/0x1d5 [dm_mod]
[ 291.221502] [] ? vsscanf+0x364/0x3ee
[ 291.221510] [] ? cache_alloc_debugcheck_after+0xf/0x180
[ 291.221523] [] ? linear_ctr+0x86/0xc0 [dm_mod]
[ 291.221531] [] ? dm_table_add_target+0x153/0x1d3 [dm_mod]
[ 291.221538] [] ? table_load+0x1fd/0x21e [dm_mod]
[ 291.221545] [] ? dm_ctl_ioctl+0x188/0x1c8 [dm_mod]
[ 291.221551] [] ? table_load+0x0/0x21e [dm_mod]
[ 291.221558] [] ? dm_ctl_ioctl+0x0/0x1c8 [dm_mod]
[ 291.221565] [] ? do_vfs_ioctl+0x493/0x4d8
[ 291.221573] [] ? do_page_fault+0x3ee/0x418
[ 291.221578] [] ? sys_ioctl+0x2e/0x48
[ 291.221583] [] ? sysenter_do_call+0x12/0x32
[ 291.221609] ---[ end trace c21a5bad605a4a87 ]---
[ 291.221760] device-mapper: table: 254:0: linear: dm-linear: Device lookup failed
[ 291.221782] device-mapper: ioctl: error adding target to table
Milan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 21:18:58 von NeilBrown
On Thu, 13 Jan 2011 09:45:54 -0500 Theodore Tso wrote:
>
> On Jan 13, 2011, at 9:30 AM, Tejun Heo wrote:
> >>
> >
> > Yeah, that's the most bothering part for me. The biggest customers of
> > bd_claim are filesystems and all these custom symlinkeries don't do
> > nothing for them. It just doesn't make a whole lot of sense to me.
> >
>
> Well, if there's better way to do things, we can send patches to libblkid to switch away from using sysfs, assuming it's using a new enough kernel.
>
> The primary problem that we're trying to solve is to know whether a particular device contains a file system that should potentially mounted (or fsck'ed, or used as a external journal device, etc.)
>
> If the file system is located on a raid 0 device created using devicemapper, the first physical block device could look like a file system. So what we want is a very easy way of determining, "is this device being used by the device mapper layer"? If it is, then it's probably not the droids we are looking for. We'll keep looking at the rest of the block devices in the system, and when we find the dm-concatenated devices, we can properly identify it.
>
> Can you suggest a better way doing what it is we need to do?
open(O_EXCL) will fail on a block device if it is being used by anything else
- a filesystem or a dm target or an md array or ....
So if the *only* thing you want is "is this currently an active part of
something else", then O_EXCL works since 2.6.0 (I think).
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 13.01.2011 21:41:45 von tytso
On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
>
> open(O_EXCL) will fail on a block device if it is being used by anything else
> - a filesystem or a dm target or an md array or ....
>
> So if the *only* thing you want is "is this currently an active part of
> something else", then O_EXCL works since 2.6.0 (I think).
Unfortunately, that won't distinguish between a currently active file
system, and a device which is being used by a dm target, which is what
we want to do.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] block: restore multiple bd_link_disk_holder() support
am 14.01.2011 08:31:57 von j-nomura
Hi,
On 01/14/11 03:42, Milan Broz wrote:
>> Milan, Jun, can you guys please verify this works correctly for the
>> multi holder dm case? Thank you.
>
> Hi,
>
> unfortunately not. And the problem is much worse,
> it breaks lvm resize operation completely.
For quick testing, try this:
("/dev/sdf" can be other block device.
"testA" and "testB" can be other dm names which aren't used in
your machine.)
# echo "0 10 linear /dev/sdf 0" | dmsetup create testA
# echo "0 10 linear /dev/sdf 10" | dmsetup create testB
(at this point, /dev/sdf has multiple holders)
# echo "0 20 linear /dev/sdf 10" | dmsetup load testB
(at this point, /dev/mapper/testB has 2 claims on /dev/sdf)
# dmsetup resume testB
(/dev/mapper/testB has 1 claim on /dev/sdf)
# dmsetup remove testA
# dmsetup remove testB
Thanks,
--
Jun'ichi Nomura, NEC Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 08:38:05 von j-nomura
On 01/14/11 00:03, Milan Broz wrote:
> On 01/13/2011 03:43 PM, Kay Sievers wrote:
>
>>> On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz wrote:
>>>> Maybe, but this was not invented in DM/MD camp:-)
>>>> Probably Kay or Greg can answer why it was done this way?
>>
>> It's not from Greg or Kay. It just appeared some day in the context of dm. :)
>
> ah, then sorry, I am just confused:-)
> But DM does not need it for operation at all so it must had some other reason.
> (We have dmsetup ls --tree using dm-ioctls for years.)
Sorry. It's from me 5 years ago. :)
See this for backgrounds:
[PATCH 0/3] sysfs representation of stacked devices (dm/md)
http://lwn.net/Articles/172689/
And I don't adhere to my implementation if there's better one.
Thanks,
--
Jun'ichi Nomura, NEC Corporation
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824
am 14.01.2011 16:07:36 von Karel Zak
On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
> On Thu, Jan 13, 2011 at 16:59, Karel Zak wrote:
> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo wrote:
> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz w=
rote:
> >> >> Maybe, but this was not invented in DM/MD camp:-)
> >> >> Probably Kay or Greg can answer why it was done this way?
> >>
> >> It's not from Greg or Kay. It just appeared some day in the contex=
t of dm. :)
> >>
> >> And yes, symlinks *look* nice and simple for the outside, but they=
are
> >> not, and have all sorts of problems like non-atomic updates, make =
it
> >
> > Â Sounds like sysfs implementation problem, right?
>=20
> It's a normal multi-file problem. It can by-definition not be atomic
> without doing really weird locking things.
BTW, lsblk(8) and libblkid don't depend on the fact that slaves/holder=
s=20
files are symlinks.
=20
The important thing is the filename (/sys/block/.../slaves/)=20
only. We don't follow the symlinks and we don't use readlink() there.
=20
It means that you can replace the symlinks with regular files where
in the file contents is for example maj:min, etc.
Karel
--=20
Karel Zak
http://karelzak.blogspot.com
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel=
" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 16:23:56 von Kay Sievers
On Fri, Jan 14, 2011 at 16:07, Karel Zak wrote:
> On Thu, Jan 13, 2011 at 05:10:02PM +0100, Kay Sievers wrote:
>> On Thu, Jan 13, 2011 at 16:59, Karel Zak wrote:
>> > On Thu, Jan 13, 2011 at 03:43:38PM +0100, Kay Sievers wrote:
>> >> On Thu, Jan 13, 2011 at 15:30, Tejun Heo wrote:
>> >> > On Thu, Jan 13, 2011 at 3:25 PM, Milan Broz =
wrote:
>> >> >> Maybe, but this was not invented in DM/MD camp:-)
>> >> >> Probably Kay or Greg can answer why it was done this way?
>> >>
>> >> It's not from Greg or Kay. It just appeared some day in the conte=
xt of dm. :)
>> >>
>> >> And yes, symlinks *look* nice and simple for the outside, but the=
y are
>> >> not, and have all sorts of problems like non-atomic updates, make=
it
>> >
>> > Â Sounds like sysfs implementation problem, right?
>>
>> It's a normal multi-file problem. It can by-definition not be atomic
>> without doing really weird locking things.
>
> Â BTW, lsblk(8) and libblkid don't depend on the fact that slaves=
/holders
> Â files are symlinks.
>
> Â The important thing is the filename (/sys/block/.../slaves/
e>)
> Â only. We don't follow the symlinks and we don't use readlink() =
there.
>
> Â It means that you can replace the symlinks with regular files w=
here
> Â in the file contents is for example maj:min, etc.
I don't think we really can change anything here, there are more users =
of it.
If we would go changing things here, the file names should never be
the device name but the format "b8:32" for blockdevs, "n22" for the
network ifindex, ... which is the unchangeable unique kernel id -- not
depending on any possible device rename, or fast destroy/re-create or
anything like that.
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel=
" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH UPDATED] block: restore multiplebd_link_disk_holder() support
am 14.01.2011 17:10:43 von Tejun Heo
Commit e09b457b (block: simplify holder symlink handling) incorrectly
assumed that there is only one link at maximum. dm may use multiple
links and expects block layer to track reference count for each link,
which is different from and unrelated to the exclusive device holder
identified by @holder when the device is opened.
Remove the single holder assumption and automatic removal of the link
and revive the per-link reference count tracking. The code
essentially behaves the same as before commit e09b457b sans the
unnecessary kobject reference count dancing.
While at it, note that this facility should not be used by anyone else
than the current ones. Sysfs symlinks shouldn't be abused like this
and the whole thing doesn't belong in the block layer at all.
Signed-off-by: Tejun Heo
Reported-by: Milan Broz
Cc: Jun'ichi Nomura
Cc: Neil Brown
Cc: linux-raid@vger.kernel.org
Cc: Kay Sievers
---
Thanks for the test commands. They were very helpful. Can you please
test this one?
Thanks.
drivers/md/dm-table.c | 1
drivers/md/md.c | 1
fs/block_dev.c | 93 ++++++++++++++++++++++++++++++++++++++++----------
include/linux/fs.h | 8 +++-
4 files changed, 84 insertions(+), 19 deletions(-)
Index: work/include/linux/fs.h
============================================================ =======
--- work.orig/include/linux/fs.h
+++ work/include/linux/fs.h
@@ -664,7 +664,7 @@ struct block_device {
int bd_holders;
bool bd_write_holder;
#ifdef CONFIG_SYSFS
- struct gendisk * bd_holder_disk; /* for sysfs slave linkng */
+ struct list_head bd_holder_disks;
#endif
struct block_device * bd_contains;
unsigned bd_block_size;
@@ -2045,12 +2045,18 @@ extern struct block_device *blkdev_get_b
extern int blkdev_put(struct block_device *bdev, fmode_t mode);
#ifdef CONFIG_SYSFS
extern int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk);
+extern void bd_unlink_disk_holder(struct block_device *bdev,
+ struct gendisk *disk);
#else
static inline int bd_link_disk_holder(struct block_device *bdev,
struct gendisk *disk)
{
return 0;
}
+static inline void bd_unlink_disk_holder(struct block_device *bdev,
+ struct gendisk *disk)
+{
+}
#endif
#endif
Index: work/drivers/md/dm-table.c
============================================================ =======
--- work.orig/drivers/md/dm-table.c
+++ work/drivers/md/dm-table.c
@@ -347,6 +347,7 @@ static void close_dev(struct dm_dev_inte
if (!d->dm_dev.bdev)
return;
+ bd_unlink_disk_holder(d->dm_dev.bdev, dm_disk(md));
blkdev_put(d->dm_dev.bdev, d->dm_dev.mode | FMODE_EXCL);
d->dm_dev.bdev = NULL;
}
Index: work/drivers/md/md.c
============================================================ =======
--- work.orig/drivers/md/md.c
+++ work/drivers/md/md.c
@@ -1907,6 +1907,7 @@ static void unbind_rdev_from_array(mdk_r
MD_BUG();
return;
}
+ bd_unlink_disk_holder(rdev->bdev, rdev->mddev->gendisk);
list_del_rcu(&rdev->same_set);
printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
rdev->mddev = NULL;
Index: work/fs/block_dev.c
============================================================ =======
--- work.orig/fs/block_dev.c
+++ work/fs/block_dev.c
@@ -426,6 +426,9 @@ static void init_once(void *foo)
mutex_init(&bdev->bd_mutex);
INIT_LIST_HEAD(&bdev->bd_inodes);
INIT_LIST_HEAD(&bdev->bd_list);
+#ifdef CONFIG_SYSFS
+ INIT_LIST_HEAD(&bdev->bd_holder_disks);
+#endif
inode_init_once(&ei->vfs_inode);
/* Initialize mutex for freeze. */
mutex_init(&bdev->bd_fsfreeze_mutex);
@@ -773,6 +776,23 @@ static struct block_device *bd_start_cla
}
#ifdef CONFIG_SYSFS
+struct bd_holder_disk {
+ struct list_head list;
+ struct gendisk *disk;
+ int refcnt;
+};
+
+static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
+ struct gendisk *disk)
+{
+ struct bd_holder_disk *holder;
+
+ list_for_each_entry(holder, &bdev->bd_holder_disks, list)
+ if (holder->disk == disk)
+ return holder;
+ return NULL;
+}
+
static int add_symlink(struct kobject *from, struct kobject *to)
{
return sysfs_create_link(from, to, kobject_name(to));
@@ -788,6 +808,8 @@ static void del_symlink(struct kobject *
* @bdev: the claimed slave bdev
* @disk: the holding disk
*
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
* This functions creates the following sysfs symlinks.
*
* - from "slaves" directory of the holder @disk to the claimed @bdev
@@ -811,47 +833,83 @@ static void del_symlink(struct kobject *
*/
int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
{
+ struct bd_holder_disk *holder;
int ret = 0;
mutex_lock(&bdev->bd_mutex);
- WARN_ON_ONCE(!bdev->bd_holder || bdev->bd_holder_disk);
+ WARN_ON_ONCE(!bdev->bd_holder);
/* FIXME: remove the following once add_disk() handles errors */
if (WARN_ON(!disk->slave_dir || !bdev->bd_part->holder_dir))
goto out_unlock;
- ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
- if (ret)
+ holder = bd_find_holder_disk(bdev, disk);
+ if (holder) {
+ holder->refcnt++;
goto out_unlock;
+ }
- ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
- if (ret) {
- del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+ holder = kzalloc(sizeof(*holder), GFP_KERNEL);
+ if (!holder) {
+ ret = -ENOMEM;
goto out_unlock;
}
- bdev->bd_holder_disk = disk;
+ INIT_LIST_HEAD(&holder->list);
+ holder->disk = disk;
+ holder->refcnt = 1;
+
+ ret = add_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+ if (ret)
+ goto out_free;
+
+ ret = add_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+ if (ret)
+ goto out_del;
+
+ list_add(&holder->list, &bdev->bd_holder_disks);
+ goto out_unlock;
+
+out_del:
+ del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+out_free:
+ kfree(holder);
out_unlock:
mutex_unlock(&bdev->bd_mutex);
return ret;
}
EXPORT_SYMBOL_GPL(bd_link_disk_holder);
-static void bd_unlink_disk_holder(struct block_device *bdev)
+/**
+ * bd_unlink_disk_holder - destroy symlinks created by bd_link_disk_holder()
+ * @bdev: the calimed slave bdev
+ * @disk: the holding disk
+ *
+ * DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.
+ *
+ * CONTEXT:
+ * Might sleep.
+ */
+void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
{
- struct gendisk *disk = bdev->bd_holder_disk;
+ struct bd_holder_disk *holder;
- bdev->bd_holder_disk = NULL;
- if (!disk)
- return;
+ mutex_lock(&bdev->bd_mutex);
- del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
- del_symlink(bdev->bd_part->holder_dir, &disk_to_dev(disk)->kobj);
+ holder = bd_find_holder_disk(bdev, disk);
+
+ if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
+ del_symlink(disk->slave_dir, &part_to_dev(bdev->bd_part)->kobj);
+ del_symlink(bdev->bd_part->holder_dir,
+ &disk_to_dev(disk)->kobj);
+ list_del_init(&holder->list);
+ kfree(holder);
+ }
+
+ mutex_unlock(&bdev->bd_mutex);
}
-#else
-static inline void bd_unlink_disk_holder(struct block_device *bdev)
-{ }
+EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
#endif
/**
@@ -1374,7 +1432,6 @@ int blkdev_put(struct block_device *bdev
* unblock evpoll if it was a write holder.
*/
if (bdev_free) {
- bd_unlink_disk_holder(bdev);
if (bdev->bd_write_holder) {
disk_unblock_events(bdev->bd_disk);
bdev->bd_write_holder = false;
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 17:20:22 von Tejun Heo
Hello, Ted, Neil.
On Thu, Jan 13, 2011 at 03:41:45PM -0500, Ted Ts'o wrote:
> On Fri, Jan 14, 2011 at 07:18:58AM +1100, NeilBrown wrote:
> >
> > open(O_EXCL) will fail on a block device if it is being used by anything else
> > - a filesystem or a dm target or an md array or ....
> >
> > So if the *only* thing you want is "is this currently an active part of
> > something else", then O_EXCL works since 2.6.0 (I think).
>
> Unfortunately, that won't distinguish between a currently active file
> system, and a device which is being used by a dm target, which is what
> we want to do.
Hmmm... that's already possible with the existing holders symlinks,
right? As Kay said in another message, I don't think we can do
anything about the symlinks at this point. It already has userland
users, so we'll have to maintain them and there's no reason to create
something else for the same functionality.
It's silly that there's no way to tell whether the device is mounted
from a given block device but then again we've been working around it
by reverse mapping it till now so unless there's a new requirement
maybe it's okay as it is now.
Ideally, it would have been nice and more fitting with the whole bd
claim API if we just exported single attribute which identifies the
current holder supplied at the time of claiming (just the kernel
identifier in string), so that we can forward-map the exclusive opener
in general instead of having to reverse-map it for everything other
than md/dm.
Thank you.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 17:35:20 von Tejun Heo
Hello,
On Thu, Jan 13, 2011 at 03:49:50PM +0100, Milan Broz wrote:
> So listing block device tree using ONE interface (sysfs) to check
> and display device tree for ALL subsystems seems to be good idea for
> me. In this exactly defined case.
>
> I am not saying that symlinks are perfect, just that generic
> interface here is not superfluous.
I'm not saying finding out the current owner is superflous. That
actually is useful but what's implemented now doesn't satisfy that in
any generic manner. It only works for md/dm and is designed in such
way that it can't be used out of that scope.
So, it's only useful for md/dm and for those two cases it tries to
show information which doesn't really belong there using an overly
complex mechanism.
Block device open allows _nesting_ by the claiming device. It is
never meant to track different nested claimers and it shows in the
implementation and interface. The holder tracking doesn't fit
anywhere. It's a completely separate piece of logic which doesn't mix
with anything else in the block layer.
If you consider its only user - dm-table, the whole picture is
somewhat uglier. dm-table tracks how it uses block devices but only
within a single target, so there are two separate layers of reference
counting going on. This again is caused by the above stated design
inconsistency.
As block device only considers single claimer which doesn't have to be
another device, the owner naturally should have been something which
only points to single in-kernel entity in a way which doesn't require
link to another sysfs entry.
Karel pointed out that having different methods between md and dm
would be silly; however, that is a problem which should be solved
between dm and md and actually is a quite artificial problem (or
solution). It only exists because dm and md do very similar things in
many cases. A block device can be used by many more things which are
way out of the scope which can be represented in the sysfs hierarchy.
It simply is impossible to represent that relationship with symlinks
under block device sysfs hierarchy.
Thank you.
--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 18:59:42 von tytso
On Fri, Jan 14, 2011 at 05:20:22PM +0100, Tejun Heo wrote:
>
> Hmmm... that's already possible with the existing holders symlinks,
> right? As Kay said in another message, I don't think we can do
> anything about the symlinks at this point. It already has userland
> users, so we'll have to maintain them and there's no reason to create
> something else for the same functionality.
Yes, agreed. My point was that if we have a better system, we could
migrate blkid and other users to new interface, and maybe in a few
years, we can deprecate that interface. So we do have to maintain
them for at least the medium term, but that shouldn't stop us from
divising a better interface, and then gradually migrating systems to
use that newer and better interface.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [dm-devel] linux-next - WARNING: at fs/block_dev.c:824 bd_link_disk_holder+0x92/0x1ac()
am 14.01.2011 19:23:40 von Tejun Heo
Hello, Ted.
On Fri, Jan 14, 2011 at 6:59 PM, Ted Ts'o wrote:
> Yes, agreed. =A0My point was that if we have a better system, we coul=
d
> migrate blkid and other users to new interface, and maybe in a few
> years, we can deprecate that interface. =A0So we do have to maintain
> them for at least the medium term, but that shouldn't stop us from
> divising a better interface, and then gradually migrating systems to
> use that newer and better interface.
Oh yeah, fully agreed there and the implementation at the block layer
isn't even gonna be difficult. Just a single attribute which contains
a string which can be specified at the time of claiming. We probably
can introduce a new interface where the @holder is always a string the
content of which should uniquely identify the holder in human/machine
readable form and add a few helpers to format those strings so that
the same type of usages use consistently formatted strings. For
nested devices, kdev should do. For filesystems, the mount point
maybe (it should be something which identifies it absolutely)? For
specific programs (cd burner, whatnot), pid and so on.
The only question is, would that be actually necessary? By now, we
already have mostly working reverse mapping for most things which
matter through blkid, fs information, fd listing under proc and so on.
They sure are ugly and probably unreliable at times but is it
pressing enough to introduce new interface and migrate things over or
would we be just creating churns? I don't really know how these
things look from userland so am genuinely unsure what the answer is.
Thanks.
--=20
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder()support
am 14.01.2011 22:09:12 von Milan Broz
On 01/14/2011 05:10 PM, Tejun Heo wrote:
> Commit e09b457b (block: simplify holder symlink handling) incorrectly
> assumed that there is only one link at maximum. dm may use multiple
> links and expects block layer to track reference count for each link,
> which is different from and unrelated to the exclusive device holder
> identified by @holder when the device is opened.
>
> Remove the single holder assumption and automatic removal of the link
> and revive the per-link reference count tracking. The code
> essentially behaves the same as before commit e09b457b sans the
> unnecessary kobject reference count dancing.
>
> While at it, note that this facility should not be used by anyone else
> than the current ones. Sysfs symlinks shouldn't be abused like this
> and the whole thing doesn't belong in the block layer at all.
>
> Signed-off-by: Tejun Heo
> Reported-by: Milan Broz
> Cc: Jun'ichi Nomura
> Cc: Neil Brown
> Cc: linux-raid@vger.kernel.org
> Cc: Kay Sievers
> ---
> Thanks for the test commands. They were very helpful. Can you please
> test this one?
Hi,
yes, this one works for me. I run full lvm2 testsuite and no warnings.
Thanks!
Tested-by: Milan Broz
Milan
Re: [PATCH UPDATED] block: restore multiple bd_link_disk_holder()support
am 17.01.2011 01:18:04 von j-nomura
On 01/15/11 06:09, Milan Broz wrote:
> On 01/14/2011 05:10 PM, Tejun Heo wrote:
>> Commit e09b457b (block: simplify holder symlink handling) incorrectly
>> assumed that there is only one link at maximum. dm may use multiple
>> links and expects block layer to track reference count for each link,
>> which is different from and unrelated to the exclusive device holder
>> identified by @holder when the device is opened.
>>
>> Remove the single holder assumption and automatic removal of the link
>> and revive the per-link reference count tracking. The code
>> essentially behaves the same as before commit e09b457b sans the
>> unnecessary kobject reference count dancing.
>>
>> While at it, note that this facility should not be used by anyone else
>> than the current ones. Sysfs symlinks shouldn't be abused like this
>> and the whole thing doesn't belong in the block layer at all.
>>
>> Signed-off-by: Tejun Heo
>> Reported-by: Milan Broz
>> Cc: Jun'ichi Nomura
>> Cc: Neil Brown
>> Cc: linux-raid@vger.kernel.org
>> Cc: Kay Sievers
>> ---
>> Thanks for the test commands. They were very helpful. Can you please
>> test this one?
>
> Hi,
>
> yes, this one works for me. I run full lvm2 testsuite and no warnings.
> Thanks!
>
> Tested-by: Milan Broz
Thanks Tejun, Milan!
If it passed my quick test and the lvm2 testsuite,
I have nothing to add here.
And the code looks ok, too.
--
Jun'ichi Nomura, NEC Corporation