systemd kills mdmon if it was started manually by user
on 04.12.2010 09:41:26
If user starts array manually (mdadm -A -s as example) from within
user session and array needs mdmon, mdmon becomes part of user session
control group:
It is then killed by systemd during shutdown as part of user session.
It results in dirty array on next boot.
Is there any magic that allows daemon to be exempted from killing?
Re: systemd kills mdmon if it was started manually
on 04.12.2010 10:12:23
Re: systemd kills mdmon if it was started manually
on 04.12.2010 13:08:05
Re: [systemd-devel] systemd kills mdmon if it was started manually
on 12.12.2010 14:20:28
On Sat, Dec 04, 2010 at 03:08:05PM +0300, Andrey Borzenkov wrote:
>mdmon does not belong to user. User is not even aware that it is
>started. And it is likely not the last case. So systemd does need some
>framework which can move such processes out of user session. It
>probably needs some sd_daemon API to notify systemd that it is system
>level task even if it was started as result of user interaction.
what about running mdmon --all --takeover outside of user context at
shutdown, it should replace all mdmon processes with new ones that won't
be killed when user sessions are being closed?
Luca Berra
Re: systemd kills mdmon if it was started manually by user
on 07.01.2011 01:38:27
Re: systemd kills mdmon if it was started manually by user
on 07.01.2011 01:40:24
On Sat, 04.12.10 15:08, Andrey Borzenkov wrote:
> >> It is then killed by systemd during shutdown as part of user session.
> >> It results in dirty array on next boot.
> >>
> >> Is there any magic that allows daemon to be exempted from killing?
> >
> > While your raid should absolutely not be corrupted on next reboot
> > when mdmon receives a SIGTERM,
> This won't be corrupted but it will initiate a rebuild. I have reports
> that such rebuild may take hours, costing performance and loss of
> redundancy.
Well, eventually we need to be able to kill mdmon. Otherwise we might
not be able to remount the root dir r/o. How exactly is mdmon supposed
to behave on shutdown?
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 07.01.2011 02:09:32
2011/1/7 Lennart Poettering:
> Well, I have been discussing this with Kay and we'll most likely add
> something like DontKillOnShutdown=yes or so, which if added to a unit
Make that KillOnShutdown=no, please.
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 07.01.2011 02:16:28
On Fri, 7 Jan 2011 01:38:27 +0100 Lennart Poettering wrote:
> On Sat, 04.12.10 11:41, Andrey Borzenkov wrote:
> > If user starts array manually (mdadm -A -s as example) from within
> > user session and array needs mdmon, mdmon becomes part of user session
> > control group:
> Are you suggesting that mdadm forks off mdmon from within the user
> session? This is horribly ugly and broken and they shouldn't do that.
What alternative would you suggest?
A daemon needs to be running while certain md arrays are running and writable.
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 07.01.2011 02:17:46
On Fri, 7 Jan 2011 02:09:32 +0100
Michael Biebl wrote:
> 2011/1/7 Lennart Poettering:
> >
> > Well, I have been discussing this with Kay and we'll most likely add
> > something like DontKillOnShutdown=yes or so, which if added to a unit
> Make that KillOnShutdown=no, please.
Agreed :) That reminds me of "hal-disable-polling --enable-polling"
With respect,
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 07.01.2011 02:42:01
On Fri, 07.01.11 12:16, NeilBrown wrote:
> On Fri, 7 Jan 2011 01:38:27 +0100 Lennart Poettering
> wrote:
> > On Sat, 04.12.10 11:41, Andrey Borzenkov wrote:
> >
> > > If user starts array manually (mdadm -A -s as example) from within
> > > user session and array needs mdmon, mdmon becomes part of user session
> > > control group:
> >
> > Are you suggesting that mdadm forks off mdmon from within the user
> > session? This is horribly ugly and broken and they shouldn't do that.
> What alternative would you suggest?
Start it as a normal service like any other. But if you fork off the
daemon from the user session then the daemon will run in a very broken
context: the resource limits of the user apply, the audit trail will
point to the user (i.e. /proc/self/loginuid), the cgroup will be of the
user, the daemon cannot be supervised as every other daemon. Also, the
daemon will inherit all the other process properties from the user,
which is almost definitely wrong. i.e. the env block and so
on, the sig mask. gazillions of small little properties. Of course, a
big bunch of them you can reset in your code, but that's a race you
cannot win: the kernel adds new process properties all the time, and
you'd have to reset them manually.
It is really essential that daemons are started from a clean process
environment, and are detached from the user session. SysV kinda provides
that, for everything started on boot and in a limited way for stuff
started via /sbin/service. systemd provides that too and much more
correctly. But just forking off things just like that is not a good idea.
A thinkable, relatively simple solution in a systemd world is to pull in
the mdmon service from the udev device. The udev device would do all the
necessary matching to figure out whether mdmon is needed or not. If you
care about non-systemd environments something like this of course
becomes a lot more complex.
> A daemon needs to be running while certain md arrays are running and writable.
Well, but auto-spawning it from the user session is not really a usable solution.
Lennart Poettering - Red Hat, Inc.
Re: systemd kills mdmon if it was started manually
on 04.02.2011 20:55:06
on 08.02.2011 10:48:43
On Fri, 04.02.11 22:55, Andrey Borzenkov wrote:
> >> That's right, but the names are not known in advance and can change
> >> between reboots. This means such units have to be generated
> >> dynamically, exist until reboot (ramfs?) and be removed when array is
> >> destroyed. Not sure it is really manageable.
> >
> > Hmm? It should be sufficient to just write the service template properly
> > ("mdmon@.service") and then instantiate it when needed with "systemctl
> > start mdmon@xyz.service" or something equivalent. it's a matter of
> > issuing a single dbus call.
> >
> >> And which instance should generate them? mdadm?
> >
> > i think it is much nicer to spawn the necessary mdadm service instance
> > from a udev rule,
> Yes, this can be done relatively easily; as proof of concept:
> SUBSYSTEM!="block", GOTO="systemd_md_end"
> ACTION!="change", GOTO="systemd_md_end"
> KERNEL!="md*", GOTO="systemd_md_end"
> ATTR{md/metadata_version}=="external:[A-Za-z]*", RUN+="/bin/systemctl start mdmon@%k.service"
> LABEL="systemd_md_end"
Nah, it's much better to simply use the SYSTEMD_WANTS var on the device.
Something like this:
....., ENV{SYSTEMD_WANTS}="mdmon@%k.service"
That way the device unit will simply have a wants dep on the service
unit, and this is perfectly discoverable.
> Setting SYSTEMD_WANTS would be more elegant solution, but it does not
> work with current systemd implementation. It is capable of starting
> requested units only on "add" event (effectively the very first time
> device becomes plugged), while mdmon must be started on "change"
> event, as only then we know whether mdmon is required at all.
Oha, so you are actually aware of SYSTEMD_WANTS. Hmm. I need to think
about this. Why does md employ the change event? Is this really
necessary, smells a bit foul.
> Running mdmon via systemd in this way opens up interesting
> possibility. E.g. service could be declared "immortal" and be exempt
> from usual shutdown sequence ... or is it possible to do already?
A service needs to conflict with to be shut down when we
go down normally. If your service does not conflict with
then it will stay around and be killed only after systemd is gone and
PID1 is systemd-shutdown which then kills all processes remaining
(independent of any idea of "service") and then unmounts all file
systems. Normally all services conflict with implicitly,
which you can turn off by setting DefaultDependencies=.
> Actually it can be implemented even without mdadm patches; apparently
> it is possible to suppress normal starting of mdmon by setting
At this point mdmon is simply broken: if glibc or mdmon itself (or any
lib it is using) is upgraded, then mdmon will keep referencing the old
.so or binary as long as it is running. This means that the fs these
files are on cannot be remounted r/o. However mdmon insists on being
shutdown only after all fs got remounted ro. So you have a cyclic
ordering loop here: mdmon wants to be shut down after the remount, but
we need to shut it down before the remount.
This is unfixable unless a) mdmon learns reexecution of itself without
losing state (like most init systems do), or b) mdmon would stop
insisting on being shutdown only after the remount.
In my eyes b) is very much preferable: It should be possible to shut
down mdmon like any other service. And if then some md related code
still needs to be run on late shutdown this should be done from a new
process. I would be willing to add some hooks for this, so that we can
execute arbitrary drop-in processes as part of the final shutdown loop.
Lennart Poettering - Red Hat, Inc.
Re: systemd kills mdmon if it was started manually
on 08.02.2011 11:52:41
On Tue, Feb 8, 2011 at 12:48 PM, Lennart Poettering wrote:
> On Fri, 04.02.11 22:55, Andrey Borzenkov wrote:
>> >> That's right, but the names are not known in advance and can change
>> >> between reboots. This means such units have to be generated
>> >> dynamically, exist until reboot (ramfs?) and be removed when array is
>> >> destroyed. Not sure it is really manageable.
>> >
>> > Hmm? It should be sufficient to just write the service template properly
>> > ("mdmon@.service") and then instantiate it when needed with "systemctl
>> > start mdmon@xyz.service" or something equivalent. it's a matter of
>> > issuing a single dbus call.
>> >
>> >> And which instance should generate them? mdadm?
>> >
>> > i think it is much nicer to spawn the necessary mdadm service instance
>> > from a udev rule,
>> Yes, this can be done relatively easily; as proof of concept:
>> SUBSYSTEM!="block", GOTO="systemd_md_end"
>> ACTION!="change", GOTO="systemd_md_end"
>> KERNEL!="md*", GOTO="systemd_md_end"
>> ATTR{md/metadata_version}=="external:[A-Za-z]*", RUN+="/bin/systemctl start mdmon@%k.service"
>> LABEL="systemd_md_end"
> Nah, it's much better to simply use the SYSTEMD_WANTS var on the device.
> Something like this:
> ...., ENV{SYSTEMD_WANTS}="mdmon@%k.service"
> That way the device unit will simply have a wants dep on the service
> unit, and this is perfectly discoverable.
>> Setting SYSTEMD_WANTS would be more elegant solution, but it does not
>> work with current systemd implementation. It is capable of starting
>> requested units only on "add" event (effectively the very first time
>> device becomes plugged), while mdmon must be started on "change"
>> event, as only then we know whether mdmon is required at all.
> Oha, so you are actually aware of SYSTEMD_WANTS. Hmm. I need to think
> about this. Why does md employ the change event? Is this really
> necessary, smells a bit foul.
I am probably the wrong one to ask, but here is what happens when
array is started (from udev perspective)
UDEV [1297507039.109828] add /devices/virtual/block/md127 (block)
After this event device goes "plugged" and SYSTEMD_WANTS (if any) are
triggered. But at this point we have zero information about array to
decide anything.
UDEV [1297507039.211940] change /devices/virtual/block/md127 (block)
DEVLINKS=/dev/disk/by-id/md-uuid-f8362f39:0436b20f:cf338104:afec436e
At this point we know it is container, know that it has external
metadata and know that we need external metadata handler (mdmon). But
it is too late for systemd.
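
The decision described above (mdmon is needed only once the "change" event reveals external metadata) can be sketched as a small shell check. This is a minimal sketch, not code from mdadm or systemd: the function name needs_mdmon and the SYSFS_ROOT override are hypothetical, and the match simply reuses the external:[A-Za-z]* pattern from the proof-of-concept udev rule quoted earlier.

```shell
#!/bin/sh
# Sketch only: decide whether an md device needs mdmon by reading its
# md/metadata_version attribute, mirroring the "external:[A-Za-z]*"
# match from the proof-of-concept udev rule.
# SYSFS_ROOT is a hypothetical override so the check can be tried
# against a fake tree; on a real system it would default to /sys.

needs_mdmon() {
    dev="$1"                                   # kernel name, e.g. md127
    attr="${SYSFS_ROOT:-/sys}/block/$dev/md/metadata_version"

    # Attribute missing or unreadable: nothing to decide on yet.
    [ -r "$attr" ] || return 1
    read -r version < "$attr" || return 1

    case "$version" in
        external:[A-Za-z]*) return 0 ;;        # external metadata handler required
        *)                  return 1 ;;        # anything else: no mdmon
    esac
}
```

A caller could then run something like `needs_mdmon md127 && systemctl start mdmon@md127.service`, assuming an mdmon@.service template exists as discussed in this thread.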
>> Actually it can be implemented even without mdadm patches; apparently
>> it is possible to suppress normal starting of mdmon by setting
> At this point mdmon is simply broken: if glibc or mdmon itself (or any
> lib it is using) is upgraded, then mdmon will keep referencing the old
> .so or binary as long as it is running. This means that the fs these
> files are on cannot be remounted r/o. However mdmon insists on being
> shutdown only after all fs got remounted ro. So you have a cyclic
> ordering loop here: mdmon wants to be shut down after the remount, but
> we need to shut it down before the remount.
Ehh ...
a) mdmon is perfectly capable of restarting, it is already used to
take over mdmon launched in initrd. The problem is to know when to
restart - i.e. when respective libraries are changed. This is a job
for package management in distribution. It is already employed for
glibc, systemd and some others and can just as well be employed for
mdmon. And this is totally unrelated to systemd :)
b) having binary launched off some fs should not prevent this fs to be
remounted ro - binaries are not opened rw
> This is unfixable unless a) mdmon learns reexecution of itself without
> losing state (like most init systems do), or b) mdmon would stop
> insisting on being shutdown only after the remount.
As far as I can tell, both is true today; but remounting is not
enough, unfortunately.
> In my eyes b) is very much preferable: It should be possible to shut
> down mdmon like any other service. And if then some md related code
> still needs to be run on late shutdown this should be done from a new
> process. I would be willing to add some hooks for this, so that we can
> execute arbitrary drop-in processes as part of the final shutdown loop.
mdmon is needed to ensure metadata were correctly updated. So it needs
to exist as long as metadata *may* be updated. For practical purposes
it means - until file system is unmounted and flushed to disks. I am
not sure that remounting ro stops all activity (at least, mounting ro
definitely *writes* to device using some filesystems).
Re: systemd kills mdmon if it was started manually by user
on 08.02.2011 12:07:31
On Tue, 08.02.11 13:52, Andrey Borzenkov wrote:
> I am probably the wrong one to ask, but here is what happens when
> array is started (from udev perspective)
> After this event device goes "plugged" and SYSTEMD_WANTS (if any) are
> triggered. But at this point we have zero information about array to
> decide anything.
> At this point we know it is container, know that it has external
> metadata and know that we need external metadata handler (mdmon). But
> it is too late for systemd.
Kay, do you know why this "change" event is used here? Any chance we can
get rid of it?
> >
> >> Actually it can be implemented even without mdadm patches; apparently
> >> it is possible to suppress normal starting of mdmon by setting
> >
> > At this point mdmon is simply broken: if glibc or mdmon itself (or any
> > lib it is using) is upgraded, then mdmon will keep referencing the old
> > .so or binary as long as it is running. This means that the fs these
> > files are on cannot be remounted r/o. However mdmon insists on being
> > shutdown only after all fs got remounted ro. So you have a cyclic
> > ordering loop here: mdmon wants to be shut down after the remount, but
> > we need to shut it down before the remount.
> >
> Ehh ...
> a) mdmon is perfectly capable of restarting, it is already used to
> take over mdmon launched in initrd. The problem is to know when to
> restart - i.e. when respective libraries are changed. This is a job
> for package management in distribution. It is already employed for
> glibc, systemd and some others and can just as well be employed for
> mdmon. And this is totally unrelated to systemd :)
Really, you are saying there is a synchronous way to make mdmon reexec
itself? How does that work?
> b) having binary launched off some fs should not prevent this fs to be
> remounted ro - binaries are not opened rw
If you run a binary and then the package manager replaces it then the
running instance will still refer to the old copy and this will have the
effect that the file isn't actually deleted until the process
exits/execs. And because that is the way it is the kernel will refuse
unmounting of the fs until you terminated/reexeced your process.
> > This is unfixable unless a) mdmon learns reexecution of itself without
> > losing state (like most init systems do), or b) mdmon would stop
> > insisting on being shutdown only after the remount.
> As far as I can tell, both is true today; but remounting is not
> enough, unfortunately.
So, you are saying we can shut down mdmon without ill effects early?
> > In my eyes b) is very much preferable: It should be possible to shut
> > down mdmon like any other service. And if then some md related code
> > still needs to be run on late shutdown this should be done from a new
> > process. I would be willing to add some hooks for this, so that we can
> > execute arbitrary drop-in processes as part of the final shutdown loop.
> mdmon is needed to ensure metadata were correctly updated. So it needs
> to exist as long as metadata *may* be updated. For practical purposes
> it means - until file system is unmounted and flushed to disks. I am
> not sure that remounting ro stops all activity (at least, mounting ro
> definitely *writes* to device using some filesystems).
Well, the root file systems cannot be unmounted, only remounted.
So, if there is a way to invoke mdmon so that it flushes all metadata
changes to disk and immediately terminates, then this should be all we
need for a clean solution. We'd then shut the normal instances of
mdmon down like any other daemon and simply invoke this metadata
flushing command as part of late shutdown.
Lennart Poettering - Red Hat, Inc.
Re: systemd kills mdmon if it was started manually
on 08.02.2011 14:54:03
On Tue, Feb 8, 2011 at 2:07 PM, Lennart Poettering wrote:
> On Tue, 08.02.11 13:52, Andrey Borzenkov wrote:
>> I am probably the wrong one to ask, but here is what happens when
>> array is started (from udev perspective)
> [...]
>> After this event device goes "plugged" and SYSTEMD_WANTS (if any) are
>> triggered. But at this point we have zero information about array to
>> decide anything.
> [...]
>> At this point we know it is container, know that it has external
>> metadata and know that we need external metadata handler (mdmon). But
>> it is too late for systemd.
> Kay, do you know why this "change" event is used here? Any chance we can
> get rid of it?
>> >
>> >> Actually it can be implemented even without mdadm patches; apparently
>> >> it is possible to suppress normal starting of mdmon by setting
>> >
>> > At this point mdmon is simply broken: if glibc or mdmon itself (or any
>> > lib it is using) is upgraded, then mdmon will keep referencing the old
>> > .so or binary as long as it is running. This means that the fs these
>> > files are on cannot be remounted r/o. However mdmon insists on being
>> > shutdown only after all fs got remounted ro. So you have a cyclic
>> > ordering loop here: mdmon wants to be shut down after the remount, but
>> > we need to shut it down before the remount.
>> >
>> Ehh ...
>> a) mdmon is perfectly capable of restarting, it is already used to
>> take over mdmon launched in initrd. The problem is to know when to
>> restart - i.e. when respective libraries are changed. This is a job
>> for package management in distribution. It is already employed for
>> glibc, systemd and some others and can just as well be employed for
>> mdmon. And this is totally unrelated to systemd :)
> Really, you are saying there is a synchronous way to make mdmon reexec
> itself? How does that work?
I am not sure whether it qualifies as synchronous, but "mdmon
--takeover" will kill any existing mdmon for this and start monitoring
itself.
>> b) having binary launched off some fs should not prevent this fs to be
>> remounted ro - binaries are not opened rw
> If you run a binary and then the package manager replaces it then the
> running instance will still refer to the old copy and this will have the
> effect that the file isn't actually deleted until the process
> exits/execs. And because that is the way it is the kernel will refuse
> unmounting of the fs until you terminated/reexeced your process.
>> > This is unfixable unless a) mdmon learns reexecution of itself without
>> > losing state (like most init systems do), or b) mdmon would stop
>> > insisting on being shutdown only after the remount.
>> As far as I can tell, both is true today; but remounting is not
>> enough, unfortunately.
> So, you are saying we can shut down mdmon without ill effects early?
At least that's what I see. You can shutdown mdmon and continue to
work with file system, even if it is mounted rw. Under some conditions
mount will hang; i.e.
start array
kill mdmon
try to mount
mount will hang. If you start mdmon, it is mounted. But if you now
kill mdmon
it is mounted just fine.
>> > In my eyes b) is very much preferable: It should be possible to shut
>> > down mdmon like any other service. And if then some md related code
>> > still needs to be run on late shutdown this should be done from a new
>> > process. I would be willing to add some hooks for this, so that we can
>> > execute arbitrary drop-in processes as part of the final shutdown loop.
>> mdmon is needed to ensure metadata were correctly updated. So it needs
>> to exist as long as metadata *may* be updated. For practical purposes
>> it means - until file system is unmounted and flushed to disks. I am
>> not sure that remounting ro stops all activity (at least, mounting ro
>> definitely *writes* to device using some filesystems).
> Well, the root file systems cannot be unmounted, only remounted.
> So, if there is a way to invoke mdmon so that it flushes all metadata
> changes to disk and immediately terminates, then this should be all we
> need for a clean solution. We'd then shut the normal instances of
> mdmon down like any other daemon and simply invoke this metadata
> flushing command as part of late shutdown.
Hmm ... it looks like you just need to
start mdmon
do mdadm --wait-clean
After this you can kill mdmon again (assuming device is no longer in use).
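
The three steps above (start mdmon, run mdadm --wait-clean, stop mdmon again) could be wrapped roughly as follows. This is a sketch under stated assumptions, not a tested shutdown hook: the function name flush_md_metadata, the MDMON/MDADM overrides, and the pid-file argument are hypothetical, the location of mdmon's pid file differs between mdadm versions, and whether mdmon is fully up by the time its command returns is exactly the open question in this thread.

```shell
#!/bin/sh
# Sketch only: "start mdmon, wait for clean, stop mdmon" as one helper.
# MDMON and MDADM default to the real tools; the overrides are
# hypothetical hooks so the sequence can be tried with stand-ins.

flush_md_metadata() {
    container="$1"    # container name, e.g. md127
    pidfile="$2"      # mdmon pid file for that container (path is an assumption)

    # 1. Start a monitor for the container.
    "${MDMON:-mdmon}" "$container" || return 1

    # 2. Ask for arrays to be marked clean as soon as possible.
    "${MDADM:-mdadm}" --wait-clean --scan
    status=$?

    # 3. Stop the monitor again, whatever step 2 reported.
    if [ -r "$pidfile" ] && read -r pid < "$pidfile"; then
        kill -TERM "$pid"
    fi
    return "$status"
}
```

The helper returns the exit status of the --wait-clean step, so a caller in late shutdown can tell whether the wait succeeded even though mdmon is stopped either way.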
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 08.02.2011 18:28:22
On Tue, 08.02.11 16:54, Andrey Borzenkov wrote:
> >> a) mdmon is perfectly capable of restarting, it is already used to
> >> take over mdmon launched in initrd. The problem is to know when to
> >> restart - i.e. when respective libraries are changed. This is a job
> >> for package management in distribution. It is already employed for
> >> glibc, systemd and some others and can just as well be employed for
> >> mdmon. And this is totally unrelated to systemd :)
> >
> > Really, you are saying there is a synchronous way to make mdmon reexec
> > itself? How does that work?
> >
> I am not sure whether it qualifies as synchronous, but "mdmon
> --takeover" will kill any existing mdmon for this and start monitoring
> itself.
I wonder if this is really fully synchronous, i.e. that a) there is no
point in time where mdmon is not running during this restart and b) the
mdmon --takeover command returns when the new daemon is fully up, and
not right-away.
> > Well, the root file systems cannot be unmounted, only remounted.
> >
> > So, if there is a way to invoke mdmon so that it flushes all metadata
> > changes to disk and immediately terminates, then this should be all we
> > need for a clean solution. We'd then shut the normal instances of
> > mdmon down like any other daemon and simply invoke this metadata
> > flushing command as part of late shutdown.
> Hmm ... it looks like you just need to
> start mdmon
> do mdadm --wait-clean
> After this you can kill mdmon again (assuming device is no longer in
> use).
Well, it would be nice if the md utils would offer something doing this
without spawning multiple processes and killing them again.
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] systemd kills mdmon if it was started manually by user
on 09.02.2011 15:01:39
On Tue, 08.02.11 12:07, Lennart Poettering wrote:
> > At this point we know it is container, know that it has external
> > metadata and know that we need external metadata handler (mdmon). But
> > it is too late for systemd.
> Kay, do you know why this "change" event is used here? Any chance we can
> get rid of it?
So, it seems that the "change" event does make some sense here. I have
now added a new property to systemd: if you set SYSTEMD_READY=0 on a
udev device then systemd will consider it unplugged even if it shows up
in the udev tree. If this property is not set for a device, or is set to
1 we will consider the device plugged.
To make this md stuff compatible with systemd we hence just need to set
SYSTEMD_READY=0 during the "new" event and drop it when the device is
fully set up.
Andrey, since you are playing around with this, do you happen to know
which attribute we should check to set SYSTEMD_READY=0 properly? It
would be cool if we could come up with a default rule for inclusion in
our systemd rules file that will ensure the device only shows up when it
is ready.
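
The thread does not settle which attribute should drive SYSTEMD_READY. Purely as an illustration, and assuming the md/array_state sysfs attribute is a usable signal (an assumption, not something confirmed here), a helper could print the property in key=value form so that a udev rule might import it with IMPORT{program}. The function name md_systemd_ready and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch only: print SYSTEMD_READY=0 while an md device looks unassembled
# and SYSTEMD_READY=1 otherwise.
# ASSUMPTION: md/array_state is the attribute to check; the thread leaves
# this question open. SYSFS_ROOT is a hypothetical override for trying
# the helper against a fake tree instead of /sys.

md_systemd_ready() {
    dev="$1"                                   # kernel name, e.g. md127
    attr="${SYSFS_ROOT:-/sys}/block/$dev/md/array_state"

    state=""
    [ -r "$attr" ] && read -r state < "$attr"

    case "$state" in
        ""|clear|inactive) echo "SYSTEMD_READY=0" ;;   # not set up yet
        *)                 echo "SYSTEMD_READY=1" ;;   # treat as plugged
    esac
}
```

Printing the value rather than setting it keeps the helper usable both from a rule and from an interactive shell when checking what systemd would be told about a given device.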
Lennart Poettering - Red Hat, Inc.
