Killing subshells after predefined timespan

on 19.09.2007 12:39:14 by abgrund

Hello there,

I need to kill a bash script after a given timespan and I am
currently still having problems with it.
This is an excerpt of a script meant to be used as a Nagios[1] plugin
for monitoring a MySQL server through an SSH tunnel.


THREAD_COUNT=$(\
/usr/bin/ssh $SSH_HOST /usr/bin/mysql \
-h$MYSQL_SERVER -u$MYSQL_LOGIN \
-p$MYSQL_PASS -e "'SHOW STATUS;'" | \
/usr/bin/awk '/Threads_connected/ { print $2 }' ) &
sleep $TIMEOUT
if [ "$(jobs | grep ssh)" ]
then
STATE="TIMEOUT_EXCEEDED"
fi

Now, I have two problems. First, for some reason the subprocess always
terminates with no result when I keep the `&' at the end of the
$(ssh ... ) command, but it finishes fine in far less than two seconds
when I omit the `&'. Of course, without the `&' the script would not
terminate if the $(ssh ... ) command hung.

My second problem is that I need the script to finish as soon as the
$(ssh ... ) command is done, not only once $TIMEOUT has elapsed.

Could anyone give me advice or point me to an example where this is
handled better?

Thanks,
Björn

[1] Nagios is a system monitoring service. See: http://www.nagios.org/

Re: Killing subshells after predefined timespan

on 19.09.2007 16:37:27 by cfajohnson

On 2007-09-19, Björn Keil wrote:
> Hello there,
>
> I need to kill a bash script after a given timespan and I am
> currently still having problems with it.
> This is an excerpt of a script supposed to be used as Nagios[1] plugin
> for monitoring a MySQL server through an SSH tunnel.
>
>
> THREAD_COUNT=$(\
> /usr/bin/ssh $SSH_HOST /usr/bin/mysql \
> -h$MYSQL_SERVER -u$MYSQL_LOGIN \
> -p$MYSQL_PASS -e "'SHOW STATUS;'" | \
> /usr/bin/awk '/Threads_connected/ { print $2 }' ) &
> sleep $TIMEOUT
> if [ "$(jobs | grep ssh)" ]
> then
> STATE="TIMEOUT_EXCEEDED"
> fi
>
> Now, I have two problems. Firstly, for some reason the subprocess is
> always being terminated with no result when I leave the `&' at the end
> of the $(ssh ... ) command, but it works fine in far less than two
> seconds when I leave the `&' away. But of course in this case it
> wouldn't terminate even if the $(ssh ... ) command hung up.

There can be no result in THREAD_COUNT because background processes
are run in a subshell and cannot affect the calling shell.
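
To make that concrete, here is a minimal sketch. The variable name RESULT and the temporary-file workaround are illustrative assumptions, not part of the original script. The first assignment is backgrounded and is lost; the second lets the background job write to a file that the calling shell reads after the job has finished.

```shell
#!/bin/bash
# RESULT and TMPFILE are hypothetical names used only for this sketch.
unset RESULT

# The whole assignment runs in a background subshell,
# so the calling shell never sees the value.
RESULT=$(echo 42) &
wait
echo "after background assignment: '${RESULT}'"

# Possible workaround: the background job writes to a temporary file,
# and the calling shell reads it once the job is done.
TMPFILE=$(mktemp)
echo 42 > "$TMPFILE" &
wait "$!"
RESULT=$(cat "$TMPFILE")
rm -f "$TMPFILE"
echo "after reading the file: '${RESULT}'"
```

The first echo should show an empty value and the second should show 42.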

--
Chris F.A. Johnson, author
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Re: Killing subshells after predefined timespan

on 19.09.2007 20:11:13 by Miles

On Sep 19, 5:39 am, Björn Keil wrote:
> Hello there,
>
> I need to kill a bash script after a given timespan and I am
> currently still having problems with it.
> This is an excerpt of a script supposed to be used as Nagios[1] plugin
> for monitoring a MySQL server through an SSH tunnel.
>
> THREAD_COUNT=$(\
> /usr/bin/ssh $SSH_HOST /usr/bin/mysql \
> -h$MYSQL_SERVER -u$MYSQL_LOGIN \
> -p$MYSQL_PASS -e "'SHOW STATUS;'" | \
> /usr/bin/awk '/Threads_connected/ { print $2 }' ) &
> sleep $TIMEOUT
> if [ "$(jobs | grep ssh)" ]
> then
> STATE="TIMEOUT_EXCEEDED"
> fi
>
> Now, I have two problems. Firstly, for some reason the subprocess is
> always being terminated with no result when I leave the `&' at the end
> of the $(ssh ... ) command, but it works fine in far less than two
> seconds when I leave the `&' away. But of course in this case it
> wouldn't terminate even if the $(ssh ... ) command hung up.
>
> My second problem is that I need a way to terminate at once when the
> $(ssh ... ) command is done, not only when the $TIMEOUT time has been
> exceeded.
>
> Could anyone give me advice or point me to an example where this is
> handled better?
>
> Thanks,
> Björn
>
> [1] Nagios is a system monitoring service. See: http://www.nagios.org/

I don't know about Nagios, but I have a process to monitor the length
of time that a job runs.

First, I have a script (run_job) that runs all my jobs. It is a
wrapper.
I have another script (timeout_job) that monitors all jobs started by
run_job. One copy of timeout_job per job.

So run_job will start a timeout_job and then the real job.
I know this won't paste nicely, but it shows how the processes are
linked:

root@unxd1:/home/unxsa/bin>pgrep 2793472
UID PID PPID STIME TTY TIME CMD
root 2793472 413744 0 00:01:00 - 0:00 /usr/bin/ksh /home/unxsa/bin/run_job -d 1450 -m -c /home/unxsa/bin/pg_spc_log 24 4
root 2797576 2793472 0 00:01:12 - 0:00 /usr/bin/ksh /home/unxsa/bin/pg_spc_log 24 4
root 3203304 2793472 0 00:01:12 - 0:00 /usr/bin/ksh /home/unxsa/bin/timeout_job.AIX 1450 2793472 root /home/unxsa/bin/pg_spc_log dba

root@unxd1:/home/unxsa/bin>pstree 2793472
-+- 2793472 root /usr/bin/ksh /home/unxsa/bin/run_job -d 1450 -m -c /home/unxsa/bin/pg_spc_log 24 4
 |-+- 2797576 root /usr/bin/ksh /home/unxsa/bin/pg_spc_log 24 4
 | \--- 3072074 root sleep 900
 \-+- 3203304 root /usr/bin/ksh /home/unxsa/bin/timeout_job.AIX 1450 2793472 root /home/unxsa/bin/pg_spc_log dba
   \--- 3211298 root sleep 87000

run_job was started with a duration (-d) of 1450 minutes, which it
passes to timeout_job.
timeout_job also gets the PID to monitor, 2793472, which is its own
PPID, plus some other information: the user, the command being run and
a "group".
You can see the sleep 87000 (1450*60) spawned by timeout_job.

The role of timeout_job is to sleep (for 87000 seconds in this case)
and then check whether the process with PID 2793472 is still alive.
This is pretty easy to code. You could modify a timeout_job-like
script to simply kill the corresponding PID.

run_job could be pretty simple, the meat of it works like:
$TIMEOUT_JOB $DURATION $$ $NOTIFYVALUE $COMMAND $USERGROUP 2>&1 &
$COMMAND $ARGUMENTS 1>>$OUTFIL 2>$STANDARD_ERROR_MESSAGES_FILE

timeout_job works like:
################################################################################
# Sleep duration then check if job is still alive
################################################################################
let "SLEEPDUR = $DURATION * 60" # convert minutes to seconds
sleep $SLEEPDUR

################################################################################
# Loop and alert if job doesn't finish
################################################################################
ps w $JOB >$FILE

$JOB is the PPID. I send it to a file to make sure the command is the
same, ie. that the same PID isn't a new command.

The rest of my script sends emails every 20 minutes if the job runs
long, but you could just kill it.
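
For the "just kill it" variant, a minimal bash sketch could look like the following. It is not my production run_job or timeout_job, and the function name run_with_timeout is made up for illustration. It starts the job in the background, starts a watchdog that kills the job after the limit, and returns as soon as the job ends, which would also cover the "terminate at once" part of the original question.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption for this sketch).
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    local limit=$1
    shift

    "$@" &                       # start the real job in the background
    local job_pid=$!

    # Watchdog: sleep, then kill the job if it is still there.
    # Its output goes to /dev/null so a leftover sleep cannot hold
    # a $(...) pipe open after the job has finished.
    ( sleep "$limit"; kill "$job_pid" 2>/dev/null ) >/dev/null 2>&1 &
    local watchdog_pid=$!

    wait "$job_pid"              # returns as soon as the job ends
    local status=$?

    # The job is done, so stop the watchdog. Its sleep child may
    # linger until the limit expires, but it no longer does anything.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null

    return "$status"
}
```

A job killed by the watchdog should report status 143 (128 plus SIGTERM), so the caller can tell a timeout from a normal exit. Because the job's output is untouched, the function can be called inside $( ... ) to capture a result. Note that the watchdog kills by PID only and, unlike the ps check above, does not verify that the PID still belongs to the same command.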

I have versions of these two scripts that run on AIX, HP-UX and Linux.

Miles

Re: Killing subshells after predefined timespan

on 21.09.2007 11:00:55 by SiKing

Björn Keil wrote:
> I need to kill a bash script after a given timespan and I am
> currently still having problems with it.
>
> Could anyone give me advice or point me to an example where this is
> handled better?

I have posted something like this a while back. Did not get any feedback on it
from the group, so I don't actually know if it is any good. :) Here are some
links, all (should) point to the same thing:




SiKing.


--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GE d+(-) s+: a@ C+ ULAHS++$ P- L+>++ E--- W++ N++ o !K w--(+) O- M?>+ V? PS+
PE+(++) Y+ PGP- t+ 5 X R !tv b+ DI(+) D G e++ h---- r+++@ y++++
------END GEEK CODE BLOCK------

Re: Killing subshells after predefined timespan

on 21.09.2007 20:01:57 by Cyrus Kriticos

Björn Keil wrote:
>
> I need to kill a bash script after a given timespan
> [...]
>
> Could anyone give me advice or point me to an example where this is
> handled better?


$ wget -O /tmp/timeout.c \
http://www.terraluna.org/cgi-bin/cvsweb.cgi/satan/src/misc/timeout.c

$ gcc /tmp/timeout.c -o /usr/local/bin/timeout


then:

$ timeout 10 your_command
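
In a plugin script the caller also needs to know whether the limit was hit. The exit codes of the program compiled above are not stated here, so the sketch below assumes the GNU coreutils `timeout` utility instead, which exits with status 124 when the time limit is reached. The command `sleep 5` and the value "OK" are placeholders.

```shell
#!/bin/bash
# Assumption: "timeout" is the GNU coreutils utility (exit status 124
# on timeout), not necessarily the C program built above.
status=0
timeout 1 sleep 5 || status=$?   # "sleep 5" stands in for your_command

if [ "$status" -eq 124 ]; then
    STATE="TIMEOUT_EXCEEDED"
else
    STATE="OK"                   # placeholder value for this sketch
fi
echo "$STATE"
```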

--
Best regards | "The only way to really learn scripting is to write
Cyrus | scripts." -- Advanced Bash-Scripting Guide

Re: Killing subshells after predefined timespan

on 24.09.2007 00:07:56 by keeling

Cyrus Kriticos :
> Björn Keil wrote:
> >
> > I need to kill a bash script after a given timespan
> > [...]
> >
> > Could anyone give me advice or point me to an example where this is
> > handled better?
>
>
> $ wget -O /tmp/timeout.c \
> http://www.terraluna.org/cgi-bin/cvsweb.cgi/satan/src/misc/timeout.c
>
> $ gcc /tmp/timeout.c -o /usr/local/bin/timeout

Does not compile (Debian Etch). Add "#include <sys/wait.h>" (man 7posix wait.h).

> $ timeout 10 your_command



--
Any technology distinguishable from magic is insufficiently advanced.
(*) http://blinkynet.net/comp/uip5.html Linux Counter #80292
- - http://www.faqs.org/rfcs/rfc1855.html Please, don't Cc: me.