Ticket 1001 - --signal=KILL@120 vs --signal=TERM@120
Summary: --signal=KILL@120 vs --signal=TERM@120
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 14.03.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: David Bigagli
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2014-07-29 11:03 MDT by Levi Morrison
Modified: 2014-08-07 07:31 MDT
2 users

See Also:
Site: BYU - Brigham Young University
Version Fixed: 14.11.0pre4


Attachments
catchchild program (1.02 KB, patch)
2014-08-05 07:49 MDT, David Bigagli
diff for the --signal sbatch and srun command (6.60 KB, patch)
2014-08-07 07:31 MDT, David Bigagli

Description Levi Morrison 2014-07-29 11:03:14 MDT
As clarified in bug 333, signals that are not prefixed with B: should only be sent to job steps. However, --signal=KILL@120 kills the batch job as well as its job steps, instead of only the steps.


Compare the output of this:
#!/bin/bash
#SBATCH -t00:04:00 --signal=KILL@120

srun sleep 1h
echo "Exited with $?"


With this:
#!/bin/bash
#SBATCH -t00:04:00 --signal=TERM@120

srun sleep 1h
echo "Exited with $?"


Output from KILL version:
----------
slurmstepd: *** STEP 4296274.0 CANCELLED AT 2014-07-29T16:50:20 DUE TO TIME LIMIT ***
slurmstepd: *** JOB 4296274 CANCELLED AT 2014-07-29T16:50:20 DUE TO TIME LIMIT ***
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: m6-17-16: task 0: Terminated


Output from TERM version:
----------
slurmstepd: *** STEP 4296276.0 CANCELLED AT 2014-07-29T16:53:05 ***
srun: error: m6-18-16: task 0: Terminated
srun: Force Terminated job step 4296276.0
Exited with 143


$ sacct -j 4296274 # KILL variant
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
4296274        gbash.sh m6beta,m7+      staff          1    TIMEOUT      0:1 
4296274.bat+      batch                 staff          1  CANCELLED     0:15 
4296274.0         sleep                 staff          1  CANCELLED     0:15


$ sacct -j 4296276 # TERM variant
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
4296276        gbash.sh m6beta,m7+      staff          1  COMPLETED      0:0 
4296276.bat+      batch                 staff          1  COMPLETED      0:0 
4296276.0         sleep                 staff          1  CANCELLED     0:15


Note that the TERM variant's batch script went on to run the echo command after the step was killed, which is why the job exited with a zero status.
Comment 1 David Bigagli 2014-08-04 08:13:31 MDT
Hi, 
   signal 9 (KILL) is an exception to this rule. By design, when -9 is issued
the job is terminated immediately, nodes are deallocated, resources are freed,
and all steps, including the batch step, are signaled. This emulates the Unix
behaviour in which signal 9 cannot be blocked or caught.
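The Unix rule referred to above can be checked with a short C sketch (illustrative only, not Slurm code; `try_ignore` and `restore_default` are hypothetical helper names). `sigaction()` refuses to change the disposition of SIGKILL (or SIGSTOP) and fails with `EINVAL`, whereas SIGTERM can be ignored, which is exactly what the spin program later in this ticket does.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Try to set the disposition of `sig` to SIG_IGN in the calling process.
 * Returns 0 on success, or the errno value reported by sigaction(). */
static int try_ignore(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return errno;           /* EINVAL for SIGKILL and SIGSTOP */
    return 0;
}

/* Put `sig` back to its default action (terminate, for SIGTERM). */
static void restore_default(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sigaction(sig, &sa, NULL);
}
```

On Linux, `try_ignore(SIGKILL)` should return `EINVAL` while `try_ignore(SIGTERM)` returns 0; `restore_default(SIGTERM)` undoes the second call.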

David
Comment 2 Levi Morrison 2014-08-04 08:25:17 MDT
(In reply to David Bigagli from comment #1)
>    signal 9 (KILL) is an exception to this rule, by design when -9 is issued,
> the job is terminated immediately, nodes are deallocated, resources freed
> and all steps including batch signaled. This emulates the Unix behaviour
> in which signal 9 cannot be block or caught.

In a Unix environment, sending a KILL signal to a child process will not kill the parent. Similarly, sending a KILL signal to a job step (the child) should not kill the job (the parent).
Comment 3 David Bigagli 2014-08-04 08:27:34 MDT
True, but signaling a job with SIGKILL is like sending the signal to an entire Unix process group.

David
Comment 4 David Bigagli 2014-08-04 08:34:57 MDT
David
Comment 5 Levi Morrison 2014-08-04 08:35:30 MDT
(In reply to David Bigagli from comment #3)
> True but signaling a job using sigkill is like sending the signal to an
> entire unix process group.

True, but in this analogy the main job is the parent of the process group that is getting killed. Sending a kill to the process group doesn't kill the parent.
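The process-group point can be illustrated with plain POSIX calls. In this minimal C sketch (not Slurm code; `kill_child_group` is a hypothetical name) the parent moves its child into a new process group and sends SIGKILL to that group with `kill(-pgid, SIGKILL)`. Only members of the group die; the sender, which is not in the group, keeps running.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that waits in its own process group, SIGKILL that group,
 * and return the signal that terminated the child (0 if it exited
 * normally, -1 on error). The caller, i.e. the "parent job", is untouched. */
static int kill_child_group(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);              /* child: lead a new process group */
        for (;;)
            pause();                /* wait until a signal kills us */
    }
    setpgid(pid, pid);              /* parent does it too, avoiding a race */
    if (kill(-pid, SIGKILL) == -1)  /* negative PID = whole process group */
        return -1;
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

`kill_child_group()` should return `SIGKILL` (9), and the fact that it returns at all shows that the sender survived.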

According to the documentation --signal is sent to job steps and not the batch job; why does the main job get signaled on SIGKILL but not on SIGTERM (or others)?
Comment 6 Levi Morrison 2014-08-04 08:37:27 MDT
(In reply to Levi Morrison from comment #5)
> (In reply to David Bigagli from comment #3)
> > True but signaling a job using sigkill is like sending the signal to an
> > entire unix process group.
> 
> True, but in this analogy the main job is the parent of the process group
> that is getting killed. Sending a kill to the process group doesn't kill the
> parent.
> 
> According to the documentation --signal is sent to job steps and not the
> batch job; why does the main job get signaled on SIGKILL but not on SIGTERM
> (or others)?

What this means is that either:

 1) The behavior of sending KILL is incorrect
 2) The behavior of all other signals is incorrect

And in either case the documentation is wrong.
Comment 7 David Bigagli 2014-08-04 09:02:10 MDT
I looked back through the history, and it appears the code has always worked
like that. What problems is this causing you exactly? Can you just send SIGTERM?

Yes, the sbatch man page should be corrected.

David
Comment 8 Levi Morrison 2014-08-04 09:51:04 MDT
(In reply to David Bigagli from comment #7)
> I did some research back in time and it looks like the code always worked
> like that. What problems is this causing you exactly? Can you just send
> the SIGTERM?

The fact that --signal=KILL@120 is sent to the main job and its job steps instead of just the job steps is inconsistent with the behavior of sending --signal=TERM@120; the fact that the documentation doesn't mention this inconsistency only compounds the issue.

Our use-case is that a particular piece of commercial software swallows all signals sent to it and we need to copy off some data after it dies. This means we must send KILL because if we send TERM it doesn't die until a KILL is sent at the end of the job and can no longer copy off the data.

We thought of a solution using --signal=B:TERM@120 and a trap handler but are having issues; we have not yet pinpointed the cause(s).

All in all, this means that a certain active research group is unable to retrieve data from 12% of all their jobs.
Comment 9 David Bigagli 2014-08-05 07:48:45 MDT
You have an application that does not react to any signal? What is the reason?

I think the idea of using SIGTERM in a wrapper is the right one.
I simulated an application that ignores SIGTERM:
------------------------------------------------------
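#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Simulate an application that ignores SIGTERM and never exits on its own. */
int
main(void)
{
        signal(SIGTERM, SIG_IGN);       /* SIGTERM is now discarded */
        printf("spin %d\n", (int)getpid());
        fflush(stdout);                 /* stdout is a file under sbatch */
        for (;;)
                ;                       /* busy-loop until SIGKILL */
}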
---------------------------------------------------------
Then, using the attached wrapper (catchchild), I submit a batch script like this:
---------------------------------------
$ cat Catch
#!/bin/sh

trap 'echo trapped TERM/15' 15

./catchchild
#srun ./catchchild
exit 0
----------------------------------------

With --signal=TERM@X the wrapper catches the TERM signal, does whatever it
needs to do, and then kills its child processes.
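A wrapper along these lines can also be written directly in C. This is a sketch under stated assumptions, not the attached catchchild program (whose source is not reproduced in this ticket); `run_wrapped` and `spin_ignoring_term` are hypothetical names. It forwards SIGTERM to its child, allows a grace period, and escalates to SIGKILL if the child ignores the signal. The wrapper is then still alive to copy off data, which is the use case described in comment 8.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_term = 0;

/* The handler only records the request; all real work happens in the loop. */
static void on_term(int sig)
{
    (void)sig;
    got_term = 1;
}

/* Stand-in for the uncooperative application: ignores SIGTERM, like the
 * spin program above, but waits in pause() instead of burning CPU. */
static void spin_ignoring_term(void)
{
    signal(SIGTERM, SIG_IGN);
    for (;;)
        pause();
}

/* Run child_main() in a child process. When the wrapper itself receives
 * SIGTERM (e.g. via --signal=B:TERM@...), forward SIGTERM to the child,
 * wait up to grace_sec seconds, then send SIGKILL. Returns the child's
 * wait status, or -1 on error; the caller can then copy off its data. */
static int run_wrapped(void (*child_main)(void), unsigned grace_sec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);   /* do not inherit the wrapper's handler */
        child_main();
        _exit(0);
    }

    time_t deadline = 0;            /* 0 = no SIGTERM received yet */
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;
        if (r == -1 && errno != EINTR)
            return -1;
        if (got_term && deadline == 0) {
            kill(pid, SIGTERM);     /* polite request first */
            deadline = time(NULL) + (time_t)grace_sec;
        }
        if (deadline != 0 && time(NULL) >= deadline)
            kill(pid, SIGKILL);     /* child ignored SIGTERM: force it */
        struct timespec ts = {0, 100000000L};   /* poll every 100 ms */
        nanosleep(&ts, NULL);
    }
}
```

Polling `waitpid()` with `WNOHANG` is a deliberate choice: it avoids the race in which SIGTERM arrives just before a blocking `waitpid()` call and is then never acted upon.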

David
Comment 10 David Bigagli 2014-08-05 07:49:34 MDT
Created attachment 1108 [details]
catchchild program
Comment 11 Levi Morrison 2014-08-05 07:55:49 MDT
(In reply to David Bigagli from comment #9)
> You have an application that does not react to any signal? What is the
> reason?

It is a closed source, commercial application; I do not know why it exhibits this behavior. I suspect that it has something to do with calls to `sleep` and `wait` not running signal handlers until they wake up.

Please, seriously consider aligning the behavior of --signal=KILL to match that of the rest of the signals. There is no reason that --signal=KILL@60 should be sent to the batch job.
Comment 12 David Bigagli 2014-08-05 08:19:25 MDT
Perhaps you could ask their support, since both sleep and wait are interruptible.

Anyway, there is presumably a reason why the code was written this way; we just don't know it, and at this stage we are not sure what the consequences of changing it might be.

Why don't we do this: I will investigate a patch and then send it to you.
Then we will have a general solution in 14.11. How does that sound?

David
Comment 13 Levi Morrison 2014-08-05 08:35:15 MDT
(In reply to David Bigagli from comment #12)
> Perhaps you could ask their support since both sleep and wait are
> interruptible.

In tests we have seen that this is not the case. Run this:

#!/bin/bash

echo $$
trap date TERM

sleep 20s


And then send it a TERM signal. Note that the handler will not run until the sleep has returned.
 
> 
> Anyway there is a reason why the code was written this way we just don't
> know it, and at this stage and we are not sure what the consequences might
> be.

I understand that, but I'd certainly argue that whatever the case it should be rewritten. Special casing KILL is not a good behavior unless there are strange technical reasons.

> Why don't we do this. I will investigate a patch and then send it to you.
> Then we will have a general solution in 14.11. How does this sound?

That sounds great. Hopefully you'll be able to see what, if anything, it breaks in the process.
Comment 14 David Bigagli 2014-08-05 09:12:35 MDT
It is the shell you are signaling, *not* the sleep command. The bash shell
traps the signal, as you asked it to, and restarts its wait() system call,
but does not signal the child process.

This is easy to see if you run 'strace -p <shell PID>', using the PID you
print with $$. The shell prints the message only after the child it is
running has finished.
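The shell behaviour described above can be modeled in C. This is an analogy, not bash's actual implementation, and `deferred_trap_ms` and `note_term` are hypothetical names. The signal handler only records that SIGTERM arrived, and because it is installed with `SA_RESTART` the parent's `waitpid()` does not fail with `EINTR`, so the "trap action" runs only after the child, which was never signaled, has exited.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t term_seen = 0;

/* Like a bash trap: the handler only records that SIGTERM arrived. */
static void note_term(int sig)
{
    (void)sig;
    term_seen = 1;
}

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Model of `trap ... TERM; sleep N` receiving SIGTERM. The child plays the
 * part of `sleep`: it signals its parent (the "shell"), not itself, and then
 * sleeps. With SA_RESTART the parent's waitpid() does not fail with EINTR,
 * so the deferred "trap action" runs only after the child has exited.
 * Returns the milliseconds that passed before the trap action ran, or -1. */
static long deferred_trap_ms(unsigned child_secs)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;

    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        kill(getppid(), SIGTERM);   /* the "shell" gets the signal */
        sleep(child_secs);          /* the child itself is not signaled */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!term_seen)
        return -1;
    return elapsed_ms(&start);      /* the "trap action" runs only now */
}
```

With `child_secs = 2` the result should be at least about 2000 ms, even though SIGTERM reached the parent almost immediately.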

If you want to signal the child process (the sleep command), you have 2 ways:

1) Get the process ID of sleep and send it SIGTERM. Since sleep is interruptible, it will exit right away, and so will the shell.

2) Ask kill to do this for you by specifying -pid (the process group ID with a minus sign) to signal the entire process group. The result will be the same as in 1). Have a look at the kill man page.
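Both options can be sketched in C (illustrative; `term_sleep` is a hypothetical helper, and a `sleep` binary is assumed to be on `PATH`). The sketch also shows where the `Exited with 143` in the original report comes from: a POSIX shell reports a process killed by signal N as `128 + N`, and SIGTERM is 15.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `sleep 20`, send it SIGTERM either directly (option 1) or via its
 * process group (option 2), and return the status a POSIX shell would
 * report in $?: 128 + signal number if the child was killed by a signal. */
static int term_sleep(int use_group)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);              /* own group, so option 2 hits only sleep */
        execlp("sleep", "sleep", "20", (char *)NULL);
        _exit(127);                 /* exec failed */
    }
    setpgid(pid, pid);              /* may fail harmlessly after exec */
    kill(use_group ? -pid : pid, SIGTERM);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WEXITSTATUS(status);
}
```

Either way the call should return 143 almost immediately instead of after 20 seconds.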

David
Comment 15 Levi Morrison 2014-08-05 09:15:54 MDT
(In reply to David Bigagli from comment #14)
> That is the shell you are signaling *not* the sleep command. The bash shell
> traps the signal, as you ask it to do, and restart its wait() system call,
> but does not signal the child process. 
> 
> This is easy to see if you run 'strace -p shell process id', the one you 
> print with $$. Only when the child it is running has finished it prints the
> message.
> 
> If you want to signal the child process, the sleep, command you have 2 ways
> 
> 1) Get the process id of sleep and send it sigterm, sleep since it is
> interruptible  will exit right away and so the shell.
> 
> 2) Ask the kill to do this for you specifying -pid to signal the entire
> process group. The result will be the same as in 1). Have a look at the kill
> man page.

Nothing you've said really matters because the bash script is what gets the signal, and the trap handler won't run until sleep returns.
Comment 16 Levi Morrison 2014-08-05 09:16:28 MDT
(In reply to David Bigagli from comment #14)
> That is the shell you are signaling *not* the sleep command. The bash shell
> traps the signal, as you ask it to do, and restart its wait() system call,
> but does not signal the child process. 
> 
> This is easy to see if you run 'strace -p shell process id', the one you 
> print with $$. Only when the child it is running has finished it prints the
> message.
> 
> If you want to signal the child process, the sleep, command you have 2 ways
> 
> 1) Get the process id of sleep and send it sigterm, sleep since it is
> interruptible  will exit right away and so the shell.
> 
> 2) Ask the kill to do this for you specifying -pid to signal the entire
> process group. The result will be the same as in 1). Have a look at the kill
> man page.

Nothing you've said really matters because the bash script is what gets the signal from Slurm, and the trap handler won't run until sleep returns.
Comment 17 David Bigagli 2014-08-05 09:24:30 MDT
I thought you said:

->I suspect that it has something to do with calls to `sleep` and
->`wait` not running signal handlers until they wake up.

I just wanted to be precise: it is not sleep or wait (which have no signal handlers and are interruptible) that delays the handler, but the shell, or in the other case the application. This should be easy to verify using strace.

On a slightly different note do you use, or plan to use, srun with this application?

David
Comment 18 Levi Morrison 2014-08-05 10:19:11 MDT
(In reply to David Bigagli from comment #17) 
> On a slightly different note do you use, or plan to use, srun with this
> application?

We have been using srun, which is one reason why the signal handler stuff is an issue. We can send SIGTERM to the job step and sometimes it dies, but not always. That's why we want to send SIGKILL. However, sending SIGKILL also kills the whole job instead of just the job steps.
Comment 19 David Bigagli 2014-08-05 10:22:29 MDT
OK, I am working on what we discussed before. I realized that you are only concerned about the sbatch --signal option rather than a generic signaling mechanism. In that case we can piggyback on the bug 333 implementation,
which will make it much easier.

David
Comment 20 David Bigagli 2014-08-07 06:59:34 MDT
I am attaching the diffs for this problem. Please apply them to your source.
We will apply the patch against the 14.11 code base.

David
Comment 21 Levi Morrison 2014-08-07 07:28:05 MDT
(In reply to David Bigagli from comment #20)
> I am attaching the diffs for this problem. Please apply them to your source.

I don't see any attachments other than the catchchild program.
Comment 24 David Bigagli 2014-08-07 07:31:07 MDT
Created attachment 1120 [details]
diff for the --signal sbatch and srun command