As clarified in bug 333, signals that are not prefixed with B: should only go to job steps. However, when you use --signal=KILL@120 it will kill the job and job steps, instead of only the steps.

Compare the output of this:

#!/bin/bash
#SBATCH -t00:04:00 --signal=KILL@120
srun sleep 1h
echo "Exited with $?"

With this:

#!/bin/bash
#SBATCH -t00:04:00 --signal=TERM@120
srun sleep 1h
echo "Exited with $?"

Output from KILL version:
----------
slurmstepd: *** STEP 4296274.0 CANCELLED AT 2014-07-29T16:50:20 DUE TO TIME LIMIT ***
slurmstepd: *** JOB 4296274 CANCELLED AT 2014-07-29T16:50:20 DUE TO TIME LIMIT ***
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: error: m6-17-16: task 0: Terminated

Output from TERM version:
----------
slurmstepd: *** STEP 4296276.0 CANCELLED AT 2014-07-29T16:53:05 ***
srun: error: m6-18-16: task 0: Terminated
srun: Force Terminated job step 4296276.0
Exited with 143

$ sacct -j 4296274 # KILL variant
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4296274        gbash.sh m6beta,m7+      staff          1    TIMEOUT      0:1
4296274.bat+      batch                 staff          1  CANCELLED     0:15
4296274.0         sleep                 staff          1  CANCELLED     0:15

$ sacct -j 4296276 # TERM variant
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
4296276        gbash.sh m6beta,m7+      staff          1  COMPLETED      0:0
4296276.bat+      batch                 staff          1  COMPLETED      0:0
4296276.0         sleep                 staff          1  CANCELLED     0:15

Note that because the TERM variant ran an echo command it exited with a zero status.
Hi, signal 9 (KILL) is an exception to this rule. By design, when -9 is issued the job is terminated immediately, nodes are deallocated, resources are freed, and all steps, including batch, are signaled. This emulates the Unix behaviour in which signal 9 cannot be blocked or caught.

David
(In reply to David Bigagli from comment #1)
> signal 9 (KILL) is an exception to this rule, by design when -9 is issued,
> the job is terminated immediately, nodes are deallocated, resources freed
> and all steps including batch signaled. This emulates the Unix behaviour
> in which signal 9 cannot be block or caught.

In a Unix environment, sending a KILL signal to a child process will not kill the parent. Similarly, sending a KILL signal to a job step (the child) should not kill the job (the parent).
True but signaling a job using sigkill is like sending the signal to an entire unix process group. David
(In reply to David Bigagli from comment #3)
> True but signaling a job using sigkill is like sending the signal to an
> entire unix process group.

True, but in this analogy the main job is the parent of the process group that is getting killed. Sending a kill to the process group doesn't kill the parent.

According to the documentation --signal is sent to job steps and not the batch job; why does the main job get signaled on SIGKILL but not on SIGTERM (or others)?
(In reply to Levi Morrison from comment #5)
> (In reply to David Bigagli from comment #3)
> > True but signaling a job using sigkill is like sending the signal to an
> > entire unix process group.
>
> True, but in this analogy the main job is the parent of the process group
> that is getting killed. Sending a kill to the process group doesn't kill the
> parent.
>
> According to the documentation --signal is sent to job steps and not the
> batch job; why does the main job get signaled on SIGKILL but not on SIGTERM
> (or others)?

What this means is that either:

1) The behavior of sending KILL is incorrect
2) The behavior of all other signals is incorrect

And in either case the documentation is wrong.
I looked back through the history and it looks like the code has always worked like that. What problems is this causing you exactly? Can you just send SIGTERM? Yes, the man page of sbatch should be corrected.

David
(In reply to David Bigagli from comment #7)
> I did some research back in time and it looks like the code always worked
> like that. What problems is this causing you exactly? Can you just send
> the SIGTERM?

The fact that --signal=KILL@120 is sent to the main job and its job steps instead of just the job steps is inconsistent with the behavior of sending --signal=TERM@120; the fact that the documentation doesn't mention this inconsistency only compounds the issue.

Our use case is that a particular piece of commercial software swallows all signals sent to it and we need to copy off some data after it dies. This means we must send KILL, because if we send TERM it doesn't die until a KILL is sent at the end of the job, and then we can no longer copy off the data. We thought of a solution using --signal=B:TERM@120 and a trap handler but are having issues; we have not yet pinpointed the cause(s).

All in all, this means that a certain active research group is unable to retrieve data from 12% of all their jobs.
You have an application that does not react to any signal? What is the reason? I think the idea of using SIGTERM in a wrapper is the right one. I simulated an application that ignores SIGTERM:

------------------------------------------------------
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    printf("spin %d\n", getpid());
    signal(SIGTERM, SIG_IGN);
    while (1);
}
---------------------------------------------------------

Then, using the attached wrapper (catchchild), I submit a batch script like this:

---------------------------------------
cat Catch
#!/bin/sh
trap 'echo trapped TERM/15' 15
./catchchild
#srun ./catchchild
exit 0
----------------------------------------

Using --signal=TERM@X, the wrapper catches the TERM signal, does whatever it needs to do, and then kills the child processes.

David
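The wrapper idea can be sketched in plain shell, without Slurm. This is only an illustration under stated assumptions and is not the attached catchchild program: the stand-in application is a `sleep` started with SIGTERM ignored, the TERM that `--signal` would deliver is simulated by the driver part of the script, and the names `app_pid`, `on_term`, `wrapper_pid`, and `output` are hypothetical.

```shell
#!/bin/sh
# Sketch only: a wrapper that catches TERM, kills a stubborn child, then cleans up.

wrapper=$(mktemp)
log=$(mktemp)

# --- the wrapper script itself (plays the role of the batch script) ---
cat > "$wrapper" <<'EOF'
# Stand-in application (assumption): SIGTERM is ignored, and an ignored
# disposition survives exec, so this sleep cannot be stopped with TERM.
sh -c 'trap "" TERM; exec sleep 30' &
app_pid=$!

on_term() {
    echo "wrapper: caught TERM"
    kill -TERM "$app_pid"            # polite attempt; the stand-in ignores it
    sleep 1
    if kill -0 "$app_pid" 2>/dev/null; then
        echo "wrapper: child ignored TERM, sending KILL"
        kill -KILL "$app_pid"
    fi
}
trap on_term TERM

# The child is in the background, so the shell blocks in the wait builtin,
# which a trapped signal interrupts; the handler runs without delay.
wait "$app_pid"
# Wait again to collect the child's real exit status.
wait "$app_pid"
status=$?

echo "wrapper: child exited with $status"
echo "wrapper: this is where the data would be copied off"
EOF

# --- driver: simulates the scheduler sending TERM to the wrapper ---
sh "$wrapper" > "$log" &
wrapper_pid=$!
sleep 1
kill -TERM "$wrapper_pid"
wait "$wrapper_pid"
wrapper_status=$?

output=$(cat "$log")
echo "$output"
rm -f "$wrapper" "$log"
```

In a real batch job the backgrounded command would presumably be the `srun` step, and the TERM would come from something like the `--signal=B:TERM@120` option mentioned earlier in this thread; whether that combination behaves the same way under Slurm is not verified here.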
Created attachment 1108 [details] catchchild program
(In reply to David Bigagli from comment #9)
> You have an application that does not react to any signal? What is the
> reason?

It is a closed source, commercial application; I do not know why it exhibits this behavior. I suspect that it has something to do with calls to `sleep` and `wait` not running signal handlers until they wake up.

Please, seriously consider aligning the behavior of --signal=KILL to match that of the rest of the signals. There is no reason that --signal=KILL@60 should be sent to the batch job.
Perhaps you could ask their support, since both sleep and wait are interruptible.

Anyway, there is a reason why the code was written this way; we just don't know it, and at this stage we are not sure what the consequences might be.

Why don't we do this: I will investigate a patch and then send it to you. Then we will have a general solution in 14.11. How does this sound?

David
(In reply to David Bigagli from comment #12)
> Perhaps you could ask their support since both sleep and wait are
> interruptible.

In tests we have seen that this is not the case. Run this:

#!/bin/bash
echo $$
trap date TERM
sleep 20s

And then send it a TERM signal. Note that the handler will not run until the sleep has returned.

> Anyway there is a reason why the code was written this way we just don't
> know it, and at this stage and we are not sure what the consequences might
> be.

I understand that, but I'd certainly argue that whatever the case it should be rewritten. Special casing KILL is not a good behavior unless there are strange technical reasons.

> Why don't we do this. I will investigate a patch and then send it to you.
> Then we will have a general solution in 14.11. How does this sound?

That sounds great. Hopefully you'll be able to see what, if anything, it breaks in the process.
That is the shell you are signaling, *not* the sleep command. The bash shell traps the signal, as you asked it to, and restarts its wait() system call, but does not signal the child process.

This is easy to see if you run strace -p on the shell's process id, the one you print with $$. Only when the child it is running has finished does it print the message.

If you want to signal the child process (the sleep command), you have 2 ways:

1) Get the process id of sleep and send it SIGTERM; since sleep is interruptible, it will exit right away, and so will the shell.

2) Ask kill to do this for you by specifying -pid to signal the entire process group. The result will be the same as in 1). Have a look at the kill man page.

David
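The difference between the two situations can be reproduced with a small timing experiment. This is a sketch, not anything taken from Slurm: the variable names `fg_elapsed` and `bg_elapsed` are made up for the illustration, and the 3-second sleeps are arbitrary. In the first case the signalled shell runs its child in the foreground, so the handler is deferred until the child returns; in the second the child is backgrounded and the shell blocks in the `wait` builtin, which a trapped signal interrupts.

```shell
#!/bin/bash
# Case 1: the child runs in the foreground of the signalled shell.
# bash defers the TERM handler until `sleep 3` has returned.
start=$(date +%s)
bash -c 'trap "exit 0" TERM; sleep 3; :' &
pid=$!
sleep 1
kill -TERM "$pid"        # signals only the inner shell, not its sleep
wait "$pid"
fg_elapsed=$(( $(date +%s) - start ))

# Case 2: the child runs in the background and the shell blocks in `wait`.
# The trapped signal interrupts the wait builtin, so the handler runs
# promptly and can signal the child itself before exiting.
start=$(date +%s)
bash -c 'trap "kill \$child; exit 0" TERM; sleep 3 & child=$!; wait "$child"' &
pid=$!
sleep 1
kill -TERM "$pid"
wait "$pid"
bg_elapsed=$(( $(date +%s) - start ))

echo "foreground child: shell exited after ${fg_elapsed}s"
echo "background child + wait: shell exited after ${bg_elapsed}s"
```

On an otherwise idle machine the first case should take about as long as the full sleep and the second only about as long as the delay before the signal is sent, which matches both observations made in this thread: the handler is late in the foreground case, and the cause is the shell rather than sleep.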
(In reply to David Bigagli from comment #14)
> That is the shell you are signaling *not* the sleep command. The bash shell
> traps the signal, as you ask it to do, and restart its wait() system call,
> but does not signal the child process.
>
> This is easy to see if you run 'strace -p shell process id', the one you
> print with $$. Only when the child it is running has finished it prints the
> message.
>
> If you want to signal the child process, the sleep, command you have 2 ways
>
> 1) Get the process id of sleep and send it sigterm, sleep since it is
> interruptible will exit right away and so the shell.
>
> 2) Ask the kill to do this for you specifying -pid to signal the entire
> process group. The result will be the same as in 1). Have a look at the kill
> man page.

Nothing you've said really matters because the bash script is what gets the signal, and the trap handler won't run until sleep returns.
(In reply to David Bigagli from comment #14)
> That is the shell you are signaling *not* the sleep command. The bash shell
> traps the signal, as you ask it to do, and restart its wait() system call,
> but does not signal the child process.
>
> This is easy to see if you run 'strace -p shell process id', the one you
> print with $$. Only when the child it is running has finished it prints the
> message.
>
> If you want to signal the child process, the sleep, command you have 2 ways
>
> 1) Get the process id of sleep and send it sigterm, sleep since it is
> interruptible will exit right away and so the shell.
>
> 2) Ask the kill to do this for you specifying -pid to signal the entire
> process group. The result will be the same as in 1). Have a look at the kill
> man page.

Nothing you've said really matters because the bash script is what gets the signal from Slurm, and the trap handler won't run until sleep returns.
I thought you said:

->I suspect that it has something to do with calls to `sleep` and
->`wait` not running signal handlers until they wake up.

I just wanted to be precise: it is not the sleep or wait, which have no signal handlers and are interruptible, but the shell or, in the other case, the application. This should be easy to verify using strace.

On a slightly different note, do you use, or plan to use, srun with this application?

David
(In reply to David Bigagli from comment #17)
> On a slightly different note do you use, or plan to use, srun with this
> application?

We have been using srun, which is one reason why the signal handler stuff is an issue. We can send SIGTERM to the job step and sometimes it dies, but not always. That's why we want to send SIGKILL. However, sending SIGKILL also kills the whole job instead of just the job steps.
OK, I am working on what we discussed before. I realized that you are only concerned about the sbatch --signal option rather than a generic signaling mechanism. In this case we can just piggyback on the bug 333 implementation, which is going to make it much easier.

David
I am attaching the diffs for this problem. Please apply them to your source. We will apply the patch against the 14.11 code base. David
(In reply to David Bigagli from comment #20)
> I am attaching the diffs for this problem. Please apply them to your source.

I don't see any attachments other than the catchchild program.
Created attachment 1120 [details] diff for the --signal sbatch and srun command