Some of our support staff reported that setting traps in Slurm jobs was not working for them. In the course of debugging, we discovered that any signal sent to the batch step of a job, using either --signal=B:[...] or scancel --batch --signal=[...], never seems to be delivered. For instance, consider the following job script:

-----
#!/bin/bash
#SBATCH --job-name=minimal_trap
#SBATCH --time=2:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --output=%x.%A.log
#SBATCH --signal=B:USR1@60

function my_handler() {
  echo "Catching signal"
  touch $SLURM_SUBMIT_DIR/job_${SLURM_JOB_ID}_caught_signal
  exit
}

trap my_handler USR1
trap my_handler TERM

sleep 3600
-----

If I submit and run this job script, it only produces the following output:

troy@pitzer-login04:~/Beowulf/slurm/signal-delivery$ more minimal_trap.18491.log
slurmstepd: error: *** JOB 18491 ON p0614 CANCELLED AT 2020-09-01T12:55:14 DUE TO TIME LIMIT ***

Note that the echo from the signal handler does not appear in the output. The file created by the touch command in the signal handler does not exist either, leading me to suspect that the handler is never invoked.

Similarly, if I submit this job script, wait for it to start running, and then immediately do scancel --batch --signal=USR1 <jobid>, there is no evidence that the signal handler fired:

troy@pitzer-login04:~/Beowulf/slurm/signal-delivery$ sbatch minimal_trap.job
Submitted batch job 18517
troy@pitzer-login04:~/Beowulf/slurm/signal-delivery$ scancel --batch --signal=USR1 18517
troy@pitzer-login04:~/Beowulf/slurm/signal-delivery$ more minimal_trap.18517.log
slurmstepd: error: *** JOB 18517 ON p0614 CANCELLED AT 2020-09-01T13:19:14 DUE TO TIME LIMIT ***

Other tests we have done show that #SBATCH --signal=<spec> directives that do not include B: work as expected, so this appears to be specific to the batch step of jobs.
Hi Troy,

You need to do the following:

sleep 3600 &
wait

From https://slurm.schedmd.com/scancel.html:

"Note that most shells cannot handle signals while a command is running (child process of the batch step), the shell [needs to] use `wait` [to] wait until the command ends to then handle the signal."

Thanks,
-Michael
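For reference, the pattern can be demonstrated outside Slurm entirely. This is a minimal sketch (not your actual job script): the long-running command is backgrounded, the shell blocks in the interruptible `wait` builtin, and the backgrounded `kill -USR1 $$` stands in for `scancel --batch --signal=USR1`.

```shell
#!/bin/bash
# Minimal sketch of the background+wait pattern, outside Slurm.
# The backgrounded kill stands in for `scancel --batch --signal=USR1 <jobid>`.

caught=0
my_handler() {
    caught=1
}
trap my_handler USR1

sleep 30 &                     # long-running work, backgrounded
work=$!

( sleep 1; kill -USR1 $$ ) &   # deliver USR1 to the shell after ~1s

wait                           # interruptible: the trap runs as soon as USR1 arrives
kill "$work" 2>/dev/null       # clean up the leftover sleep

echo "caught=$caught"          # prints: caught=1
```

Because the shell is sitting in `wait` rather than blocking on a foreground child, the trap fires within about a second of the signal being sent.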
Thanks Michael, I've verified that the background+wait trick works.

The whole reason this came up is that "trap 'handler_cmd' TERM" is a common job script pattern that our users have been using for years in TORQUE, and we were a bit surprised to discover that it doesn't seem to work in Slurm. Do you know why the behavior differs?
(In reply to Troy Baer from comment #3)
> The whole reason this came up is that "trap 'handler_cmd' TERM" is a common
> job script pattern that our users have been using for years in TORQUE, and
> we were a bit surprised to discover that it doesn't seem to work in Slurm.
> Do you know why there is a difference in behavior with that?

I'm not entirely sure. But I do notice that if I do this:

scancel --full --signal=USR1 <jobid>

then the batch script indeed does catch the signal, even when executing a blocking `sleep 3600`.

I haven't looked at the code, but my guess is that --batch doesn't know how to route the signal to just the batch script process when there is a blocking child process in the foreground, whereas --full sends the signal to the batch process and all of its children, so it doesn't have to be selective about it. This might be an area where we could improve Slurm, if it's possible.
Having a way to specify the equivalent behavior of scancel --full --signal=<signal> with sbatch --signal would be quite useful IMHO.
(In reply to Troy Baer from comment #5)
> Having a way to specify the equivalent behavior of scancel --full
> --signal=<signal> with sbatch --signal would be quite useful IMHO.

I'm surprised by that asymmetry. That does seem useful. Feel free to open an enhancement ticket to address this.
TLDR:
====================
So after playing around with Slurm and looking at the code, I think I understand what is going on. Your users should probably be using `scancel --full` instead of `scancel --batch` to get the same behavior as TORQUE.

Explanation:
====================
Let's say I put this in my batch script:

sleep 3600 &
sleep 3600 &
sleep 3600
wait

Here's what the process tree looks like:

$ pstree -p | grep "step\|sleep\|slurm_script"
        |-slurmstepd(947812)-+-slurm_script(947817)-+-sleep(947818)
        |                    |                      |-sleep(947819)
        |                    |                      `-sleep(947820)
        |                    |-{slurmstepd}(947813)
        |                    |-{slurmstepd}(947814)
        |                    |-{slurmstepd}(947815)
        |                    `-{slurmstepd}(947816)

--batch tells the stepd to kill() slurm_script(947817), whereas --full tells the stepd to killpg() slurm_script(947817)'s process group, which delivers the signal to the 3 child sleeps as well.

All three sleeps are fork()ed children of the shell, as the pstree shows. The difference is what the shell is doing: the two `sleep 3600 &` calls return control to slurm_script immediately, but for the foreground `sleep 3600` the shell blocks in a waitpid() call until the child exits. While it is blocked on a foreground command, bash does not run trap handlers; it records the signal and defers the handler until the foreground command completes. Since our foreground sleep never completes before the job is killed, the handler never gets a chance to run.

If I changed my script to this:

sleep 3600 &
sleep 3600 &
sleep 3600 &
wait

Then the shell process is blocked in the `wait` builtin, which is interruptible, and all the sleeps are forked children. So in this case, the shell is free to run the custom signal handler as soon as the signal arrives (though pstree will look the same).

Here's another example of this shell quirk, outside of Slurm (in Bash on Ubuntu).
Let's say I execute this shell script in my terminal:

$ cat ./9715-2.sh
#!/bin/bash
function my_handler() {
  echo "Catching signal USR1"
}
trap my_handler USR1
sleep 3600

$ ./9715-2.sh

In another terminal, I do this:

$ pstree -p | grep sleep
        |               |-bash(2279)---bash(949501)---sleep(949502)
$ kill -s SIGUSR1 949501

What do you think will happen? It turns out that nothing happens right away: bash records the signal, but defers the handler until the foreground sleep exits, so the process tree is unchanged:

$ pstree -p | grep sleep
        |               |-bash(2279)---bash(949501)---sleep(949502)

What about this?

$ kill -s SIGUSR2 949501

It terminates the parent bash process (949501), and the child sleep (949502) gets orphaned and put under the init process (systemd):

$ pstree -p | grep sleep
|-sleep(949502)

The difference between these two scenarios is that when the shell script registers a signal handler, the shell defers running it until the foreground command completes, so nothing visible happens while the sleep is running. If no handler is defined, the default Linux signal disposition kicks in immediately, and the process gets terminated regardless of what the shell is doing. So SIGUSR1 got deferred (and here, effectively never handled), while SIGUSR2 made the OS kill the process on the spot.
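The deferral is easy to see with timestamps. A minimal sketch (plain bash, no Slurm involved): USR1 is delivered roughly 1 second into a 3-second foreground sleep, but the trap only runs once the sleep finishes.

```shell
#!/bin/bash
# Minimal sketch of bash's deferred trap handling: USR1 arrives ~1s in,
# but the handler only runs after the 3s foreground sleep exits.

trap_at=-1
trap 'trap_at=$SECONDS' USR1

( sleep 1; kill -USR1 $$ ) &   # deliver USR1 while sleep 3 is in the foreground

SECONDS=0
sleep 3                        # foreground: the trap is deferred until this exits
slept_for=$SECONDS

echo "signal sent at ~1s, trap ran at ${trap_at}s, sleep ran for ${slept_for}s"
```

The trap timestamp lands at ~3s rather than ~1s, showing the signal was recorded but not acted on while the foreground command was running.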
Hopefully that answers your question. I'll go ahead and close this out as "info given."

Thanks!
-Michael