Summary: | signal delivery to batch step not working | ||
---|---|---|---|
Product: | Slurm | Reporter: | Troy Baer <troy> |
Component: | Other | Assignee: | Director of Support <support> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | CC: | albert.gil, tdockendorf |
Version: | 20.02.4 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Ohio State OSC | Alineos Sites: | --- |
Description
Troy Baer
2020-09-01 11:31:09 MDT
Hi Troy,

You need to do the following:

    sleep 3600 &
    wait

From https://slurm.schedmd.com/scancel.html:

"Note that most shells cannot handle signals while a command is running (child process of the batch step), the shell [needs to] use `wait` [to] wait until the command ends to then handle the signal."

Thanks,
-Michael

Thanks Michael, I've verified that the background+wait trick works. The whole reason this came up is that "trap 'handler_cmd' TERM" is a common job script pattern that our users have been using for years in TORQUE, and we were a bit surprised to discover that it doesn't seem to work in Slurm. Do you know why there is a difference in behavior with that?

(In reply to Troy Baer from comment #3)
> The whole reason this came up is that "trap 'handler_cmd' TERM" is a common
> job script pattern that our users have been using for years in TORQUE, and
> we were a bit surprised to discover that it doesn't seem to work in Slurm.
> Do you know why there is a difference in behavior with that?

I'm not entirely sure. But I do notice that if I do this:

    scancel --full --signal=USR1 <jobid>

then the batch script does catch the signal, even while it is executing a blocking `sleep 3600`. I haven't looked at the code, but my guess is that --batch doesn't know how to route the signal to just the batch script process when there is a blocking child process in the foreground, whereas --full sends the signal to the batch process and all of its children, so it doesn't have to be selective about it.

This might be an area where we could improve Slurm, if it's possible. Having a way to specify the equivalent behavior of scancel --full --signal=<signal> with sbatch --signal would be quite useful IMHO.

(In reply to Troy Baer from comment #5)
> Having a way to specify the equivalent behavior of scancel --full
> --signal=<signal> with sbatch --signal would be quite useful IMHO.

I'm surprised by that asymmetry. That does seem useful. Feel free to open an enhancement ticket to address this.
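The behavior discussed above can be reproduced outside Slurm. The following is a minimal bash sketch, assuming bash on Linux, in which `kill -TERM` on a child shell's PID stands in for `scancel --batch --signal=TERM` (which, per the scancel documentation, signals only the batch shell and not its children). The names `fg_child`, `bg_child` and `measure`, and the 5-second `sleep`, are illustrative only. It times how long a TERM `trap` takes to run when the shell is blocked on a foreground `sleep` versus blocked in `wait` on a background one.

```shell
#!/bin/bash
# Sketch: compare how quickly a TERM trap runs in two job-script styles.
# kill -TERM on the child shell's PID stands in for scancel --batch.

# Style 1: trap + foreground command (the common TORQUE-era pattern).
fg_child='
trap "echo caught TERM >&2" TERM
sleep 5                  # foreground: bash holds the trap until this exits
echo "sleep finished" >&2
'

# Style 2: trap + background command + wait.
bg_child='
trap "echo caught TERM >&2; kill \$pid" TERM
sleep 5 &                # stand-in for the real workload, in the background
pid=$!
wait "$pid"              # interrupted as soon as a trapped signal arrives
'

# Start a child shell running script $1, signal only that shell, and print
# how many whole seconds passed until the child shell exited.
measure() {
  local start shell_pid
  bash -c "$1" &
  shell_pid=$!
  sleep 1                     # give the child shell time to install its trap
  start=$SECONDS
  kill -TERM "$shell_pid"     # signal the shell only, not its children
  wait "$shell_pid"
  echo $((SECONDS - start))
}

fg_secs=$(measure "$fg_child")
bg_secs=$(measure "$bg_child")
echo "foreground sleep:  trap ran after ~${fg_secs}s"
echo "background + wait: trap ran after ~${bg_secs}s"
```

On a typical Linux machine the foreground case should take roughly the remaining sleep time (about 4 seconds here), while the background + `wait` case should finish almost immediately. Exact numbers vary slightly, since `$SECONDS` only has whole-second resolution.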
TLDR:
====================
So after playing around with Slurm and looking at the code, I think I understand what is going on. Your users should probably be using `scancel --full` instead of `scancel --batch` to get the same behavior as TORQUE.

Explanation:
====================
Let's say I put this in my batch script:

    sleep 3600 &
    sleep 3600 &
    sleep 3600
    wait

Here's what the process tree looks like:

    $ pstree -p | grep "step\|sleep\|slurm_script"
    |-slurmstepd(947812)-+-slurm_script(947817)-+-sleep(947818)
    |                    |                      |-sleep(947819)
    |                    |                      `-sleep(947820)
    |                    |-{slurmstepd}(947813)
    |                    |-{slurmstepd}(947814)
    |                    |-{slurmstepd}(947815)
    |                    `-{slurmstepd}(947816)

--batch tells the stepd to kill() only slurm_script(947817), whereas --full tells the stepd to signal slurm_script(947817) and all of its children, i.e. the three sleeps as well.

As the tree shows, all three sleeps are fork()ed children of slurm_script. The first two `sleep 3600 &` commands run in the background: the shell forks them and moves on to the next line immediately. The third `sleep 3600` runs in the foreground, so the shell blocks until that child exits. While bash is waiting for a foreground command to complete, it does not run trap handlers; a trapped signal that arrives in the meantime is held until the command finishes. So with --batch, the signal does reach the shell, but the custom handler does not run until the hour-long sleep ends, which looks as if the signal was ignored. With --full, the foreground sleep also receives the signal and, having no handler of its own, terminates, so the shell gets control back and runs its handler right away.

If I changed my script to this:

    sleep 3600 &
    sleep 3600 &
    sleep 3600 &
    wait

then the shell is blocked in the `wait` builtin rather than waiting on a foreground command, and all the sleeps are background children. When a trapped signal arrives during `wait`, bash returns from `wait` immediately and runs the custom handler (though pstree will look the same).

Here's another example of this shell quirk, outside of Slurm (in Bash on Ubuntu).
Let's say I execute this shell script in my terminal:

    $ cat ./9715-2.sh
    #!/bin/bash
    function my_handler() {
        echo "Catching signal USR1"
    }
    trap my_handler USR1
    sleep 3600
    $ ./9715-2.sh

In another terminal, I do this:

    $ pstree -p | grep sleep
    | | |-bash(2279)---bash(949501)---sleep(949502)
    $ kill -s SIGUSR1 949501

What do you think will happen? It turns out that nothing visible happens, at least not until the sleep finishes:

    $ pstree -p | grep sleep
    | | |-bash(2279)---bash(949501)---sleep(949502)

What about this?

    $ kill -s SIGUSR2 949501

It terminates the parent bash process (949501), and the child sleep (949502) gets orphaned and re-parented to the init process (systemd):

    $ pstree -p | grep sleep
    |-sleep(949502)

The difference between these two scenarios is that when the script registers a handler with `trap`, bash itself has to run that handler, and it will not do so while a foreground command (here, the sleep) is still running. If no handler is defined, the kernel applies the signal's default action, which for SIGUSR2 is to terminate the process, regardless of what the shell is doing. So SIGUSR1 was held until the sleep exits (the handler would only run an hour later), and SIGUSR2 made the kernel kill the shell immediately.

Hopefully that answers your question. I'll go ahead and close this out as info given.

Thanks!
-Michael
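The 9715-2.sh experiment above can also be scripted so that it does not need a second terminal. This is a sketch assuming bash on Linux; the function `run_with_signal` and the 3-second `sleep` (instead of 3600) are hypothetical stand-ins for the manual steps. It sends either USR1 (trapped) or USR2 (untrapped) to a child shell running a foreground `sleep`, and reports the exit status that `wait` sees.

```shell
#!/bin/bash
# Sketch: replay the 9715-2.sh experiment without a second terminal.
# The child shell traps USR1 but not USR2 and runs a foreground sleep.
child='
my_handler() {
  echo "Catching signal USR1" >&2
}
trap my_handler USR1
sleep 3                  # shortened stand-in for sleep 3600
echo "sleep finished" >&2
'

# Run the child shell, send it signal $1 after one second, and print
# the exit status reported by wait.
run_with_signal() {
  local shell_pid status
  bash -c "$child" >/dev/null &   # keep an orphaned sleep off our stdout
  shell_pid=$!
  sleep 1                         # give the child time to install its trap
  kill -s "$1" "$shell_pid"       # signal the shell only, as in the ticket
  wait "$shell_pid"
  status=$?
  echo "$status"
}

usr1_status=$(run_with_signal USR1)   # handler runs once the sleep ends
usr2_status=$(run_with_signal USR2)   # default action kills the shell now
echo "USR1: exit status $usr1_status"
echo "USR2: exit status $usr2_status (128 + $(kill -l USR2))"
```

USR1 should yield exit status 0, because the shell survives, runs the deferred handler after the `sleep` ends, and finishes the script. USR2 should yield 128 plus the signal number (140 on common Linux architectures such as x86-64, where SIGUSR2 is 12), because the default action terminates the shell immediately.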