| Summary: | RFE: do not immediately SIGKILL all processes in a task when the main process exits | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Felix Abecassis <fabecassis> |
| Component: | slurmstepd | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | ben.glines, bnabong, jbernauer, lyeager, tim |
| Version: | 24.11.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NVIDIA (PSLA) | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 25.05.0rc1 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
Description
Felix Abecassis
2024-07-22 19:12:15 MDT
Re-tagging as a potential enhancement request. Is this something you imagine needing only on steps explicitly launched with srun, or might you also want it as part of the batch step? Using PR_SET_CHILD_SUBREAPER is certainly trickier, but may be worth exploring on its own anyway.

Right now our use case is only for steps launched through srun. But we believe supporting processes in the sbatch step could be helpful too, and this way the behavior would be consistent between the sbatch step and srun steps.

The '--wait-for-children' option has been added to srun and will be available in 25.05. The option leverages certain cgroup features that make it possible to wait until all processes are done before tasks (and subsequently the step) finish. Please let me know if you have any questions. See commits here:
https://github.com/SchedMD/slurm/compare/f3a3d87f3feb...0a25b5ffd297
Please reopen this ticket or submit a new ticket if you run into any issues. Closing now.

Re-opening, as I would like to discuss this aspect of the feature:
https://slurm.schedmd.com/srun.html#OPT_wait-for-children
> Note that if the parent process exits with a non-zero exit code, the task will end regardless of whether there are still children processes running.

After discussing with the internal users who requested this feature, we would like to discuss whether this constraint can be relaxed. The use case is child processes that monitor the parent process: they need time to notify other nodes that a task crashed, so they should not be killed immediately. The suggestion from a user, and I think it makes sense, is to have a special case when both --no-kill and --wait-for-children are set. In this case we would wait for all child processes even if the parent's exit code was non-zero. If --no-kill is not set, then we would keep the current behavior. What do you think?

Hi Felix,

We're currently discussing this internally.
I'll get back to you ASAP on how we want to handle this.

Traveling. Email replies will be delayed.

Hey Felix - I'm not sure I want to overload --no-kill in this way; that flag has a lot of other impacts that aren't necessarily directly tied here. It looks like --kill-on-bad-exit=0 would be suitable, although in my quick testing it seems that's only respected client-side right now, and we'd need to adjust some code paths to suit. I'll ask Ben to take a look at whether we could patch that in, or if this might also require some RPC changes to support.

- Tim

Hello Tim,

Do you have any update on this? Regarding modifying the behavior when --kill-on-bad-exit=0 is used.

Thanks

(In reply to Felix Abecassis from comment #11)
> Hello Tim,
>
> Do you have any update on this? Regarding modifying the behavior when
> --kill-on-bad-exit=0 is used.
>
> Thanks

We have a patch set that uses --kill-on-bad-exit to change the behavior of --wait-for-children with regard to the main process exit: if --kill-on-bad-exit=0, a non-zero exit code from the main process is ignored; if --kill-on-bad-exit=1, the task is ended when the main process ends with a non-zero exit code. The patch set is currently under review and I'll let you know about any status updates on it. Please let me know if you have any questions or concerns.

Thanks Ben!

Are you planning to add the patchset to 25.05 or just to master/25.11?

(In reply to Felix Abecassis from comment #13)
> Thanks Ben!
>
> Are you planning to add the patchset to 25.05 or just to master/25.11?

We are considering adding it to 25.05 as well, but I can't guarantee that yet. I'll let you know what is decided during the review process.

We've added this change to --wait-for-children to 25.11, as well as 25.05. The earliest 25.05 version to see this change should be 25.05.2.
See the following commits for each respective version:
25.11: https://github.com/SchedMD/slurm/compare/d1ebc21f15b...5f191368bb7
25.05: https://github.com/SchedMD/slurm/compare/d0bb9b329ef...66289192195

Let me know if you have any questions.

Closing.
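
For readers skimming the thread, the interaction between --wait-for-children and --kill-on-bad-exit described above can be summarized as a small decision function. This is only an illustrative model of the behavior as stated in the comments, not Slurm's actual implementation; the function name `task_should_end` and its parameters are hypothetical and do not correspond to anything in the Slurm source.

```python
def task_should_end(main_exit_code: int,
                    children_running: bool,
                    wait_for_children: bool,
                    kill_on_bad_exit: bool) -> bool:
    """Model whether a task ends once its main process has exited.

    Hypothetical helper mirroring the semantics described in this ticket;
    it is not taken from slurmstepd.
    """
    # Without --wait-for-children, the task ends as soon as the main
    # process exits, whatever its children are doing.
    if not wait_for_children:
        return True

    # With --kill-on-bad-exit=1, a non-zero exit code from the main
    # process ends the task even if children are still running.
    if main_exit_code != 0 and kill_on_bad_exit:
        return True

    # Otherwise (--kill-on-bad-exit=0, or a zero exit code), the task
    # only ends once no child processes remain.
    return not children_running


if __name__ == "__main__":
    # Main process crashed, monitoring children still alive.
    for kobe in (False, True):
        ends = task_should_end(main_exit_code=1, children_running=True,
                               wait_for_children=True, kill_on_bad_exit=kobe)
        print(f"kill_on_bad_exit={kobe}: task ends immediately = {ends}")
```

In the use case from this ticket (children that need time to notify other nodes after the parent crashes), the relevant row is a non-zero exit code with children still running: the task keeps going only when --kill-on-bad-exit=0.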