| Summary: | No signals sent when job is selected for preemption, only at end of GraceTime. | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Jon <tegner> |
| Component: | Scheduling | Assignee: | Tim Wickberg <tim> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | C - Contributions | ||
| Priority: | --- | CC: | broderick, bsantos, fullop, mcoyne, sts |
| Version: | 17.02.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | allow user-specified signal to be delivered at preemption | ||
Jon,

Thank you for submitting this ticket. Before this ticket can be assigned to a support engineer we need to discuss support contract options to determine which support option will work best for FOI.

Jacob

Hi,

So how do we get a support contract? And it seems a bit odd that bugs are disregarded if they are not submitted by anyone with a support contract. Is that a general policy of yours?

Regards,
/jon

On 11/10/2017 12:21 AM, bugs@schedmd.com wrote:
> Comment # 1 <https://bugs.schedmd.com/show_bug.cgi?id=4352#c1> on bug 4352 <https://bugs.schedmd.com/show_bug.cgi?id=4352> from Jacob Jenson <mailto:jacob@schedmd.com>
>
> Jon,
>
> Thank you for submitting this ticket. Before this ticket can be assigned to a support engineer we need to discuss support contract options to determine which support option will work best for FOI.
>
> Jacob

Jon,

You are correct, by policy SchedMD allocates engineering time to building enhancements for Slurm and providing support to sites with support contracts. In order to receive a quote for Slurm support please send your site's node count to sales@schedmd.com

Jacob

Created attachment 8744
allow user-specified signal to be delivered at preemption
Is there any update on status of this? In particular, will it be in 18.08.(6?)
Note: LANL does have a support contract.

(In reply to S Senator from comment #5)
> Is there any update on status of this? In particular, will it be in 18.08.(6?)

No, it won't. It might be eligible for 19.05, but I'll need to look into it further.

Thank you for the update. Please consider this a LANL request that this be evaluated for 19.05.
When a job is selected for preemption, signals are supposed to be sent on two occasions: immediately when the situation is detected, and a second time at the end of the GraceTime. From the manual:

"Once a job has been selected for preemption, its end time is set to the current time plus GraceTime. The job is immediately sent SIGCONT and SIGTERM signals in order to provide notification of its imminent termination. This is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence upon reaching its new end time."

It seems the first of these signals is not sent. This can be tested using the script job.sh:

******************************
#!/bin/bash
#SBATCH -p cheap
#SBATCH -n 32
#SBATCH -t 12:00:00

sig_term() {
    echo "function sig_term called. Exiting"
    echo 'sig_term' > slask_term
    echo $(date) >> slask_term
}

# associate the function "sig_term" with the TERM signal
trap 'sig_term' SIGTERM

# run the payload in the background so that bash can handle the trap while waiting
sleep 400 &
wait $!
******************************

Partitions are defined like this:

*******************************************
PartitionName=cheap Nodes=ALL Priority=1 PreemptMode=CANCEL GraceTime=10 Default=YES MaxTime=INFINITE State=UP:
PartitionName=paid_jobs Nodes=ALL Priority=1000 PreemptMode=OFF Default=YES MaxTime=INFINITE State=UP:
********************************************

The job is submitted with:

sbatch job.sh

When a job is submitted to the higher-priority partition (paid_jobs), no signal is detected at that point; only the signal at the expiry of the GraceTime seems to be sent.

The OS is CentOS-7.3. The same behavior is seen using Slurm-15.08.13-1.el7.

Thanks!
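
To rule out the trap itself as the cause, the handler pattern can be checked locally without Slurm. The following is only a sketch under stated assumptions: the temporary directory, the shortened stand-in for job.sh, and the `marker`, `result` and `job_status` names are hypothetical and not part of the original report. It delivers SIGCONT followed by SIGTERM by hand, which only emulates the notification the manual describes and says nothing about how slurmd actually delivers signals.

```shell
#!/bin/bash
# Local sanity check (no Slurm involved): confirm that a TERM trap in a
# batch-style script fires when SIGCONT followed by SIGTERM is delivered.

workdir=$(mktemp -d)
marker="$workdir/slask_term"

# Hypothetical stand-in for job.sh: same trap-and-wait pattern, shorter sleep.
cat > "$workdir/job.sh" <<'EOF'
#!/bin/bash
marker=$1
sig_term() {
    echo "sig_term $(date)" > "$marker"
    kill "$child" 2>/dev/null   # do not leave the sleep behind
    exit 0
}
trap 'sig_term' SIGTERM
sleep 30 &
child=$!
wait "$child"
EOF

bash "$workdir/job.sh" "$marker" &
job_pid=$!
sleep 1   # give the script time to install its trap

# Emulate the notification described in the manual: SIGCONT, then SIGTERM.
kill -CONT "$job_pid"
kill -TERM "$job_pid"
wait "$job_pid"
job_status=$?

if [ -s "$marker" ]; then
    result="handler ran"
else
    result="handler did not run"
fi
echo "$result (exit status $job_status)"
rm -rf "$workdir"
```

If this reports that the handler ran, the trap logic is sound, and a missing slask_term file after preemption would point at signal delivery rather than at the script.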