| Summary: | Cannot get preempexempttime to work as documented | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Rob <rug262> |
| Component: | Scheduling | Assignee: | Ben Roberts <ben> |
| Status: | OPEN | QA Contact: | --- |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | ben, marshall |
| Version: | 22.05.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | PSU | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 23.02.2, 23.11.0rc1 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Rob, 2023-03-10 15:57:13 MST

Hi Rob,
There is no support for this parameter and 'SUSPEND,GANG'.
> https://slurm.schedmd.com/sacctmgr.html#OPT_PreemptExemptTime
> PreemptExemptTime
>     Specifies a minimum run time for jobs of this QOS before they are considered for preemption. This QOS
>     option takes precedence over the global PreemptExemptTime. This is only honored for
>     PreemptMode=REQUEUE and PreemptMode=CANCEL.
>     Setting to -1 disables the option, allowing another QOS or the global option to take effect. Setting
>     to 0 indicates no minimum run time and supersedes the lower priority QOS (see OverPartQOS) and/or the
>     global option in slurm.conf.
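For context, a per-QOS exempt time like the one described above is set through sacctmgr. A minimal sketch, assuming an accounting database (slurmdbd) is running; the QOS name `normal` is illustrative:

```shell
# Illustrative only: give an existing QOS a one-minute exempt time.
# -i answers yes to the confirmation prompt.
sacctmgr -i modify qos normal set PreemptExemptTime=00:01:00

# Verify the setting.
sacctmgr show qos normal format=name,preemptexempttime,preemptmode
```

Setting the value back to -1 clears it, letting the global PreemptExemptTime in slurm.conf take effect again.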
I have found a rather large amount of ambiguity in the Slurm docs, since properties such as "PreemptMode" are repeated at multiple levels. Thus, given that it is in the sacctmgr docs, I assumed that the "PreemptMode" mentioned there that cannot be suspend,gang referred to the QOS PreemptMode setting (which I showed as set to REQUEUE), not the global setting in slurm.conf. So you're telling me that, no matter what convoluted combination of global, partition, and QOS settings I choose, there is no way to have PreemptExemptTime and the gang scheduler (which is needed in order to reschedule suspended jobs) functional in the same cluster? Thanks, Rob

I'm now attempting to reconfigure the system for requeue only, no suspending. I have changed the slurm.conf and QOS entries to remove all manner of suspending or gang scheduling.

slurm.conf:
PreemptType=preempt/qos
PreemptMode=REQUEUE
PreemptExemptTime=00:00:00
JobRequeue=1

Partitions:
PartitionName=normal Default=YES QOS=normal Oversubscribe=No Nodes=t-sc-1101 (this node has 48 cores)
PartitionName=expedite Default=NO QOS=expedite Oversubscribe=No Nodes=t-sc-1101

QOS:
normal: PreemptExemptTime=00:01:00, PreemptMode=requeue
expedite: Preempt=normal, PreemptMode=requeue

If I do this, and start a job on the normal partition (taking up all cores on the one node), then start another job on the expedite partition, I see that the expedite job stays pending due to Resources and will not preempt the normal partition/QOS job even after the minute of exempt time has passed. Please help me figure out why preemption is not happening.

Hi Rob,

I'm happy to help you get this working.

(1) You are correct that PreemptExemptTime does not work with PreemptMode=GANG or SUSPEND.

(2) With preempt/qos:
* Only job QOS is considered when determining if one job can preempt another job. The partition QOS Preempt setting is not used to determine if a job can preempt another job.
* Partition QOS PreemptExemptTime overrides job PreemptExemptTime, unless the job's QOS has the flag OverPartQOS.
* A job cannot preempt another job with the same QOS unless the QOS has PreemptMode=WITHIN (new in 23.02).

In your example, you are trying to use partition QOS to preempt; that won't work. Remove the QOS from the partition definitions and make sure that the jobs request the QOS with the --qos option for salloc, sbatch, and srun. This should give you the behavior that you want: the job that requests --qos=expedite should preempt the job that requests --qos=normal. Can you let me know if this works for you? We will work to clarify the documentation about preempt/qos.

Yes, thank you, I did actually see that work that time. Can you tell me if this statement is true? If I am going to use PreemptExemptTime at a cluster or QOS level (since there is no setting for partitions), then that means that:
-- The cluster PreemptMode (slurm.conf) cannot be SUSPEND,GANG.
-- No partition definition at all can be PreemptMode=SUSPEND,GANG, even if I don't want to use PreemptExemptTime in that partition.
-- No QOS definition at all can contain PreemptMode=SUSPEND,GANG, even if I don't want to use PreemptExemptTime in that QOS.

Is that all true? In other words, if I want to use PreemptExemptTime anywhere in my cluster, then I will have to completely give up the ability to suspend and resume jobs, in any way, within that same cluster? Thanks. That so far has not been clear.

It is confusing because you say "(1) You are correct that PreemptExemptTime does not work with PreemptMode = GANG or SUSPEND.", but there is a PreemptMode in the cluster, partition, and QOS settings. Which setting? Or do you mean ALL of them? Thanks.

I think I have answered that question myself, as I see in the documentation that in order for ANY PreemptMode to be SUSPEND, "gang" has to be specified at the cluster level.
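The suggested change can be sketched as follows. Partition, node, and QOS names are taken from the ticket; the `sleep` payload and core count are illustrative, and this is a sketch rather than a verified configuration:

```shell
# Drop the QOS= from the partition definitions in slurm.conf, e.g.:
#   PartitionName=normal   Default=YES Oversubscribe=No Nodes=t-sc-1101
#   PartitionName=expedite Default=NO  Oversubscribe=No Nodes=t-sc-1101

# Submit jobs that request the QOS explicitly, so that job QOS
# (not partition QOS) drives the preemption decision.
sbatch --qos=normal   -n1 -c48 --wrap='sleep 600'
sbatch --qos=expedite -n1 -c48 --wrap='sleep 600'
```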
And my testing has shown me that putting gang in the cluster-level PreemptMode makes the PreemptExemptTime setting no longer work. So I believe I now know for sure that SUSPEND and PreemptExemptTime cannot exist in the same cluster. Thanks.

(In reply to Rob from comment #9)
> I think I have answered that question myself, as I see in the documentation
> that in order for ANY preemptmode to be suspend, then "gang" has to be
> specified at the cluster level. And my testing has showed me that putting
> gang in the cluster level preemptMode makes the preemptExemptTime setting no
> longer work. So I believe I now know for sure that Suspend and ExemptTime
> cannot exist in the same cluster.

Yes, that is correct. We will clarify the documentation about PreemptExemptTime and what is allowed with PreemptMode=GANG/SUSPEND.

I need to make a correction. I'm not sure why I didn't get this to work yesterday, but I did get a QOS PreemptMode=requeue to override the cluster setting SUSPEND,GANG, and PreemptExemptTime worked with it.
slurm.conf:
PreemptType=preempt/qos
PreemptMode=suspend,gang
$ sacctmgr show qos low,high format=name,preempt,preemptexempttime,preemptmode
      Name    Preempt   PreemptExemptTime PreemptMode
---------- ---------- ------------------- -----------
      high        low                         cluster
       low                       00:01:00     requeue
In a default partition with one node, 8 cpus per node, I submit two jobs which take the whole node so that only one can run at a time:
$ sbatch --qos=low -Dtmp -n1 -c8 --wrap='whereami 600'
$ sbatch --qos=high -Dtmp -n1 -c8 --wrap='whereami 600'
After a little more than one minute of runtime, the job in qos low was preempted and requeued, and the job in qos high started running.
Can you test this to see if it works for you?
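As an aside, the exempt time above (00:01:00) is a standard Slurm time string in [days-]hours:minutes:seconds form. A small illustrative helper, not part of Slurm, shows how such a string maps to the seconds of run time a job must accumulate before it becomes preemptable:

```shell
# Hypothetical helper: convert a Slurm [DD-]HH:MM:SS time string to seconds.
to_seconds() {
  local t=$1 days=0
  # Split off an optional leading "DD-" day count.
  case $t in *-*) days=${t%%-*}; t=${t#*-};; esac
  local IFS=:
  set -- $t
  # 10# forces base-10 so leading zeros are not read as octal.
  echo $(( days*86400 + 10#$1*3600 + 10#$2*60 + 10#$3 ))
}

to_seconds 00:01:00   # prints 60
```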
I've tested it, and PreemptExemptTime is indeed respected. That is perfect. I am just going to confirm that suspend/gang also works in its own partitions as well.

I just want to note about the documentation: the confusion was not about what PreemptMode should be set to, but *WHICH* PreemptMode, since there is a cluster, partition, and QOS one.

I have indeed now confirmed that I can suspend and resume jobs, and that other jobs have a working PreemptExemptTime. Thank you so much. Everything I had done and read had led me to believe that wasn't possible. Rob

Thanks! I'm glad that it's all working for you. I'm reopening the bug to track fixing the documentation. We'll re-close the bug once we've updated the docs.

Hi Rob,

I wanted to let you know that we have checked in some updates to the documentation to better explain some of the questions you had. If you're interested you can see the commits here:
https://github.com/SchedMD/slurm/commit/a53f83e34ec7b95898abf40e5ebc20a9bb4f05cd
https://github.com/SchedMD/slurm/commit/3e4762a1dfeb6d222f6027ca21f6f2850c006fab
https://github.com/SchedMD/slurm/commit/8b4c18e93ac9b456c09b5b8589eae8891bc17204

These will show up in the online documentation with the release of 23.02.2. Thanks, Ben

I appreciate the chance to review the changes, but I don't see that any of them address the concerns I expressed in comments 8 and 12, which were that in cases where PreemptMode had to be a certain value, there was no distinction of WHICH PreemptMode it was referring to, since there is a PreemptMode set globally, in partitions, and in QOS. It is this ambiguity which I found difficult to navigate.

Hi Rob,

My apologies, we identified our own shortcomings that we saw in the documentation and fixed those but overlooked the thing that was confusing to you. I'll reopen this ticket and work on updating the documentation to clarify this aspect of it. Thanks, Ben

Thank you, Ben.
Hi Rob, I'm lowering the priority of this ticket while working on documentation changes. I'll let you know as things progress. Thanks, Ben

(Bug 16246 severity changed from "3 - Medium Impact" to "4 - Minor Issue".)

Ok, thanks for keeping me updated.

I've been researching using PreemptExemptTime with partition preemption instead of QOS, and I've discovered some interesting things and possibly a bug. Would you like me to add it here, or start a new ticket?

Hi Rob, My apologies for the delayed response, I was out of the office for a couple of days at the end of last week. I think it would be best in a new ticket so that the documentation changes that we're working on don't get lost in the shuffle. Thanks, Ben

Ok, will do.