Ticket 9209

Summary: Multiple jobs on a node despite Oversubscribe=EXCLUSIVE
Product: Slurm    Reporter: Luke Yeager <lyeager>
Component: Scheduling    Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact
Priority: ---
Version: 19.05.4
Hardware: Linux
OS: Linux
Site: NVIDIA (PSLA)
Attachments: sanitized slurm.conf

Description Luke Yeager 2020-06-09 16:06:06 MDT
Created attachment 14614 [details]
sanitized slurm.conf

On our cluster, we are observing multiple jobs scheduled on the same node, despite having OverSubscribe=EXCLUSIVE set on all partitions.


Here is the smoking gun:

lyeager@login-01:~$ squeue -a -w node-00[59,60]
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            330337      main     bash   user_d  R      25:47      2 node-[0059-0060]
            330110  backfill     job1   user_b  R    1:40:32      1 node-0060
            330065  backfill     job2   user_a  R    1:33:48      1 node-0059


I've attached our [sanitized] slurm.conf. Is there any issue with our configuration that you can see? We have basically the same config on another cluster and have not seen any issues.

This is a very big problem for us.
Comment 1 Luke Yeager 2020-06-09 16:09:31 MDT
Ah, we just discovered that PreemptMode=ON when we meant to have PreemptMode=CANCEL. That's probably it. Checking now.
Comment 2 Michael Hinton 2020-06-09 16:12:21 MDT
I think this is the problem:

PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                 nodes=ALL  OverSubscribe=NO
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=ON   GraceTime=600  nodes=ALL  OverSubscribe=NO  QOS=backfill

OverSubscribe=NO is not the same thing as OverSubscribe=EXCLUSIVE.
Comment 3 Luke Yeager 2020-06-09 16:20:58 MDT
Oh, we actually changed from EXCLUSIVE to NO just a few minutes ago based on the documentation here:

https://slurm.schedmd.com/archive/slurm-19.05.4/cons_res_share.html

We didn't really expect that change to help, because we were pretty sure we wanted EXCLUSIVE, but following the docs seemed safe.

So we should set PreemptMode=CANCEL and go back to OverSubscribe=EXCLUSIVE, agreed?
Comment 4 Michael Hinton 2020-06-09 16:30:08 MDT
Sorry, you are right. With select/linear, it should be NO. However, we highly recommend using select/cons_res or select/cons_tres with OverSubscribe=EXCLUSIVE instead: that gives you the same whole-node behavior as select/linear, but with added flexibility. We also don't maintain select/linear very actively and will likely remove it in the near future.
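For concreteness, a sketch of what that alternative could look like, reusing the partition lines quoted in comment 2 (the SelectTypeParameters value is an assumption; your sanitized config may use a different one):

SelectType=select/cons_tres
SelectTypeParameters=CR_Core
PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                    nodes=ALL  OverSubscribe=EXCLUSIVE
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=CANCEL  GraceTime=600  nodes=ALL  OverSubscribe=EXCLUSIVE  QOS=backfill

Under cons_tres, OverSubscribe=EXCLUSIVE allocates whole nodes to each job, matching select/linear's default behavior while still tracking individual cores, memory, and GPUs.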
Comment 5 Michael Hinton 2020-06-09 16:36:13 MDT
(In reply to luke.yeager from comment #3)
> So we should set PreemptMode=CANCEL and go back to OverSubscribe=EXCLUSIVE,
> agreed?
PreemptMode=ON is an invalid setting, so yes, setting it to CANCEL or REQUEUE is the common practice. Though that is not related to the issue of multiple jobs on the same node.

OverSubscribe=EXCLUSIVE is an invalid setting for select/linear. Setting it to NO should probably fix the issue.
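Concretely, staying on select/linear, the change would touch only PreemptMode on the backfill partition (a sketch based on the lines quoted in comment 2):

PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                    nodes=ALL  OverSubscribe=NO
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=CANCEL  GraceTime=600  nodes=ALL  OverSubscribe=NO  QOS=backfill

After editing slurm.conf, running 'scontrol reconfigure' (or restarting slurmctld) applies the change; jobs that are already running keep their current allocations.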
Comment 6 Michael Hinton 2020-06-10 09:30:18 MDT
Hi Luke, any updates? I'm going to go ahead and reduce the severity to 3, since configuration questions generally don't qualify as a sev 2.
Comment 7 Luke Yeager 2020-06-10 18:26:24 MDT
We've resolved the issue now. Thanks for the pointers! Closing.