Ticket 9209

Summary: Multiple jobs on a node despite Oversubscribe=EXCLUSIVE
Product: Slurm    Reporter: Luke Yeager <lyeager>
Component: Scheduling    Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact
Priority: ---
Version: 19.05.4
Hardware: Linux
OS: Linux
Site: NVIDIA (PSLA)
Attachments: sanitized slurm.conf

Description Luke Yeager 2020-06-09 16:06:06 MDT
Created attachment 14614 [details]
sanitized slurm.conf

On our cluster, we are observing multiple jobs scheduled on the same node, despite having OverSubscribe=EXCLUSIVE set on all partitions.


Here is the smoking gun:

lyeager@login-01:~$ squeue -a -w node-00[59,60]
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            330337      main     bash   user_d  R      25:47      2 node-[0059-0060]
            330110  backfill     job1   user_b  R    1:40:32      1 node-0060
            330065  backfill     job2   user_a  R    1:33:48      1 node-0059


I've attached our [sanitized] slurm.conf. Is there any issue with our configuration that you can see? We have basically the same config on another cluster and have not seen any issues.

This is a very big problem for us.
Comment 1 Luke Yeager 2020-06-09 16:09:31 MDT
Ah, we just discovered that PreemptMode=ON when we meant to have PreemptMode=CANCEL. That's probably it. Checking now.
Comment 2 Michael Hinton 2020-06-09 16:12:21 MDT
I think this is the problem:

PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                 nodes=ALL  OverSubscribe=NO
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=ON   GraceTime=600  nodes=ALL  OverSubscribe=NO  QOS=backfill

OverSubscribe=NO is not the same thing as OverSubscribe=EXCLUSIVE.
Comment 3 Luke Yeager 2020-06-09 16:20:58 MDT
Oh, we actually changed from EXCLUSIVE to NO just a few minutes ago based on the documentation here:

https://slurm.schedmd.com/archive/slurm-19.05.4/cons_res_share.html

We didn't really expect that change to help, because we were pretty sure we wanted EXCLUSIVE, but following the docs seemed safe.

So we should set PreemptMode=CANCEL and go back to OverSubscribe=EXCLUSIVE, agreed?
Comment 4 Michael Hinton 2020-06-09 16:30:08 MDT
Sorry, you are right. With select/linear, it should be NO. However, we highly recommend using select/cons_res or select/cons_tres with OverSubscribe=EXCLUSIVE instead: that gives you the same whole-node behavior as select/linear, but with added flexibility. We also don't maintain select/linear very actively and will likely remove it in the near future.
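For concreteness, a sketch of what that alternative could look like, reusing the partition lines quoted in comment 2 (the SelectTypeParameters value is an assumption; your sanitized config may use a different one):

SelectType=select/cons_tres
SelectTypeParameters=CR_Core
PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                    nodes=ALL  OverSubscribe=EXCLUSIVE
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=CANCEL  GraceTime=600  nodes=ALL  OverSubscribe=EXCLUSIVE  QOS=backfill

Under cons_tres, OverSubscribe=EXCLUSIVE allocates whole nodes to each job, matching select/linear's default behavior while still tracking individual cores, memory, and GPUs.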
Comment 5 Michael Hinton 2020-06-09 16:36:13 MDT
(In reply to luke.yeager from comment #3)
> So we should set PreemptMode=CANCEL and go back to OverSubscribe=EXCLUSIVE,
> agreed?
PreemptMode=ON is an invalid setting, so yes, setting it to CANCEL or REQUEUE is the common practice. Though that is not related to the issue of multiple jobs on the same node.

OverSubscribe=EXCLUSIVE is an invalid setting for select/linear. Setting it to NO should probably fix the issue.
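Concretely, staying on select/linear, the change would touch only PreemptMode on the backfill partition (a sketch based on the lines quoted in comment 2):

PartitionName=main      Default=NO   PriorityTier=2  DefaultTime=2:00:00  MaxTime=2:00:00  PreemptMode=OFF                    nodes=ALL  OverSubscribe=NO
PartitionName=backfill  Default=YES  PriorityTier=1  DefaultTime=0:30:00  MaxTime=8:00:00  PreemptMode=CANCEL  GraceTime=600  nodes=ALL  OverSubscribe=NO  QOS=backfill

After editing slurm.conf, running 'scontrol reconfigure' (or restarting slurmctld) applies the change; jobs that are already running keep their current allocations.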
Comment 6 Michael Hinton 2020-06-10 09:30:18 MDT
Hi Luke, any updates? I'm going to go ahead and reduce the severity to 3, since configuration questions generally don't qualify as a sev 2.
Comment 7 Luke Yeager 2020-06-10 18:26:24 MDT
We've resolved the issue now. Thanks for the pointers! Closing.