I suspect I'm just missing a config thing here, but preempt/qos doesn't seem to be working for us right now. We have:

[day36@opal186:~]$ scontrol show config | grep -i preempt
PreemptMode             = CANCEL
PreemptType             = preempt/qos

[day36@opal186:~]$ sacctmgr show qos
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor    GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit   GrpWall    MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall  MaxTRESPU MaxJobsPU MaxSubmitPU  MaxTRESPA MaxJobsPA MaxSubmitPA    MinTRES
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ---------- ------------- ------------- ------- --------- --------- ---------- -------------- ------------- ----------- ---------- --------- ----------- ---------- --------- ----------- ----------
    normal    1000000   00:00:00    standby     cluster                                                         1.000000
   standby          1   00:00:00                cluster NoReserve,PartitionMaxNodes,OverPartQOS+                1.000000

...but when I try to preempt a 'standby' job with a 'normal' one, the 'normal' job just sits waiting on resources:

[day36@opal186:~]$ squeue -p pdebug -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %q"
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) QOS
             43736    pdebug hostname    day36 PD       0:00     32 (Resources) normal
             43734    pdebug    sleep    day36  R       3:49     32 opal[1-32] standby

Here's the slurmctld.log showing the 'normal' jobid 43736 waiting until 43734 exits normally after its sleep finishes:

[2018-12-06T14:32:52.894] sched: _slurm_rpc_allocate_resources JobId=43736 NodeList=(null) usec=264
[2018-12-06T14:32:58.642] backfill: Failed to start JobId=43736 with opal[1-32] avail: Requested nodes are busy
[2018-12-06T14:39:09.828] _job_complete: JobID=43734 State=0x1 NodeCnt=32 WEXITSTATUS 0
[2018-12-06T14:39:09.829] _job_complete: JobID=43734 State=0x8003 NodeCnt=32 done
[2018-12-06T14:39:14.211] sched: Allocate JobID=43736 NodeList=opal[1-32] #CPUs=1152 Partition=pdebug
[2018-12-06T14:39:14.693] _job_complete: JobID=43736 State=0x1 NodeCnt=32 WEXITSTATUS 0
[2018-12-06T14:39:14.694] _job_complete: JobID=43736 State=0x8003 NodeCnt=32 done
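For reference, the setup being described above boils down to the following sketch (relevant lines only, assuming the two-QOS normal/standby hierarchy shown in the sacctmgr output; not the full config):

```
# slurm.conf -- preemption driven by QOS, preempted jobs are cancelled
PreemptType=preempt/qos
PreemptMode=CANCEL

# QOS side: 'normal' must list 'standby' in its Preempt field,
# set via sacctmgr, e.g.:
#   sacctmgr modify qos normal set preempt=standby
```

With that in place, a pending 'normal' job should be able to cancel a running 'standby' job on the nodes it needs, rather than waiting on (Resources).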
Hi Ryan,

Can you check that the Preempt field is set up correctly in sacctmgr?

jason@nh-blue:~/slurm/master$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt
---------- ---------- ----------
    normal          0
     small     100000        big
       big          0

Preempt
  Other QOS' this QOS can preempt.

You may be missing this step, e.g.:
sacctmgr modify qos small set preempt=big

-Jason
Hi Jason,

The 'normal' qos does have preempt over 'standby'. Preemption was working for us under 17.02, but doesn't seem to be working any more.

[day36@opal186:jobdatalua]$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt
---------- ---------- ----------
    normal    1000000    standby
   standby          1
  expedite    2000000    standby
    exempt    1000000    standby
pdebug_qu+    1000000
[day36@opal186:jobdatalua]$

(In reply to Jason Booth from comment #1)
> Hi Ryan,
>
> Can you check that the Preempt field is setup correctly in sacctmgr.
>
> jason@nh-blue:~/slurm/master$ sacctmgr list qos format=name,priority,preempt
>       Name   Priority    Preempt
> ---------- ---------- ----------
>     normal          0
>      small     100000        big
>        big          0
>
> Preempt
>   Other QOS' this QOS can preempt.
>
> You may be missing this step:
> e.g.
> sacctmgr modify qos small set preempt=big
>
> -Jason
Hi Ryan,

I ran a few tests with the configuration you mentioned on 17.11.8 but could not recreate the issue; preemption worked for me. Please attach your slurm.conf and the output of sprio. This will help me verify that the preemptor has a higher priority and that I am not overlooking anything in your configuration.

-Jason
Created attachment 8560 [details] slurm.conf
Created attachment 8561 [details] file included in slurm.conf
I'm switching things slightly on you here. The testbed cluster that I used for my first example is running priority/basic, so sprio doesn't do anything. I originally saw this on a production cluster running priority/multifactor though, so I've attached the slurm.conf from that cluster. Here's what the sprio etc. output looks like on that cluster:

[day36@pascal83:~]$ squeue -p pvis -o "%i %R %q"
JOBID NODELIST(REASON) QOS
190499 (Resources) exempt
190498 pascal[1,3-13,15-20,22-31] standby
[day36@pascal83:~]$ sprio
  JOBID PARTITION   PRIORITY        AGE  FAIRSHARE        QOS
 190499      pvis    1082689          1      82689    1000000
[day36@pascal83:~]$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt
---------- ---------- ----------
    normal    1000000    standby
   standby          1
  expedite    2000000    standby
    exempt    1000000    standby
pdebug_qu+    1000000
pdebug_ca+    1000000
[day36@pascal83:~]$

I'll note that the 'pvis' partition is not the default partition, but it's otherwise set up the same as the default partition. Could that cause problems?
Hi Ryan,

I was able to duplicate this issue, and it was fixed in the following commit:

d103d20039f4dad6b8b196400213fb08c2cbe04a

I would highly suggest that you upgrade to at least 17.11.12, since there are a number of issues that have been fixed since 17.11.8.

-Jason
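One quick way to confirm what is actually running after an upgrade (a sketch; the first command reports the version the client tools were built against, the second asks the running controller):

```
# Version of the locally installed client tools:
sinfo --version

# Version reported by the running slurmctld:
scontrol show config | grep -i '^SLURM_VERSION'
```

Both should report 17.11.12 (or later) once the upgrade is complete; a mismatch usually means the controller has not been restarted on the new binaries.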
Closing this out for now, since this is a known issue that was fixed in 17.11.10.

*** This ticket has been marked as a duplicate of ticket 5293 ***