Ticket 6192

Summary: jobs not being preempted (preempt/qos)
Product: Slurm    Reporter: Ryan Day <day36>
Component: Scheduling    Assignee: Jason Booth <jbooth>
Status: RESOLVED DUPLICATE    QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: 17.11.8
Hardware: Linux
OS: Linux
Site: LLNL    Alineos Sites: ---
Atos/Eviden Sites: ---    Confidential Site: ---
Coreweave sites: ---    Cray Sites: ---
DS9 clusters: ---    HPCnow Sites: ---
HPE Sites: ---    IBM Sites: ---
NOAA Site: ---    OCF Sites: ---
Recursion Pharma Sites: ---    SFW Sites: ---
SNIC sites: ---    Linux Distro: ---
Machine Name:    CLE Version:
Version Fixed:    Target Release: ---
DevPrio: ---    Emory-Cloud Sites: ---
Attachments: slurm.conf
file included in slurm.conf

Description Ryan Day 2018-12-06 16:37:03 MST
I suspect I'm just missing a config thing here, but preempt/qos doesn't seem to be working for us right now. We have:

[day36@opal186:~]$ scontrol show config | grep -i preempt
PreemptMode             = CANCEL
PreemptType             = preempt/qos
[day36@opal186:~]$ sacctmgr show qos     
      Name   Priority  GraceTime    Preempt PreemptMode                                    Flags UsageThres UsageFactor       GrpTRES   GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit     GrpWall       MaxTRES MaxTRESPerNode   MaxTRESMins     MaxWall     MaxTRESPU MaxJobsPU MaxSubmitPU     MaxTRESPA MaxJobsPA MaxSubmitPA       MinTRES 
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- ------------- 
    normal    1000000   00:00:00    standby     cluster                                                        1.000000                                                                                                                                                                                                                      
   standby          1   00:00:00                cluster NoReserve,PartitionMaxNodes,OverPartQOS+               1.000000                                                                                                                                                                                                                      
...

but when I try to preempt a 'standby' job with a 'normal' one, the 'normal' job just sits waiting on resources:
[day36@opal186:~]$ squeue -p pdebug -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %q"
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) QOS
             43736    pdebug hostname    day36 PD       0:00     32 (Resources) normal
             43734    pdebug    sleep    day36  R       3:49     32 opal[1-32] standby

Here's the slurmctld.log showing the 'normal' jobid 43736 waiting until 43734 exits normally after its sleep finishes:

[2018-12-06T14:32:52.894] sched: _slurm_rpc_allocate_resources JobId=43736 NodeList=(null) usec=264
[2018-12-06T14:32:58.642] backfill: Failed to start JobId=43736 with opal[1-32] avail: Requested nodes are busy
[2018-12-06T14:39:09.828] _job_complete: JobID=43734 State=0x1 NodeCnt=32 WEXITSTATUS 0
[2018-12-06T14:39:09.829] _job_complete: JobID=43734 State=0x8003 NodeCnt=32 done
[2018-12-06T14:39:14.211] sched: Allocate JobID=43736 NodeList=opal[1-32] #CPUs=1152 Partition=pdebug
[2018-12-06T14:39:14.693] _job_complete: JobID=43736 State=0x1 NodeCnt=32 WEXITSTATUS 0
[2018-12-06T14:39:14.694] _job_complete: JobID=43736 State=0x8003 NodeCnt=32 done
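For context, the scontrol output above corresponds to preemption settings in slurm.conf. A minimal sketch of the relevant lines (the values below are illustrative assumptions, not taken from the attached config):

```
# slurm.conf -- hypothetical fragment illustrating preempt/qos
# (the actual attached configuration may differ)
PreemptType=preempt/qos    # decide preemption by QOS relationships, not partition priority
PreemptMode=CANCEL         # preempted jobs are cancelled rather than requeued or suspended
```

With preempt/qos, which jobs may preempt which is then controlled per QOS in the accounting database (via sacctmgr), not in slurm.conf itself.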
Comment 1 Jason Booth 2018-12-07 09:59:18 MST
Hi Ryan,

Can you check that the Preempt field is set up correctly in sacctmgr?

jason@nh-blue:~/slurm/master$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt
---------- ---------- ----------
    normal          0
     small     100000        big
       big          0


Preempt
Other QOS' this QOS can preempt.


You may be missing this step:
e.g.
 sacctmgr modify qos small set preempt=big
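Adapted to this ticket's QOS names ('normal' preempting 'standby') rather than the small/big example, the sequence would be something like the following sketch (these commands require a live slurmctld/slurmdbd, so they are shown for illustration only):

```shell
# grant the 'normal' QOS the right to preempt 'standby' jobs
sacctmgr modify qos normal set preempt=standby

# verify that the change took effect
sacctmgr list qos format=name,priority,preempt
```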

-Jason
Comment 2 Ryan Day 2018-12-07 10:20:45 MST
Hi Jason,

The 'normal' qos does have preempt over 'standby'. Preemption was working for us under 17.02, but doesn't seem to be working any more.

[day36@opal186:jobdatalua]$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt 
---------- ---------- ---------- 
    normal    1000000    standby 
   standby          1            
  expedite    2000000    standby 
    exempt    1000000    standby 
pdebug_qu+    1000000            
[day36@opal186:jobdatalua]$

(In reply to Jason Booth from comment #1)
> Hi Ryan,
> 
> Can you check that the Preempt field is set up correctly in sacctmgr?
> 
> jason@nh-blue:~/slurm/master$ sacctmgr list qos format=name,priority,preempt
>       Name   Priority    Preempt
> ---------- ---------- ----------
>     normal          0
>      small     100000        big
>        big          0
> 
> 
> Preempt
> Other QOS' this QOS can preempt.
> 
> 
> You may be missing this step:
> e.g.
>  sacctmgr modify qos small set preempt=big
> 
> -Jason
Comment 3 Jason Booth 2018-12-07 11:16:06 MST
Hi Ryan,

 I ran a few tests with the configuration you mentioned on 17.11.8 but could not recreate the issue. Preemption worked for me. 

 Please attach your slurm.conf and the output of sprio. This will help me verify that the preemptor has a higher priority and that I am not overlooking anything in your configuration.

-Jason
Comment 4 Ryan Day 2018-12-07 11:38:19 MST
Created attachment 8560 [details]
slurm.conf
Comment 5 Ryan Day 2018-12-07 11:38:55 MST
Created attachment 8561 [details]
file included in slurm.conf
Comment 6 Ryan Day 2018-12-07 11:42:21 MST
I'm switching things slightly on you here. The testbed cluster that I used for my first example is running priority/basic, so sprio doesn't do anything. I originally saw this on a production cluster running priority/multifactor, though, so I've attached the slurm.conf from that cluster. Here's what the sprio etc. output looks like on that cluster:

[day36@pascal83:~]$ squeue -p pvis -o "%i %R %q"
JOBID NODELIST(REASON) QOS
190499 (Resources) exempt
190498 pascal[1,3-13,15-20,22-31] standby
[day36@pascal83:~]$ sprio
          JOBID PARTITION   PRIORITY        AGE  FAIRSHARE        QOS
         190499 pvis         1082689          1      82689    1000000
[day36@pascal83:~]$ sacctmgr list qos format=name,priority,preempt
      Name   Priority    Preempt 
---------- ---------- ---------- 
    normal    1000000    standby 
   standby          1            
  expedite    2000000    standby 
    exempt    1000000    standby 
pdebug_qu+    1000000            
pdebug_ca+    1000000            
[day36@pascal83:~]$

I'll note that the 'pvis' partition is not the default partition, but it's otherwise set up the same as the default partition. Could that cause problems?
Comment 8 Jason Booth 2018-12-07 16:35:40 MST
Hi Ryan,

 I was able to duplicate this issue. It was fixed in the following commit:

d103d20039f4dad6b8b196400213fb08c2cbe04a

I would highly suggest that you upgrade to at least 17.11.12 since there are a number of issues that have been fixed since 17.11.8.

-Jason
Comment 9 Jason Booth 2018-12-07 16:39:04 MST
Closing this out for now since this is a known issue that was fixed in 17.11.10.

*** This ticket has been marked as a duplicate of ticket 5293 ***