Ticket 13330

Summary: qos preemption not triggering
Product: Slurm    Reporter: Todd Merritt <tmerritt>
Component: Scheduling    Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN    QA Contact:
Severity: 2 - High Impact
Priority: ---    CC: cinek
Version: 21.08.5
Hardware: Linux
OS: Linux
Site: U of AZ    Alineos Sites: ---
Atos/Eviden Sites: ---    Confidential Site: ---
Coreweave sites: ---    Cray Sites: ---
DS9 clusters: ---    HPCnow Sites: ---
HPE Sites: ---    IBM Sites: ---
NOAA Site: ---    OCF Sites: ---
Recursion Pharma Sites: ---    SFW Sites: ---
SNIC sites: ---    Linux Distro: ---
Machine Name:    CLE Version:
Version Fixed:    Target Release: ---
DevPrio: ---    Emory-Cloud Sites: ---
Attachments: sdiag output
slurmctld log
slurm config
debug5 slurmctld log
slurmctld log from 20220207
slurmctld log

Description Todd Merritt 2022-02-03 17:10:19 MST
We've had a job queued all day that should have preempted lower-priority jobs:

root@ericidle:~ # scontrol show job 3000709
JobId=3000709 JobName=perftime_umachine_no_inline
   UserId=hwzhang0595(40014) GroupId=behroozi(31439) MCS_label=N/A
   Priority=3 Nice=0 Account=behroozi QOS=user_qos_behroozi
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2022-02-03T09:45:26 EligibleTime=2022-02-03T09:45:26
   AccrueTime=2022-02-03T09:45:26
   StartTime=2022-02-05T03:41:08 EndTime=2022-02-06T03:41:08 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-03T17:05:25 Scheduler=Main
   Partition=high_priority AllocNode:Sid=r2u08n2:57776
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=94 NumTasks=94 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=94,mem=470G,node=1,billing=94
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=5G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/xdisk/behroozi/mig2020/extra/hwzhang0595/Codes/perftime_umachine_no_inline.pbs
   WorkDir=/xdisk/behroozi/mig2020/extra/hwzhang0595/Codes
   StdErr=/xdisk/behroozi/mig2020/extra/hwzhang0595/Codes/perftime_umachine_no_inline.out
   StdIn=/dev/null
   StdOut=/xdisk/behroozi/mig2020/extra/hwzhang0595/Codes/perftime_umachine_no_inline.out
   Power=
   MailUser=hwzhang0595@email.arizona.edu MailType=INVALID_DEPEND,BEGIN,END,FAIL,REQUEUE,STAGE_OUT
   
Please advise where to look for the reason. I'll send some additional diagnostic information, but this job should be preempting anything running under the part_qos_windfall QOS.

root@ericidle:~ # sacctmgr --parsable2 show qos user_qos_behroozi
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES
user_qos_behroozi|7|00:00:00|part_qos_windfall||cluster|OverPartQOS||1.000000|cpu=4047|cpu=33177600||2000|2000||||||||||||
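For quick reference, the Preempt column can be pulled out of the `--parsable2` output with awk. This is a minimal sketch against the sample line pasted above; on a live cluster you would pipe `sacctmgr --parsable2 --noheader show qos <name>` instead:

```shell
# Extract the Preempt field (column 4 per the header above) from
# `sacctmgr --parsable2` output. The sample line is the one pasted above.
line='user_qos_behroozi|7|00:00:00|part_qos_windfall||cluster|OverPartQOS||1.000000|cpu=4047|cpu=33177600||2000|2000||||||||||||'
preempt=$(printf '%s\n' "$line" | awk -F'|' '{print $4}')
echo "$preempt"   # part_qos_windfall
```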

Thanks,
Todd
Comment 1 Todd Merritt 2022-02-03 17:11:11 MST
Created attachment 23273 [details]
sdiag output
Comment 2 Todd Merritt 2022-02-03 17:13:31 MST
Created attachment 23274 [details]
slurmctld log
Comment 3 Marcin Stolarek 2022-02-04 05:08:41 MST
Todd,

Could you please share 'scontrol show job <jobid>' output for the jobs that should be preempted to let 3000709 run?

cheers,
Marcin
Comment 4 Todd Merritt 2022-02-04 05:34:03 MST
root@ericidle:~ # squeue -p windfall --state R
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2984636  windfall     Job1 jeongpil  R 3-06:01:54      4 r4u37n[1-2],r4u38n[1-2]
           2984635  windfall     Job1 jeongpil  R 3-06:02:54      4 r4u05n[1-2],r4u06n[1-2]
           2984634  windfall     Job1 jeongpil  R 3-06:03:26      4 r4u03n[1-2],r4u04n[1-2]
           2984633  windfall     Job1 jeongpil  R 3-06:03:54      4 r3u37n[1-2],r3u38n[1-2]
           2984632  windfall     Job1 jeongpil  R 3-06:04:54      4 r3u05n[1-2],r3u06n[1-2]
           2984631  windfall     Job1 jeongpil  R 3-06:05:54      4 r3u03n[1-2],r3u04n[1-2]
           2984630  windfall     Job2 jeongpil  R 3-06:06:54      4 r2u06n[1-2],r2u37n[1-2]
           2984629  windfall     Job2 jeongpil  R 3-06:07:54      4 r2u04n[1-2],r2u05n[1-2]
           2984628  windfall     Job2 jeongpil  R 3-06:08:54      4 r2u03n[1-2],r2u38n[1-2]
           2984627  windfall     Job2 jeongpil  R 3-06:09:54      4 r1u39n[1-2],r1u40n[1-2]
           2984626  windfall     Job2 jeongpil  R 3-06:10:54      4 r1u37n[1-2],r1u38n[1-2]
           2984625  windfall     Job2 jeongpil  R 3-06:11:54      4 r1u05n[1-2],r1u06n[1-2]
           2984624  windfall     Job2 jeongpil  R 3-06:12:54      4 r1u03n[1-2],r1u04n[1-2]
           2984637  windfall     Job1 jeongpil  R 3-06:00:54      3 r4u39n[1-2],r4u40n2
           2965309  windfall  gauss_9 epalikot  R    2:16:14      3 r2u29n2,r2u33n2,r4u10n2
           3004275  windfall gauss_11 epalikot  R    1:47:56      3 r1u34n2,r1u35n1,r1u36n1
           2965312  windfall gauss_14 epalikot  R    2:25:59      3 r4u11n1,r4u14n2,r4u36n2
           2981461  windfall  gauss_7 epalikot  R    2:25:59      3 r2u09n[1-2],r2u25n1
           3004298  windfall gauss_19 epalikot  R    1:39:41      2 r1u08n1,r1u25n1
           3004273  windfall  gauss_8 epalikot  R    1:47:55      2 r5u27n1,r5u29n1
           2965310  windfall gauss_10 epalikot  R    2:25:59      2 r1u10n2,r1u13n1
           2965316  windfall gauss_18 epalikot  R    2:25:59      2 r1u33n2,r2u34n1
           2981467  windfall gauss_15 epalikot  R    2:25:59      2 r1u15n2,r1u17n2
           2983446  windfall   n72s2- jeongpil  R 3-14:10:01      1 r5u07n1
           2983450  windfall    n72s3 jeongpil  R 3-14:21:01      1 r5u31n1
           2983449  windfall    n72s2 jeongpil  R 3-14:22:01      1 r5u11n1
           2983448  windfall    n72s1 jeongpil  R 3-14:23:01      1 r5u09n1
           2983447  windfall   n72s1- jeongpil  R 3-14:23:45      1 r5u08n1
           2916058  windfall be_p_t_e   ludwik  R       8:20      1 r3u13n1
           2991003  windfall      B_2  klshark  R       8:20      1 r1u28n1
           3003631  windfall model2-1 fatemehm  R      24:29      1 r3u14n1
           2866822  windfall be_s_d_4 fleonars  R      25:32      1 r3u27n1
           2886948  windfall   inf_11    bubin  R      25:32      1 r3u32n2
        2991005_14  windfall   humann aponsero  R      32:22      1 r2u25n1
           2916705  windfall     adn2  stepans  R      33:00      1 r3u08n1
           2926558  windfall     add3  stepans  R      33:00      1 r3u12n1
           3003630  windfall model2-1 fatemehm  R      33:00      1 r3u29n1
           2845533  windfall      B_5  klshark  R      33:56      1 r3u11n1
           2866750  windfall be_s_d_1 fleonars  R      33:56      1 r3u29n1
           2986445  windfall    b_s_6   monika  R      33:56      1 r4u36n2
           2909168  windfall     add1  stepans  R      49:27      1 r1u34n1
           3000726  windfall     urd9  stepans  R      49:27      1 r1u32n2
           2981683  windfall     tmd7  stepans  R      49:32      1 r1u29n2
           2430705  windfall       N3  klshark  R      50:22      1 r1u29n1
           2916374  windfall    b_p_7   monika  R      50:22      1 r1u32n2
           2986457  windfall    b_p_8   monika  R      50:22      1 r1u16n2
           2795925  windfall   be_s_6   trzask  R      50:23      1 r1u13n1
        2991005_26  windfall   humann aponsero  R      57:12      1 r3u14n2
           2991059  windfall     tmd6  stepans  R      57:46      1 r3u16n2
           3003629  windfall model2-1 fatemehm  R    1:06:00      1 r3u18n1
        2991005_24  windfall   humann aponsero  R    1:22:11      1 r3u17n2
           2633181  windfall  be_s_10   trzask  R    1:23:12      1 r3u25n1
           2742693  windfall be_p_t_1   teodar  R    1:23:12      1 r3u16n1
           2751961  windfall       N5  klshark  R    1:23:12      1 r3u17n2
           2759689  windfall li_1_inf    bubin  R    1:23:12      1 r3u35n2
           3003628  windfall model2-1 fatemehm  R    1:30:19      1 r3u36n1
           2823954  windfall fin_1700    bubin  R    1:31:13      1 r3u33n1
           3004274  windfall  gauss_6 epalikot  R    1:38:02      1 r1u36n1
           2991051  windfall     tmd5  stepans  R    1:38:22      1 r1u08n1
           2886928  windfall be_p_t_e   ludwik  R    1:39:10      1 r1u34n2
           2909170  windfall     tmd1  stepans  R    1:46:37      1 r1u36n2
         2991005_9  windfall   humann aponsero  R    1:57:55      1 r4u35n2
        2991005_23  windfall   humann aponsero  R    1:58:13      1 r4u30n1
           2899442  windfall be_s_t_9   teodar  R    1:58:54      1 r1u31n1
           2990980  windfall be_p_t_e   ludwik  R    1:58:54      1 r4u18n1
           2981466  windfall     urd6  stepans  R    2:14:01      1 r4u11n2
        2991005_15  windfall   humann aponsero  R    2:32:42      1 r4u17n2
        2991005_25  windfall   humann aponsero  R    2:32:42      1 r3u35n1
           2981457  windfall     urd4  stepans  R    2:32:44      1 r2u14n1
           3002836  windfall    n64s2 jeongpil  R    2:48:17      1 r4u14n2
           3002837  windfall    n64s3 jeongpil  R    2:48:17      1 r2u27n1
           2986182  windfall    lih_2   ludwik  R    2:48:37      1 r5u13n1
           2998473  windfall n54t3s1- jeongpil  R    6:24:43      1 r2u11n1
           3002842  windfall   n48s2- jeongpil  R    6:54:20      1 r1u10n2
           3002843  windfall   n48s1- jeongpil  R    6:54:20      1 r1u11n1
           3002844  windfall    n48s1 jeongpil  R    6:54:20      1 r1u11n1
           3002845  windfall    n48s2 jeongpil  R    6:54:20      1 r1u13n2
           3002846  windfall    n48s3 jeongpil  R    6:54:20      1 r1u13n2
           3002785  windfall   sn72s3 jeongpil  R    7:14:07      1 r2u27n2
           3002833  windfall   n64s2- jeongpil  R    7:14:07      1 r2u35n1
           3002834  windfall   n64s1- jeongpil  R    7:14:07      1 r3u12n2
           3002835  windfall    n64s1 jeongpil  R    7:14:07      1 r3u12n2
           3002838  windfall  sn64s1- jeongpil  R    7:14:07      1 r4u07n1
           3002839  windfall   sn64s1 jeongpil  R    7:14:07      1 r4u07n1
           3002840  windfall   sn64s2 jeongpil  R    7:14:07      1 r4u25n1
           3002841  windfall   sn64s3 jeongpil  R    7:14:07      1 r4u25n1
           2998472  windfall n54t3s2- jeongpil  R    8:14:01      1 r1u14n2
           3002781  windfall  sn72s2- jeongpil  R    8:14:01      1 r1u25n2
           3002782  windfall  sn72s1- jeongpil  R    8:14:01      1 r1u34n1
           3002783  windfall   sn72s1 jeongpil  R    8:14:01      1 r1u35n1
           3002784  windfall   sn72s2 jeongpil  R    8:14:01      1 r1u36n2
           2998327  windfall   sn54s1 jeongpil  R   12:13:15      1 r4u07n1
           2998328  windfall   sn54s2 jeongpil  R   12:13:15      1 r4u17n1
           2998329  windfall   sn54s3 jeongpil  R   12:13:15      1 r3u08n1
           2998304  windfall    n54s2 jeongpil  R   16:08:42      1 r2u30n1
           2998305  windfall    n54s3 jeongpil  R   16:08:42      1 r2u30n2
           2998474  windfall  n54t3s1 jeongpil  R 1-05:13:43      1 r2u14n1
           2998475  windfall  n54t3s2 jeongpil  R 1-05:13:43      1 r2u14n2
           2998476  windfall  n54t3s3 jeongpil  R 1-05:13:43      1 r2u14n2
           2998302  windfall   n54s1- jeongpil  R 1-05:20:27      1 r3u31n2
           2998303  windfall    n54s1 jeongpil  R 1-05:27:35      1 r1u11n1
           2998301  windfall   n54s2- jeongpil  R 1-05:50:04      1 r1u15n2
           2998325  windfall  sn54s2- jeongpil  R 1-05:50:04      1 r2u14n1
           2998326  windfall  sn54s1- jeongpil  R 1-05:50:04      1 r3u09n1
           2986458  windfall     tmd9  stepans  R 2-08:47:08      1 r2u17n1
           2973272  windfall  sn64s2- jeongpil  R 2-09:04:25      1 r4u15n2
           2926561  windfall     urd3  stepans  R 2-09:15:34      1 r4u28n1
           2931154  windfall     add8  stepans  R 2-09:15:34      1 r4u30n1
           2981682  windfall     add7  stepans  R 2-11:30:01      1 r4u18n2
           2981454  windfall     add4  stepans  R 2-18:43:19      1 r3u34n1
           2931157  windfall     urd8  stepans  R 3-05:21:10      1 r1u35n1
           2897796  windfall     add5  stepans  R 3-16:38:10      1 r2u07n2
           2849309  windfall be_p_t_e   ludwik  R 6-11:03:42      1 r4u17n2
           2931155  windfall     adn8  stepans  R 7-03:52:30      1 r1u14n2
           2406328  windfall     B_10  klshark  R 7-15:24:36      1 r1u07n1
           2769086  windfall be_p_t_e   ludwik  R 7-15:24:36      1 r1u13n2
           2849311  windfall       C7   ludwik  R 7-15:24:36      1 r2u09n1
           2875690  windfall be_s_d_6 fleonars  R 7-15:24:36      1 r1u18n2
           2886931  windfall       C1   ludwik  R 7-15:24:36      1 r2u13n1
           2886963  windfall      B_6  klshark  R 7-15:24:36      1 r2u15n1
           2897145  windfall be_s_d_1 fleonars  R 7-15:24:36      1 r1u29n2
           2899420  windfall be_s_d_9 fleonars  R 7-15:24:36      1 r1u18n2
           2899455  windfall li_d_11_   teodar  R 7-15:24:36      1 r1u27n1
           2916338  windfall  cp_p_1a  klshark  R 7-15:24:36      1 r2u25n2
           2916340  windfall      B_9  klshark  R 7-15:24:36      1 r2u14n2
           2924858  windfall be_s_d_5 fleonars  R 7-15:24:36      1 r2u36n1
           2929702  windfall be_p_t_e   ludwik  R 7-15:24:36      1 r2u15n2
           2897798  windfall     adn5  stepans  R 7-15:50:40      1 r4u11n2
           2931159  windfall     adn9  stepans  R 7-15:50:40      1 r4u26n2
           2886932  windfall       C6   ludwik  R 7-15:50:42      1 r4u10n2
           2873914  windfall      B_4  klshark  R 7-16:01:03      1 r3u35n1
           2916706  windfall     add2  stepans  R 8-06:12:51      1 r2u25n2
           2909169  windfall     adn1  stepans  R 8-06:38:09      1 r4u36n1
           2909630  windfall     adn0  stepans  R 8-06:38:09      1 r2u08n1
           2912417  windfall     li_3    bubin  R 8-06:38:09      1 r2u08n1

Thanks!
Comment 5 Marcin Stolarek 2022-02-04 06:21:22 MST
Isn't the job submitted to a different partition?
>Partition=high_priority[...]

I don't see this name in the config you shared with us before. Could you please share the current slurm.conf?

cheers,
Marcin
Comment 6 Todd Merritt 2022-02-04 06:35:49 MST
Created attachment 23277 [details]
slurm config

Yep, we added that partition recently to keep non-preemptible jobs from running on nodes that individual faculty purchased. The only difference from the "standard" partition is the associated node list.

Thanks!
Comment 7 Ben Roberts 2022-02-04 10:21:20 MST
Hi Todd,

Has preemption worked since you made the change you mention (using the high_priority partition)?  Is this affecting just this user?  Can you send the output of 'scontrol show job <jobid>' for one of the running windfall jobs?  I would also like to see the different QOSes that you have defined.  Can you send the output of this command as well?
sacctmgr show qos format=name,preempt,preemptmode,flags

Thanks,
Ben
Comment 8 Todd Merritt 2022-02-04 10:43:03 MST
Presumably, it's been working. Our users are generally quick to let us know when it's not, and that partition has been in place since November. The other, more recent change is that we upgraded from 20 to 21 at the end of January. It looks like there are a number of other jobs that I would expect to start that are also blocked:

root@ericidle:~ # squeue -p high_priority
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           3000708 high_prio perftime hwzhang0 PD       0:00      1 (Priority)
           3000709 high_prio perftime hwzhang0 PD       0:00      1 (Priority)
           3000349 high_prio Sample_M  fgoeltl PD       0:00      1 (Resources)
           3000352 high_prio Sample_M  fgoeltl PD       0:00      1 (Priority)
           3000365 high_prio Sample_M  fgoeltl PD       0:00      1 (Priority)
           3000697 high_prio perftime hwzhang0 PD       0:00      1 (Priority)
 3005975_[738-785] high_prio LSSTxSO_    xfang PD       0:00      1 (Priority)

root@ericidle:~ # sacctmgr --parsable2 show qos format=name,preempt,preemptmode,flags
Name|Preempt|PreemptMode|Flags
normal||cluster|
part_qos_windfall|user_qos_idlecycles|cluster|
part_qos_standard|part_qos_windfall,user_qos_idlecycles|cluster|
user_qos_tmerritt|part_qos_windfall|cluster|OverPartQOS
user_qos_idlecycles||cluster|OverPartQOS
user_qos_nkchen|part_qos_windfall|cluster|OverPartQOS
user_qos_timeifler|part_qos_windfall|cluster|OverPartQOS
user_qos_josh|part_qos_windfall|cluster|OverPartQOS
user_qos_jlbredas|part_qos_windfall|cluster|OverPartQOS
user_qos_denard|part_qos_windfall|cluster|OverPartQOS
user_qos_kgklein|part_qos_windfall|cluster|OverPartQOS
user_qos_xytang|part_qos_windfall|cluster|OverPartQOS
user_qos_jrussell|part_qos_windfall|cluster|OverPartQOS
user_qos_chertkov|part_qos_windfall|cluster|OverPartQOS
user_qos_amainzer|part_qos_windfall|cluster|OverPartQOS
user_qos_douglase|part_qos_windfall|cluster|OverPartQOS
user_qos_sprinkjm|part_qos_windfall|cluster|OverPartQOS
user_qos_hanquist|part_qos_windfall|cluster|OverPartQOS
user_qos_cbender|part_qos_windfall|cluster|OverPartQOS
user_qos_gbesla|part_qos_windfall|cluster|OverPartQOS
user_qos_rgutenk|part_qos_windfall|cluster|OverPartQOS
user_qos_fgoeltl|part_qos_windfall|cluster|OverPartQOS
user_qos_sschwartz|part_qos_windfall|cluster|OverPartQOS
user_qos_hamden|part_qos_windfall|cluster|OverPartQOS
user_qos_yshirley|part_qos_windfall|cluster|OverPartQOS
user_qos_behroozi|part_qos_windfall|cluster|OverPartQOS
user_qos_|part_qos_windfall|cluster|OverPartQOS
user_qos_ludwik|part_qos_windfall|cluster|OverPartQOS
qual_qos_tzega|part_qos_windfall|cluster|OverPartQOS
qual_qos_latmarat|part_qos_windfall|cluster|OverPartQOS
qual_qos_ericlyons|part_qos_windfall|cluster|OverPartQOS
qual_qos_dukepauli|part_qos_windfall|cluster|OverPartQOS
qual_qos_faselh|part_qos_windfall|cluster|OverPartQOS
qual_qos_ruichang|part_qos_windfall|cluster|OverPartQOS
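As a sanity check on the table above, a small helper can test whether one QOS names another in its Preempt list. This is a sketch using two rows copied from the dump; the `can_preempt` function is a hypothetical illustration, not a Slurm command:

```shell
# Given Name|Preempt|PreemptMode|Flags rows (as dumped above on stdin),
# check whether QOS $1 is allowed to preempt QOS $2, i.e. whether $2
# appears in $1's comma-separated Preempt list.
can_preempt() {
    awk -F'|' -v a="$1" -v b="$2" '
        $1 == a { n = split($2, list, ","); for (i = 1; i <= n; i++) if (list[i] == b) found = 1 }
        END { exit !found }'
}
# Two rows copied from the dump above:
dump='user_qos_behroozi|part_qos_windfall|cluster|OverPartQOS
part_qos_windfall|user_qos_idlecycles|cluster|'
printf '%s\n' "$dump" | can_preempt user_qos_behroozi part_qos_windfall && echo yes || echo no   # prints: yes
```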

Here are a couple of jobs that I'd expect to be preempted:

root@ericidle:~ # scontrol show job 2984636
JobId=2984636 JobName=Job1
   UserId=jeongpilsong(43439) GroupId=mazumdar(30580) MCS_label=N/A
   Priority=15 Nice=0 Account=windfall QOS=part_qos_windfall
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=3-11:10:52 TimeLimit=10-00:00:00 TimeMin=N/A
   SubmitTime=2022-01-31T23:18:52 EligibleTime=2022-01-31T23:18:52
   AccrueTime=2022-01-31T23:18:52
   StartTime=2022-01-31T23:31:31 EndTime=2022-02-10T23:31:31 Deadline=N/A
   PreemptEligibleTime=2022-01-31T23:31:31 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-01-31T23:31:31 Scheduler=Main
   Partition=windfall AllocNode:Sid=wentletrap:25825
   ReqNodeList=(null) ExcNodeList=r1u07n[1-2],r1u08n[1-2],r1u09n[1-2],r1u10n[1-2],r1u11n[1-2],r1u12n[1-2],r1u13n[1-2],r1u14n[1-2],r1u15n[1-2],r1u16n[1-2],r1u17n[1-2],r1u18n[1-2],r1u25n[1-2],r1u26n[1-2],r1u27n[1-2],r1u28n[1-2],r1u29n[1-2],r1u30n[1-2],r1u31n[1-2],r1u32n[1-2],r1u33n[1-2],r1u34n[1-2],r1u35n[1-2],r1u36n[1-2],r2u07n[1-2],r2u08n[1-2],r2u09n[1-2],r2u10n[1-2],r2u11n[1-2],r2u12n[1-2],r2u13n[1-2],r2u14n[1-2],r2u15n[1-2],r2u16n[1-2],r2u17n[1-2],r2u18n[1-2],r2u25n[1-2],r2u26n[1-2],r2u27n[1-2],r2u28n[1-2],r2u29n[1-2],r2u30n[1-2],r2u31n[1-2],r2u32n[1-2],r2u33n[1-2],r2u34n[1-2],r2u35n[1-2],r2u36n[1-2],r3u07n[1-2],r3u08n[1-2],r3u09n[1-2],r3u10n[1-2],r3u11n[1-2],r3u12n[1-2],r3u13n[1-2],r3u14n[1-2],r3u15n[1-2],r3u16n[1-2],r3u17n[1-2],r3u18n[1-2],r3u25n[1-2],r3u26n[1-2],r3u27n[1-2],r3u28n[1-2],r3u29n[1-2],r3u30n[1-2],r3u31n[1-2],r3u32n[1-2],r3u33n[1-2],r3u34n[1-2],r3u35n[1-2],r3u36n[1-2],r4u07n[1-2],r4u08n[1-2],r4u09n[1-2],r4u10n[1-2],r4u11n[1-2],r4u12n[1-2],r4u13n[1-2],r4u14n[1-2],r4u15n[1-2],r4u16n[1-2],r4u17n[1-2],r4u18n[1-2],r4u25n[1-2],r4u26n[1-2],r4u27n[1-2],r4u28n[1-2],r4u29n[1-2],r4u30n[1-2],r4u31n[1-2],r4u32n[1-2],r4u33n[1-2],r4u34n[1-2],r4u35n[1-2],r4u36n[1-2],r5u13n1,r5u15n1,r5u17n1,r5u19n1,r5u24n1,r5u25n1,r5u27n1,r5u29n1
   NodeList=r4u37n[1-2],r4u38n[1-2]
   BatchHost=r4u37n1
   NumNodes=4 NumCPUs=376 NumTasks=376 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=376,mem=1504G,node=4,billing=376
   Socks/Node=* NtasksPerN:B:S:C=94:0:*:* CoreSpec=*
   MinCPUsNode=94 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=USER Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/pbs
   WorkDir=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4
   StdErr=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/output.out
   StdIn=/dev/null
   StdOut=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/output.out
   Power=
   

root@ericidle:~ # scontrol show job 2984635
JobId=2984635 JobName=Job1
   UserId=jeongpilsong(43439) GroupId=mazumdar(30580) MCS_label=N/A
   Priority=15 Nice=0 Account=windfall QOS=part_qos_windfall
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=3-11:12:00 TimeLimit=10-00:00:00 TimeMin=N/A
   SubmitTime=2022-01-31T23:18:50 EligibleTime=2022-01-31T23:18:50
   AccrueTime=2022-01-31T23:18:50
   StartTime=2022-01-31T23:30:31 EndTime=2022-02-10T23:30:31 Deadline=N/A
   PreemptEligibleTime=2022-01-31T23:30:31 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-01-31T23:30:31 Scheduler=Main
   Partition=windfall AllocNode:Sid=wentletrap:25825
   ReqNodeList=(null) ExcNodeList=r1u07n[1-2],r1u08n[1-2],r1u09n[1-2],r1u10n[1-2],r1u11n[1-2],r1u12n[1-2],r1u13n[1-2],r1u14n[1-2],r1u15n[1-2],r1u16n[1-2],r1u17n[1-2],r1u18n[1-2],r1u25n[1-2],r1u26n[1-2],r1u27n[1-2],r1u28n[1-2],r1u29n[1-2],r1u30n[1-2],r1u31n[1-2],r1u32n[1-2],r1u33n[1-2],r1u34n[1-2],r1u35n[1-2],r1u36n[1-2],r2u07n[1-2],r2u08n[1-2],r2u09n[1-2],r2u10n[1-2],r2u11n[1-2],r2u12n[1-2],r2u13n[1-2],r2u14n[1-2],r2u15n[1-2],r2u16n[1-2],r2u17n[1-2],r2u18n[1-2],r2u25n[1-2],r2u26n[1-2],r2u27n[1-2],r2u28n[1-2],r2u29n[1-2],r2u30n[1-2],r2u31n[1-2],r2u32n[1-2],r2u33n[1-2],r2u34n[1-2],r2u35n[1-2],r2u36n[1-2],r3u07n[1-2],r3u08n[1-2],r3u09n[1-2],r3u10n[1-2],r3u11n[1-2],r3u12n[1-2],r3u13n[1-2],r3u14n[1-2],r3u15n[1-2],r3u16n[1-2],r3u17n[1-2],r3u18n[1-2],r3u25n[1-2],r3u26n[1-2],r3u27n[1-2],r3u28n[1-2],r3u29n[1-2],r3u30n[1-2],r3u31n[1-2],r3u32n[1-2],r3u33n[1-2],r3u34n[1-2],r3u35n[1-2],r3u36n[1-2],r4u07n[1-2],r4u08n[1-2],r4u09n[1-2],r4u10n[1-2],r4u11n[1-2],r4u12n[1-2],r4u13n[1-2],r4u14n[1-2],r4u15n[1-2],r4u16n[1-2],r4u17n[1-2],r4u18n[1-2],r4u25n[1-2],r4u26n[1-2],r4u27n[1-2],r4u28n[1-2],r4u29n[1-2],r4u30n[1-2],r4u31n[1-2],r4u32n[1-2],r4u33n[1-2],r4u34n[1-2],r4u35n[1-2],r4u36n[1-2],r5u13n1,r5u15n1,r5u17n1,r5u19n1,r5u24n1,r5u25n1,r5u27n1,r5u29n1
   NodeList=r4u05n[1-2],r4u06n[1-2]
   BatchHost=r4u05n1
   NumNodes=4 NumCPUs=376 NumTasks=376 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=376,mem=1504G,node=4,billing=376
   Socks/Node=* NtasksPerN:B:S:C=94:0:*:* CoreSpec=*
   MinCPUsNode=94 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=USER Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/pbs
   WorkDir=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3
   StdErr=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/output.out
   StdIn=/dev/null
   StdOut=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/output.out
   Power=
   

Thanks!
Comment 9 Ben Roberts 2022-02-04 11:27:41 MST
I thought there might be an issue with the jobs not having the preemptable QOS listed on the job itself, but only having it associated by being in the preemptable partition.  For the jobs you're showing, that doesn't look like the case.  Can you send squeue output that shows each job's QOS along with its partition and the time the job has been running?
squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos

Thanks,
Ben
Comment 10 Ben Roberts 2022-02-04 12:48:13 MST
I've still been looking into this on my side.  The reason I asked for that squeue output was that I wondered whether there might be some interaction between the preempt_youngest_first parameter I see you have set and the possibility that some of the windfall jobs have another QOS associated with them.  I set up a few scenarios where I thought that might cause a problem, but I haven't been able to reproduce the behavior you're describing yet.  I would still like to see the squeue output I asked for, to eliminate that possibility (squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos).

In addition, could you temporarily increase the log level for a few minutes?  If job 3000709 is still queued, just leave the higher level active for about 5 minutes.  If that job has already run, please submit another job like it and wait a few minutes before setting the log level back.  I see you have 'debug' logging enabled now, so the commands would be:
scontrol setdebug debug2   # raise the log level
scontrol setdebug debug    # restore it after a few minutes

Thanks,
Ben
Comment 11 Todd Merritt 2022-02-04 15:36:07 MST
Hi Ben, 

Sorry, I missed that request. Here's the output:

root@ericidle:~ # squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos
JOBID               PARTITION           USER                TIME                QOS                 
2984636             windfall            jeongpilsong        3-16:01:44          part_qos_windfall   
2984635             windfall            jeongpilsong        3-16:02:44          part_qos_windfall   
2984634             windfall            jeongpilsong        3-16:03:16          part_qos_windfall   
2984633             windfall            jeongpilsong        3-16:03:44          part_qos_windfall   
2984632             windfall            jeongpilsong        3-16:04:44          part_qos_windfall   
2984631             windfall            jeongpilsong        3-16:05:44          part_qos_windfall   
2984630             windfall            jeongpilsong        3-16:06:44          part_qos_windfall   
2984629             windfall            jeongpilsong        3-16:07:44          part_qos_windfall   
2984628             windfall            jeongpilsong        3-16:08:44          part_qos_windfall   
2984627             windfall            jeongpilsong        3-16:09:44          part_qos_windfall   
2984626             windfall            jeongpilsong        3-16:10:44          part_qos_windfall   
2984625             windfall            jeongpilsong        3-16:11:44          part_qos_windfall   
2984624             windfall            jeongpilsong        3-16:12:44          part_qos_windfall   
2984637             windfall            jeongpilsong        3-16:00:44          part_qos_windfall   
2965316             windfall            epalikot            8:28                part_qos_windfall   
3004298             windfall            epalikot            8:28                part_qos_windfall   
2983446             windfall            jeongpilsong        4-00:09:51          part_qos_windfall   
2983450             windfall            jeongpilsong        4-00:20:51          part_qos_windfall   
2983449             windfall            jeongpilsong        4-00:21:51          part_qos_windfall   
2983448             windfall            jeongpilsong        4-00:22:51          part_qos_windfall   
2983447             windfall            jeongpilsong        4-00:23:35          part_qos_windfall   
2991057             windfall            aponsero            5:21                part_qos_windfall   
2998473             windfall            jeongpilsong        14:58               part_qos_windfall   
3002781             windfall            jeongpilsong        14:58               part_qos_windfall   
3006949             windfall            jeongpilsong        2:29:35             part_qos_windfall   
3006716             windfall            jeongpilsong        2:44:40             part_qos_windfall   
3006698             windfall            jeongpilsong        2:52:54             part_qos_windfall   
3006699             windfall            jeongpilsong        2:52:54             part_qos_windfall   
2998304             windfall            jeongpilsong        3:00:52             part_qos_windfall   
2998665             windfall            klshark             4:18:38             part_qos_windfall   
3002836             windfall            jeongpilsong        12:48:07            part_qos_windfall   
3002837             windfall            jeongpilsong        12:48:07            part_qos_windfall   
2986182             windfall            ludwik              12:48:27            part_qos_windfall   
3002842             windfall            jeongpilsong        16:54:10            part_qos_windfall   
3002843             windfall            jeongpilsong        16:54:10            part_qos_windfall   
3002844             windfall            jeongpilsong        16:54:10            part_qos_windfall   
3002845             windfall            jeongpilsong        16:54:10            part_qos_windfall   
3002846             windfall            jeongpilsong        16:54:10            part_qos_windfall   
3002785             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002833             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002834             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002835             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002838             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002839             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002840             windfall            jeongpilsong        17:13:57            part_qos_windfall   
3002841             windfall            jeongpilsong        17:13:57            part_qos_windfall   
2998472             windfall            jeongpilsong        18:13:51            part_qos_windfall   
3002782             windfall            jeongpilsong        18:13:51            part_qos_windfall   
3002783             windfall            jeongpilsong        18:13:51            part_qos_windfall   
3002784             windfall            jeongpilsong        18:13:51            part_qos_windfall   
2998327             windfall            jeongpilsong        22:13:05            part_qos_windfall   
2998328             windfall            jeongpilsong        22:13:05            part_qos_windfall   
2998329             windfall            jeongpilsong        22:13:05            part_qos_windfall   
2998305             windfall            jeongpilsong        1-02:08:32          part_qos_windfall   
2998474             windfall            jeongpilsong        1-15:13:33          part_qos_windfall   
2998476             windfall            jeongpilsong        1-15:13:33          part_qos_windfall   
2998302             windfall            jeongpilsong        1-15:20:17          part_qos_windfall   
2998303             windfall            jeongpilsong        1-15:27:25          part_qos_windfall   
2998301             windfall            jeongpilsong        1-15:49:54          part_qos_windfall   
2998325             windfall            jeongpilsong        1-15:49:54          part_qos_windfall   
2998326             windfall            jeongpilsong        1-15:49:54          part_qos_windfall   
2973272             windfall            jeongpilsong        2-19:04:15          part_qos_windfall   
2849309             windfall            ludwik              6-21:03:32          part_qos_windfall   
2406328             windfall            klshark             8-01:24:26          part_qos_windfall   
2769086             windfall            ludwik              8-01:24:26          part_qos_windfall   
2849311             windfall            ludwik              8-01:24:26          part_qos_windfall   
2875690             windfall            fleonarski          8-01:24:26          part_qos_windfall   
2886931             windfall            ludwik              8-01:24:26          part_qos_windfall   
2886963             windfall            klshark             8-01:24:26          part_qos_windfall   
2897145             windfall            fleonarski          8-01:24:26          part_qos_windfall   
2899420             windfall            fleonarski          8-01:24:26          part_qos_windfall   
2899455             windfall            teodar              8-01:24:26          part_qos_windfall   
2916338             windfall            klshark             8-01:24:26          part_qos_windfall   
2916340             windfall            klshark             8-01:24:26          part_qos_windfall   
2924858             windfall            fleonarski          8-01:24:26          part_qos_windfall   
2929702             windfall            ludwik              8-01:24:26          part_qos_windfall   
2886932             windfall            ludwik              8-01:50:32          part_qos_windfall   
2873914             windfall            klshark             8-02:00:53          part_qos_windfall   
2912417             windfall            bubin               8-16:37:59          part_qos_windfall   
3008082             windfall            haydenfoote         23:17               part_qos_windfall   

The previous log was at level debug. I'll bump it to level debug5 for a minute and send you the log. Thanks!
Comment 12 Todd Merritt 2022-02-04 15:45:01 MST
Created attachment 23303 [details]
debug5 slurmctld log
Comment 13 Ben Roberts 2022-02-07 09:25:00 MST
Thanks for sending that output.  The squeue output rules out my earlier theory that some of the youngest jobs might have a different QOS.  I've been looking through the logs and I can see that a job (3008144) was able to preempt other jobs to get resources and start.

[Feb 04 15:39:48.78004    328 sched_agent  0x7f8d166ec700] preempted JobId=2981461 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.78004    328 sched_agent  0x7f8d166ec700] preempted JobId=2981461 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.78752    328 sched_agent  0x7f8d166ec700] preempted JobId=3004275 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.79438    328 sched_agent  0x7f8d166ec700] preempted JobId=3004298 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.80094    328 sched_agent  0x7f8d166ec700] preempted JobId=2965316 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.80107    328 sched_agent  0x7f8d166ec700] debug3: sched: JobId=3008144. State=PENDING. Reason=Resources. Priority=25. Partition=standard.
...
[Feb 04 15:39:52.949298   328 sched_agent  0x7f8d166ec700] debug3: sched: JobId=3008144 initiated
[Feb 04 15:39:52.949314   328 sched_agent  0x7f8d166ec700] sched: Allocate JobId=3008144 NodeList=r3u13n1,r3u16n[1-2],r3u30n2,r3u36n1,r4u18n1,r4u30n[1-2],r4u34n2,r4u36n[1-2] #CPUs=220 Partition=standard


This job looks like it's in a different account though (acct=mazumdar).  I do see some log entries showing that the acct_policy information is being updated for the behroozi account, so jobs from that account are starting.

[Feb 04 15:35:19.82608    328 sigmgr       0x7f8d1560e700] debug2: acct_policy_job_begin: after adding JobId=2970860, assoc 10785(behroozi/hwzhang0595/standard) grp_used_tres_run_secs(cpu) is 13824000


Since I haven't been able to find a smoking gun from the generic cases, I think we need to create a specific test case where we can see exactly what's happening.  Can you either create a reservation on a node that we can test with, or find a node that has a preemptible job running on it that we can focus on?  Once you have a target node selected, I would like to see the node details and the job details for the preemptible job running on that node.  Then temporarily increase the slurmctld log level again and have the user submit a job that targets that node specifically by adding the "-w <node name>" flag to sbatch.  Collect details about the preemptor job as well while it's pending.  Allow a few minutes to pass and then set the log level back to what it was.  To recap, the steps should look like this:

1.  Identify a node that currently has a preemptible job currently running or create a reservation for a node we can use to test with.
2.  scontrol show node <node name>
3.  scontrol show job <job id>   (for the preemptible job on that node)
4.  scontrol setdebug debug3     (debug3 should be plenty)
5.  Submit a job that requests that node and can preempt the existing job.  Use the 'sbatch -w <node name>' flag to target that node.
6.  scontrol show job <job id>   (for the preemptor job)
7.  Wait a few minutes.
8.  scontrol setdebug debug

Then, if you could send the collected output along with the logs for this time period, I'll look at what's happening.

Thanks,
Ben
Comment 14 Todd Merritt 2022-02-07 10:44:53 MST
root@ericidle:~ # scontrol show node r2u07n2
NodeName=r2u07n2 Arch=x86_64 CoresPerSocket=48 
   CPUAlloc=91 CPUTot=96 CPULoad=52.76
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=r2u07n2 NodeHostName=r2u07n2 Version=21.08.5
   OS=Linux 3.10.0-1160.53.1.el7.x86_64 #1 SMP Fri Jan 14 13:59:45 UTC 2022 
   RealMemory=515830 AllocMem=414720 FreeMem=435116 Sockets=2 Boards=1
   CoreSpecCount=2 CPUSpecList=0-1 
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=20 Owner=N/A MCS_label=N/A
   Partitions=windfall,standard,high_priority 
   BootTime=2022-01-26T14:03:06 SlurmdStartTime=2022-01-26T14:03:36
   LastBusyTime=2022-02-07T03:31:17
   CfgTRES=cpu=96,mem=515830M,billing=96
   AllocTRES=cpu=91,mem=405G
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


root@ericidle:~ # scontrol show job 3022522
JobId=3022522 JobName=adn1
   UserId=stepans(90770) GroupId=ludwik(30004) MCS_label=N/A
   Priority=2 Nice=0 Account=windfall QOS=part_qos_windfall
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=04:42:36 TimeLimit=10-00:00:00 TimeMin=N/A
   SubmitTime=2022-02-07T05:51:50 EligibleTime=2022-02-07T05:51:50
   AccrueTime=2022-02-07T05:51:50
   StartTime=2022-02-07T05:55:15 EndTime=2022-02-17T05:55:15 Deadline=N/A
   PreemptEligibleTime=2022-02-07T05:55:15 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-07T05:55:15 Scheduler=Backfill
   Partition=windfall AllocNode:Sid=wentletrap:10475
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r2u07n2
   BatchHost=r2u07n2
   NumNodes=1 NumCPUs=30 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=30,mem=150G,node=1,billing=30
   Socks/Node=* NtasksPerN:B:S:C=30:0:*:* CoreSpec=*
   MinCPUsNode=30 MinMemoryCPU=5G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/xdisk/ludwik/stepans/puma/gnso1/adn1
   WorkDir=/xdisk/ludwik/stepans/puma/gnso1
   StdErr=/xdisk/ludwik/stepans/puma/gnso1/slurm-3022522.out
   StdIn=/dev/null
   StdOut=/xdisk/ludwik/stepans/puma/gnso1/slurm-3022522.out
   Power=
   

root@ericidle:~ # scontrol show job 3024254
JobId=3024254 JobName=slurm-standard-test
   UserId=tmerritt(7862) GroupId=hpcteam(30001) MCS_label=N/A
   Priority=2 Nice=0 Account=hpcteam QOS=part_qos_standard
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2022-02-07T10:38:58 EligibleTime=2022-02-07T10:38:58
   AccrueTime=2022-02-07T10:38:58
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-07T10:39:06 Scheduler=Main
   Partition=standard AllocNode:Sid=wentletrap:6769
   ReqNodeList=r2u07n2 ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=30 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=30,mem=150G,node=1,billing=30
   Socks/Node=* NtasksPerN:B:S:C=30:0:*:* CoreSpec=*
   MinCPUsNode=30 MinMemoryCPU=5G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/u11/tmerritt/puma/puma-standard.scr
   WorkDir=/home/u11/tmerritt/puma
   StdErr=/home/u11/tmerritt/puma/slurm-standard-test.out
   StdIn=/dev/null
   StdOut=/home/u11/tmerritt/puma/slurm-standard-test.out
   Power=
Comment 15 Todd Merritt 2022-02-07 10:45:30 MST
Created attachment 23318 [details]
slurmctld log from 20220207
Comment 16 Ben Roberts 2022-02-07 14:12:58 MST
Thanks for gathering that information.  I think I have a better idea of what might be happening now.  I can see in the logs that this job is submitted correctly but when it's time to schedule the job it doesn't get fully evaluated.  The log entry looks like this:

[Feb 07 10:39:06.133238 27580 sched_agent  0x7f2c20714700] debug2: sched: JobId=3024254. unable to schedule in Partition=standard (per _failed_partition()). Retaining previous scheduling Reason=Priority. Desc=(null). Priority=2.

This message doesn't make it explicitly clear what the problem is, but if you look at the comments for the _failed_partition() function you can see that it checks whether the nodes have already been reserved by higher-priority jobs.

https://github.com/SchedMD/slurm/blob/cedf4cf35b1ac85a4e3af098a245083dab7c43ec/src/slurmctld/job_scheduler.c#L739-L753
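As an illustration only (not Slurm's actual C code), the skip behavior those comments describe can be modeled roughly like this: once a higher-priority job fails to obtain resources in a partition, the partition is recorded, and later jobs of lower or equal priority in that partition are passed over by the main scheduler. The job IDs and CPU counts below are taken from this ticket purely as example values.

```python
# Toy model of the skip logic described for _failed_partition().
# This is a simplification for illustration, not Slurm's implementation.

def schedule(jobs, free_cpus):
    """jobs: list of (job_id, partition, priority, cpus), sorted by
    descending priority.  Returns (started, skipped) lists of job ids."""
    failed_parts = {}   # partition -> priority of the job that failed there
    started, skipped = [], []
    for job_id, part, prio, cpus in jobs:
        if part in failed_parts and prio <= failed_parts[part]:
            # "unable to schedule in Partition=... (per _failed_partition())"
            skipped.append(job_id)
            continue
        if cpus <= free_cpus.get(part, 0):
            free_cpus[part] -= cpus
            started.append(job_id)
        else:
            # The partition is now effectively held for this job;
            # lower-priority jobs behind it in the queue are skipped.
            failed_parts[part] = prio
    return started, skipped

jobs = [
    (3025772, "standard", 75, 2000),  # large job that cannot fit right now
    (3024254, "standard", 2, 30),     # small job that could fit, but is blocked
]
print(schedule(jobs, {"standard": 500}))  # -> ([], [3024254])
```

In the real scheduler, the backfill plugin can still start such skipped jobs when doing so would not delay the job holding the reservation, which is why small jobs usually keep flowing despite this check.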

To confirm that a higher priority does cause this job to be scheduled on this node ahead of other jobs trying to land there first, can I have you try increasing this job's priority?  You can do this by running the following command:
scontrol update jobid=3024254 priority=10000

You will need to be an administrator to set the priority like that.  Can I also have you send the output of 'sprio' so I can see the priority of this job relative to other jobs on the system?  

Thanks,
Ben
Comment 17 Todd Merritt 2022-02-07 15:35:06 MST
I think you may be on to something. When I increased the priority to max, the job changed from pending on Priority to pending on Resources. After sitting in that state for a few seconds it preempted the jobs that I had expected it to. I had to submit a new test job; the job id is 3027192. I'll attach the slurmctld log as well.
Comment 18 Todd Merritt 2022-02-07 15:40:21 MST
Created attachment 23336 [details]
slurmctld log
Comment 19 Todd Merritt 2022-02-07 15:43:52 MST
Also, the sprio output is here. The only really notable thing is the one job asking for 20 full nodes with a priority of 75. Is it possible that a job like that was in the queue when this issue appeared last Friday and blocked all of the other standard jobs from starting? Is there a way I can assess retroactively what sprio might have looked like?

root@ericidle:~ # sprio
          JOBID PARTITION   PRIORITY       SITE    JOBSIZE
        1469937 standard           3          0          4
        2406218 windfall           2          0          3
        2416861 windfall           2          0          3
        2430705 windfall           2          0          3
        2650279 windfall           2          0          3
        2742693 windfall           2          0          3
        2746074 windfall           2          0          3
        2751961 windfall           2          0          3
        2765536 standard           3          0          4
        2795925 windfall           2          0          3
        2823954 windfall           2          0          3
        2824059 windfall           2          0          3
        2845470 windfall           2          0          3
        2845533 windfall           2          0          3
        2872954 standard           3          0          4
        2873914 windfall           2          0          3
        2873923 windfall           2          0          3
        2886928 windfall           2          0          3
        2886931 windfall           2          0          3
        2886932 windfall           2          0          3
        2886962 windfall           2          0          3
        2886963 windfall           2          0          3
        2899416 windfall           2          0          3
        2899455 windfall           2          0          3
        2905643 windfall           2          0          3
        2909238 windfall           2          0          3
        2909240 windfall           2          0          3
        2912326 windfall           2          0          3
        2916340 windfall           2          0          3
        2916374 windfall           2          0          3
        2924811 windfall           2          0          3
        2924858 windfall           2          0          3
        2924861 windfall           2          0          3
        2929768 windfall           2          0          3
        2965661 standard           7          0          8
        2968040 windfall           2          0          3
        2968901 windfall           2          0          3
        2971028 windfall           2          0          3
        2981014 windfall           2          0          3
        2985914 windfall           2          0          3
        2985915 windfall           2          0          3
        2985969 windfall           2          0          3
        2985971 windfall           2          0          3
        2985972 windfall           2          0          3
        2986182 windfall           2          0          3
        2986186 windfall           2          0          3
        2986199 windfall           2          0          3
        2986411 windfall           2          0          3
        2986445 windfall           2          0          3
        2986457 windfall           2          0          3
        2991074 windfall           2          0          3
        2997124 windfall           3          0          4
        2997127 windfall           3          0          4
        2997130 windfall           3          0          4
        2997131 windfall           3          0          4
        2997472 windfall           3          0          4
        2997570 windfall           3          0          4
        2997627 windfall           3          0          4
        2998619 windfall           2          0          3
        2998620 windfall           2          0          3
        2998652 windfall           2          0          3
        2998653 windfall           2          0          3
        3003793 windfall           2          0          3
        3003817 windfall           2          0          3
        3003819 windfall           2          0          3
        3003821 windfall           2          0          3
        3004102 standard           1          0          2
        3004276 windfall           2          0          3
        3008100 standard           2          0          3
        3008354 standard           3          0          4
        3017817 standard           2          0          2
        3017819 standard           2          0          2
        3017820 standard           2          0          2
        3017821 standard           2          0          2
        3017863 windfall           2          0          3
        3017864 windfall           2          0          3
        3017866 windfall           2          0          3
        3017884 windfall           2          0          3
        3017886 windfall           2          0          3
        3017962 windfall           2          0          3
        3019039 standard           1          0          2
        3019040 standard           1          0          2
        3019041 standard           1          0          2
        3019042 standard           1          0          2
        3019473 windfall           3          0          4
        3019474 windfall           3          0          4
        3019475 windfall           3          0          4
        3019476 windfall           3          0          4
        3019477 windfall           3          0          4
        3019478 windfall           3          0          4
        3019479 windfall           3          0          4
        3019480 windfall           3          0          4
        3019481 windfall           3          0          4
        3019482 windfall           3          0          4
        3019483 windfall           3          0          4
        3019484 windfall           3          0          4
        3019485 windfall           3          0          4
        3019486 windfall           3          0          4
        3019487 windfall           3          0          4
        3019488 windfall           3          0          4
        3019489 windfall           3          0          4
        3019490 windfall           3          0          4
        3019491 windfall           3          0          4
        3019492 windfall           3          0          4
        3019493 windfall           3          0          4
        3019494 windfall           3          0          4
        3019495 windfall           3          0          4
        3019496 windfall           3          0          4
        3019497 windfall           3          0          4
        3019498 windfall           3          0          4
        3019499 windfall           3          0          4
        3019500 windfall           3          0          4
        3019501 windfall           3          0          4
        3019502 windfall           3          0          4
        3019503 windfall           3          0          4
        3019504 windfall           3          0          4
        3019505 windfall           3          0          4
        3019506 windfall           3          0          4
        3019507 windfall           3          0          4
        3019508 windfall           3          0          4
        3019509 windfall           3          0          4
        3019510 windfall           3          0          4
        3019511 windfall           3          0          4
        3019675 standard           3          0          4
        3019676 standard           3          0          4
        3020274 windfall           3          0          4
        3020279 windfall           3          0          4
        3020671 standard           1          0          2
        3020672 standard           1          0          2
        3020673 standard           1          0          2
        3020674 standard           1          0          2
        3021604 windfall           1          0          2
        3021611 windfall           1          0          2
        3021612 windfall           1          0          2
        3021615 windfall           1          0          2
        3021616 windfall           1          0          2
        3021617 windfall           1          0          2
        3021687 standard           2          0          2
        3021690 standard           2          0          2
        3021691 standard           2          0          2
        3021692 standard           2          0          2
        3021693 standard           2          0          2
        3021694 standard           2          0          2
        3021695 standard           2          0          2
        3021713 windfall           1          0          2
        3021714 windfall           1          0          2
        3021718 standard           1          0          2
        3021754 windfall           1          0          2
        3021794 windfall           1          0          2
        3021795 windfall           1          0          2
        3021797 windfall           1          0          2
        3021854 windfall           2          0          2
        3021865 windfall           2          0          2
        3021983 windfall           2          0          3
        3022019 windfall           2          0          3
        3022021 windfall           2          0          3
        3022033 windfall           2          0          3
        3022035 windfall           1          0          2
        3022077 windfall           2          0          3
        3022203 windfall           1          0          2
        3022225 windfall           1          0          2
        3022367 windfall           2          0          3
        3022389 standard           1          0          2
        3022478 windfall           2          0          3
        3022479 windfall           2          0          3
        3022522 windfall           2          0          3
        3022523 windfall           2          0          3
        3022524 windfall           2          0          3
        3022730 windfall           2          0          3
        3022747 windfall           2          0          3
        3022749 windfall           2          0          3
        3022766 windfall           2          0          3
        3022784 windfall           2          0          3
        3022787 windfall           2          0          3
        3022805 windfall           2          0          3
        3022822 windfall           2          0          3
        3023026 windfall           2          0          3
        3023040 windfall           1          0          2
        3023041 windfall           1          0          2
        3023043 windfall           1          0          2
        3023046 windfall           2          0          3
        3023088 windfall           2          0          3
        3023382 windfall           1          0          2
        3023383 windfall           1          0          2
        3023384 windfall           1          0          2
        3023989 standard           2          0          2
        3024328 standard           2          0          2
        3024340 standard           2          0          2
        3024357 windfall           1          0          2
        3024358 windfall           1          0          2
        3024359 windfall           1          0          2
        3024438 standard           1          0          2
        3024440 standard           3          0          3
        3024443 windfall           1          0          2
        3024444 windfall           1          0          2
        3024445 windfall           1          0          2
        3024451 standard           2          0          3
        3024506 windfall           1          0          2
        3024507 windfall           1          0          2
        3024508 windfall           1          0          2
        3024603 windfall           1          0          2
        3024604 windfall           1          0          2
        3024605 windfall           1          0          2
        3024654 standard           2          0          2
        3025201 windfall           2          0          2
        3025771 standard           1          0          2
        3025772 standard          75          0         76
        3025782 standard           2          0          2
        3025784 standard           2          0          2
        3026360 standard           3          0          4
        3026438 standard           2          0          2
        3026460 standard           2          0          2
        3026514 standard           3          0          4
        3026515 standard           3          0          4
        3026516 standard           3          0          4
        3026517 standard           3          0          4
        3026518 standard           3          0          4
        3026519 standard           3          0          4
        3026520 standard           3          0          4
        3026521 standard           3          0          4
        3026522 standard           3          0          4
        3026523 standard           3          0          4
        3026524 standard           3          0          4
        3026527 standard           3          0          4
        3026528 standard           3          0          4
        3026529 standard           3          0          4
        3026530 standard           3          0          4
        3026531 standard           3          0          4
        3026532 standard           3          0          4
        3026533 standard           3          0          4
        3026537 standard           2          0          2
        3026873 standard           3          0          4
        3026885 standard           2          0          3
        3026948 standard           1          0          2
        3027060 high_prio          2          0          2
        3027061 high_prio          2          0          2
        3027062 high_prio          2          0          2
        3027063 high_prio          2          0          2
        3027064 high_prio          2          0          2
        3027065 high_prio          2          0          2
        3027066 high_prio          2          0          2
        3027067 high_prio          2          0          2
        3027068 high_prio          2          0          2
        3027069 high_prio          2          0          2
        3027070 high_prio          2          0          2
        3027071 high_prio          2          0          2
        3027072 high_prio          2          0          2
        3027073 high_prio          2          0          2
        3027074 high_prio          2          0          2
        3027075 high_prio          2          0          2
        3027076 high_prio          2          0          2
        3027077 high_prio          2          0          2
        3027078 high_prio          2          0          2
        3027079 high_prio          2          0          2
        3027080 high_prio          2          0          2
        3027081 high_prio          2          0          2
        3027082 high_prio          2          0          2
        3027083 high_prio          2          0          2
        3027084 high_prio          2          0          2
        3027085 high_prio          2          0          2
        3027086 high_prio          2          0          2
        3027087 high_prio          2          0          2
        3027088 high_prio          2          0          2
        3027089 high_prio          2          0          2
        3027090 high_prio          2          0          2
        3027091 high_prio          2          0          2
        3027092 high_prio          2          0          2
        3027093 high_prio          2          0          2
        3027094 high_prio          2          0          2
        3027095 high_prio          2          0          2
        3027096 high_prio          2          0          2
        3027097 high_prio          2          0          2
        3027098 high_prio          2          0          2
        3027099 high_prio          2          0          2
        3027100 high_prio          2          0          2
        3027101 high_prio          2          0          2
        3027102 high_prio          2          0          2
        3027103 high_prio          2          0          2
        3027104 high_prio          2          0          2
        3027105 high_prio          2          0          2
        3027106 high_prio          2          0          2
        3027107 high_prio          2          0          2
        3027108 high_prio          2          0          2
        3027109 high_prio          2          0          2
        3027110 high_prio          2          0          2
        3027111 high_prio          2          0          2
        3027112 high_prio          2          0          2
        3027113 high_prio          2          0          2
        3027114 high_prio          2          0          2
        3027115 high_prio          2          0          2
        3027116 high_prio          2          0          2
        3027117 high_prio          2          0          2
        3027118 high_prio          2          0          2
        3027119 high_prio          2          0          2
        3027120 high_prio          2          0          2
        3027121 high_prio          2          0          2
        3027122 high_prio          2          0          2
        3027123 high_prio          2          0          2
        3027124 high_prio          2          0          2
        3027125 high_prio          2          0          2
        3027126 high_prio          2          0          2
        3027127 high_prio          2          0          2
        3027128 high_prio          2          0          2
        3027129 high_prio          2          0          2
        3027130 high_prio          2          0          2
        3027131 high_prio          2          0          2
        3027132 high_prio          2          0          2
        3027133 high_prio          2          0          2
        3027134 high_prio          2          0          2
        3027135 high_prio          2          0          2
        3027136 high_prio          2          0          2
        3027137 high_prio          2          0          2
        3027138 high_prio          2          0          2
        3027139 high_prio          2          0          2
        3027140 high_prio          2          0          2
        3027141 high_prio          2          0          2
        3027142 high_prio          2          0          2
        3027143 high_prio          2          0          2
        3027144 high_prio          2          0          2
        3027145 high_prio          2          0          2
        3027146 high_prio          2          0          2
        3027147 high_prio          2          0          2
        3027148 high_prio          2          0          2
        3027149 high_prio          2          0          2
        3027150 high_prio          2          0          2
        3027151 high_prio          2          0          2
        3027152 high_prio          2          0          2
        3027153 high_prio          2          0          2
        3027154 high_prio          2          0          2
        3027155 high_prio          2          0          2
        3027156 high_prio          2          0          2
        3027157 high_prio          2          0          2
        3027158 high_prio          2          0          2
        3027159 high_prio          2          0          2
        3027160 high_prio          2          0          2
        3027161 high_prio          2          0          2
        3027162 standard           2          0          3
        3027194 high_prio          3          0          4
        3027195 standard           2          0          3
        3027196 high_prio          3          0          4
        3027197 high_prio          3          0          4
        3027212 standard           3          0          3
        3027224 high_prio          2          0          3
        3027225 standard           1          0          2
        3027226 high_prio          2          0          3
        3027227 high_prio          2          0          3
        3027228 high_prio          2          0          3
        3027229 high_prio          2          0          3
        3027230 high_prio          2          0          3
        3027231 standard           3          0          4
        3027232 high_prio          2          0          3

Thanks!
Comment 20 Ben Roberts 2022-02-08 10:39:08 MST
It is possible that a large job requesting a lot of resources created a priority reservation that blocked other jobs from starting.  If you know the time frame you want to look at, you can check for a high priority job by running a command like this:

sacct -X --starttime=<start_time> --endtime=<end_time> --format=jobid,priority

You would obviously want to replace <start_time> and <end_time> with the correct date and time values.  

I see that the only priority factor you're taking into consideration is the job size.  Is that still your intention?  Adding a factor such as job age might be worth considering.  You would need to evaluate the relationship of the age-based priority with the size priority, but with the proper balance it could help prevent jobs from getting stuck for too long.  If you're interested in something like that and would like to see some simple examples, let me know.

Thanks,
Ben
Comment 21 Todd Merritt 2022-02-08 10:56:01 MST
Thanks Ben.  It looks like there are a few contenders in the window where we were seeing the delays:
------------ ---------- -------- ---------- 
JobID          Priority   NNodes  Partition 
3002151_[0-+         41       11 high_prio+ 
3001840              42       17   standard 
3001842              42       17   standard 
3003374              42       17   standard 
3001816              46       20   standard 
3002338              46       20   standard 
2998983              47       19   standard 
2998413              48       17   standard 
3002144              55       24   standard 
2998917              60       24   standard 
3001815              64       26   standard 
3001794              65       26   standard 
3001797              65       26   standard 
3001798              65       26   standard 
3003013              65       23   standard 
2993839              75       20   standard 
2993501             193      100   standard 

We do still want to favor large jobs but if we could balance job age into the equation I think that would probably be helpful in avoiding the case that we're running into.

Thanks!
Comment 22 Ben Roberts 2022-02-08 12:50:44 MST
That does seem like the cause of the behavior you observed then.  There are a few things to consider when adding age-based priority to the cluster.  First, decide the longest you want a job to be queued before it reaches its maximum age-based priority boost; this is configured as PriorityMaxAge.  Then decide the maximum priority a job will gain from being queued for that amount of time; this is configured as PriorityWeightAge.  Another relevant parameter is PriorityCalcPeriod, which determines how often priority is recalculated for jobs on the cluster.

I'll use the following parameters as an example:
PriorityWeightAge       = 8000
PriorityWeightJobSize   = 1000
PriorityMaxAge          = 08:00:00
PriorityCalcPeriod      = 00:05:00

This means that jobs get a maximum of 8000 additional priority after being queued for 8 hours.  Before the full 8 hours is up, jobs get a proportional fraction of that total as they sit in the queue.  So after a job has been queued for 1 hour it will have accrued 1/8 of the overall PriorityWeightAge, or 1000 priority from age.  Since the recalculation happens every 5 minutes with this PriorityCalcPeriod, in practice you would see the priority climb in smaller increments than that.
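The fraction math above can be sketched with a small shell helper.  This is a hypothetical illustration of the calculation, not a Slurm command; the 8000/8-hour values are the example parameters from the config snippet above.

```shell
# Hypothetical helper: age-based priority after t hours in the queue,
# assuming PriorityWeightAge=8000 and PriorityMaxAge=08:00:00.
# The fraction t/max is capped at 1 once the job reaches PriorityMaxAge.
age_priority() {
  awk -v t="$1" 'BEGIN { w=8000; max=8; f=t/max; if (f>1) f=1; printf "%d\n", w*f }'
}

age_priority 1    # 1 hour queued  -> 1000
age_priority 8    # at PriorityMaxAge -> 8000
age_priority 24   # capped, still   -> 8000
```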

Big jobs will accrue priority from age as well, so this effectively lets you determine how long a job can be queued before jobs of a certain size no longer have more priority than that job.  As an example, here I have a job requesting a single node that has been queued for about 30 minutes.  If I submit a new 5-node job, you can see that it gets more JobSize-based priority, but the amount of time the 1-node job has been queued is enough to still give it greater overall priority.

$ squeue -t pending
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1108     debug     wrap      ben PD       0:00      1 (Resources)
              1113     debug     wrap      ben PD       0:00      5 (Priority)

$ sprio
          JOBID PARTITION   PRIORITY       SITE        AGE    JOBSIZE
           1108 debug            481          0        426         56
           1113 debug            321          0         44        278

If a job was submitted that was large enough to have greater priority than existing jobs at submission time then it would always have more priority than those jobs because all jobs would accrue age-based priority at the same rate.

The relationship between age and size is always site-specific, and you will have to figure out the right balance for your environment.  Hopefully this helps explain how to find that balance.  Let me know if you have any additional questions about this.
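One way to reason about that balance is to compute the break-even queue time: how long a smaller job must have been queued before its age advantage outweighs a bigger job's size advantage.  This is a hypothetical sketch, assuming both jobs are still under PriorityMaxAge so age accrues linearly at PriorityWeightAge / PriorityMaxAge points per hour.

```shell
# Hypothetical helper (not a Slurm command):
#   $1 = size-priority gap between the two jobs
#   $2 = PriorityWeightAge
#   $3 = PriorityMaxAge in hours
# Returns the head start (hours) the smaller job needs to break even.
break_even_hours() {
  awk -v gap="$1" -v w="$2" -v max="$3" 'BEGIN { printf "%.2f\n", gap * max / w }'
}

# With PriorityWeightAge=8000 and PriorityMaxAge=8h, the 222-point JobSize
# gap from the sprio output above (278 - 56) is overcome after ~0.22 hours:
break_even_hours 222 8000 8    # -> 0.22
```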

Thanks,
Ben
Comment 23 Todd Merritt 2022-02-08 13:32:13 MST
Thanks! I'll play with those settings and see what works best for us. You can close this out.
Comment 24 Ben Roberts 2022-02-08 13:48:11 MST
I'm glad to hear that helps.  I'll close this ticket.

Thanks,
Ben