| Summary: | qos preemption not triggering | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Todd Merritt <tmerritt> |
| Component: | Scheduling | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | cinek |
| Version: | 21.08.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | U of AZ | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | sdiag output, slurmctld log, slurm config, debug5 slurmctld log, slurmctld log from 20220207, slurmctld log | | |
Description
Todd Merritt
2022-02-03 17:10:19 MST
Created attachment 23273 [details]
sdiag output
Created attachment 23274 [details]
slurmctld log
Todd,
Could you please share 'scontrol show job' output for the jobs that should be preempted so that 3000709 can run?
cheers,
Marcin

root@ericidle:~ # squeue -p windfall --state R
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2984636 windfall Job1 jeongpil R 3-06:01:54 4 r4u37n[1-2],r4u38n[1-2]
2984635 windfall Job1 jeongpil R 3-06:02:54 4 r4u05n[1-2],r4u06n[1-2]
2984634 windfall Job1 jeongpil R 3-06:03:26 4 r4u03n[1-2],r4u04n[1-2]
2984633 windfall Job1 jeongpil R 3-06:03:54 4 r3u37n[1-2],r3u38n[1-2]
2984632 windfall Job1 jeongpil R 3-06:04:54 4 r3u05n[1-2],r3u06n[1-2]
2984631 windfall Job1 jeongpil R 3-06:05:54 4 r3u03n[1-2],r3u04n[1-2]
2984630 windfall Job2 jeongpil R 3-06:06:54 4 r2u06n[1-2],r2u37n[1-2]
2984629 windfall Job2 jeongpil R 3-06:07:54 4 r2u04n[1-2],r2u05n[1-2]
2984628 windfall Job2 jeongpil R 3-06:08:54 4 r2u03n[1-2],r2u38n[1-2]
2984627 windfall Job2 jeongpil R 3-06:09:54 4 r1u39n[1-2],r1u40n[1-2]
2984626 windfall Job2 jeongpil R 3-06:10:54 4 r1u37n[1-2],r1u38n[1-2]
2984625 windfall Job2 jeongpil R 3-06:11:54 4 r1u05n[1-2],r1u06n[1-2]
2984624 windfall Job2 jeongpil R 3-06:12:54 4 r1u03n[1-2],r1u04n[1-2]
2984637 windfall Job1 jeongpil R 3-06:00:54 3 r4u39n[1-2],r4u40n2
2965309 windfall gauss_9 epalikot R 2:16:14 3 r2u29n2,r2u33n2,r4u10n2
3004275 windfall gauss_11 epalikot R 1:47:56 3 r1u34n2,r1u35n1,r1u36n1
2965312 windfall gauss_14 epalikot R 2:25:59 3 r4u11n1,r4u14n2,r4u36n2
2981461 windfall gauss_7 epalikot R 2:25:59 3 r2u09n[1-2],r2u25n1
3004298 windfall gauss_19 epalikot R 1:39:41 2 r1u08n1,r1u25n1
3004273 windfall gauss_8 epalikot R 1:47:55 2 r5u27n1,r5u29n1
2965310 windfall gauss_10 epalikot R 2:25:59 2 r1u10n2,r1u13n1
2965316 windfall gauss_18 epalikot R 2:25:59 2 r1u33n2,r2u34n1
2981467 windfall gauss_15 epalikot R 2:25:59 2 r1u15n2,r1u17n2
2983446 windfall n72s2- jeongpil R 3-14:10:01 1 r5u07n1
2983450 windfall n72s3 jeongpil R 3-14:21:01 1 r5u31n1
2983449 windfall n72s2 jeongpil R 3-14:22:01 1 r5u11n1
2983448 windfall n72s1 jeongpil R 3-14:23:01 1 r5u09n1
2983447 windfall n72s1- jeongpil R 3-14:23:45 1 r5u08n1
2916058 windfall be_p_t_e ludwik R 8:20 1 r3u13n1
2991003 windfall B_2 klshark R 8:20 1 r1u28n1
3003631 windfall model2-1 fatemehm R 24:29 1 r3u14n1
2866822 windfall be_s_d_4 fleonars R 25:32 1 r3u27n1
2886948 windfall inf_11 bubin R 25:32 1 r3u32n2
2991005_14 windfall humann aponsero R 32:22 1 r2u25n1
2916705 windfall adn2 stepans R 33:00 1 r3u08n1
2926558 windfall add3 stepans R 33:00 1 r3u12n1
3003630 windfall model2-1 fatemehm R 33:00 1 r3u29n1
2845533 windfall B_5 klshark R 33:56 1 r3u11n1
2866750 windfall be_s_d_1 fleonars R 33:56 1 r3u29n1
2986445 windfall b_s_6 monika R 33:56 1 r4u36n2
2909168 windfall add1 stepans R 49:27 1 r1u34n1
3000726 windfall urd9 stepans R 49:27 1 r1u32n2
2981683 windfall tmd7 stepans R 49:32 1 r1u29n2
2430705 windfall N3 klshark R 50:22 1 r1u29n1
2916374 windfall b_p_7 monika R 50:22 1 r1u32n2
2986457 windfall b_p_8 monika R 50:22 1 r1u16n2
2795925 windfall be_s_6 trzask R 50:23 1 r1u13n1
2991005_26 windfall humann aponsero R 57:12 1 r3u14n2
2991059 windfall tmd6 stepans R 57:46 1 r3u16n2
3003629 windfall model2-1 fatemehm R 1:06:00 1 r3u18n1
2991005_24 windfall humann aponsero R 1:22:11 1 r3u17n2
2633181 windfall be_s_10 trzask R 1:23:12 1 r3u25n1
2742693 windfall be_p_t_1 teodar R 1:23:12 1 r3u16n1
2751961 windfall N5 klshark R 1:23:12 1 r3u17n2
2759689 windfall li_1_inf bubin R 1:23:12 1 r3u35n2
3003628 windfall model2-1 fatemehm R 1:30:19 1 r3u36n1
2823954 windfall fin_1700 bubin R 1:31:13 1 r3u33n1
3004274 windfall gauss_6 epalikot R 1:38:02 1 r1u36n1
2991051 windfall tmd5 stepans R 1:38:22 1 r1u08n1
2886928 windfall be_p_t_e ludwik R 1:39:10 1 r1u34n2
2909170 windfall tmd1 stepans R 1:46:37 1 r1u36n2
2991005_9 windfall humann aponsero R 1:57:55 1 r4u35n2
2991005_23 windfall humann aponsero R 1:58:13 1 r4u30n1
2899442 windfall be_s_t_9 teodar R 1:58:54 1 r1u31n1
2990980 windfall be_p_t_e ludwik R 1:58:54 1 r4u18n1
2981466 windfall urd6 stepans R 2:14:01 1 r4u11n2
2991005_15 windfall humann aponsero R 2:32:42 1 r4u17n2
2991005_25 windfall humann aponsero R 2:32:42 1 r3u35n1
2981457 windfall urd4 stepans R 2:32:44 1 r2u14n1
3002836 windfall n64s2 jeongpil R 2:48:17 1 r4u14n2
3002837 windfall n64s3 jeongpil R 2:48:17 1 r2u27n1
2986182 windfall lih_2 ludwik R 2:48:37 1 r5u13n1
2998473 windfall n54t3s1- jeongpil R 6:24:43 1 r2u11n1
3002842 windfall n48s2- jeongpil R 6:54:20 1 r1u10n2
3002843 windfall n48s1- jeongpil R 6:54:20 1 r1u11n1
3002844 windfall n48s1 jeongpil R 6:54:20 1 r1u11n1
3002845 windfall n48s2 jeongpil R 6:54:20 1 r1u13n2
3002846 windfall n48s3 jeongpil R 6:54:20 1 r1u13n2
3002785 windfall sn72s3 jeongpil R 7:14:07 1 r2u27n2
3002833 windfall n64s2- jeongpil R 7:14:07 1 r2u35n1
3002834 windfall n64s1- jeongpil R 7:14:07 1 r3u12n2
3002835 windfall n64s1 jeongpil R 7:14:07 1 r3u12n2
3002838 windfall sn64s1- jeongpil R 7:14:07 1 r4u07n1
3002839 windfall sn64s1 jeongpil R 7:14:07 1 r4u07n1
3002840 windfall sn64s2 jeongpil R 7:14:07 1 r4u25n1
3002841 windfall sn64s3 jeongpil R 7:14:07 1 r4u25n1
2998472 windfall n54t3s2- jeongpil R 8:14:01 1 r1u14n2
3002781 windfall sn72s2- jeongpil R 8:14:01 1 r1u25n2
3002782 windfall sn72s1- jeongpil R 8:14:01 1 r1u34n1
3002783 windfall sn72s1 jeongpil R 8:14:01 1 r1u35n1
3002784 windfall sn72s2 jeongpil R 8:14:01 1 r1u36n2
2998327 windfall sn54s1 jeongpil R 12:13:15 1 r4u07n1
2998328 windfall sn54s2 jeongpil R 12:13:15 1 r4u17n1
2998329 windfall sn54s3 jeongpil R 12:13:15 1 r3u08n1
2998304 windfall n54s2 jeongpil R 16:08:42 1 r2u30n1
2998305 windfall n54s3 jeongpil R 16:08:42 1 r2u30n2
2998474 windfall n54t3s1 jeongpil R 1-05:13:43 1 r2u14n1
2998475 windfall n54t3s2 jeongpil R 1-05:13:43 1 r2u14n2
2998476 windfall n54t3s3 jeongpil R 1-05:13:43 1 r2u14n2
2998302 windfall n54s1- jeongpil R 1-05:20:27 1 r3u31n2
2998303 windfall n54s1 jeongpil R 1-05:27:35 1 r1u11n1
2998301 windfall n54s2- jeongpil R 1-05:50:04 1 r1u15n2
2998325 windfall sn54s2- jeongpil R 1-05:50:04 1 r2u14n1
2998326 windfall sn54s1- jeongpil R 1-05:50:04 1 r3u09n1
2986458 windfall tmd9 stepans R 2-08:47:08 1 r2u17n1
2973272 windfall sn64s2- jeongpil R 2-09:04:25 1 r4u15n2
2926561 windfall urd3 stepans R 2-09:15:34 1 r4u28n1
2931154 windfall add8 stepans R 2-09:15:34 1 r4u30n1
2981682 windfall add7 stepans R 2-11:30:01 1 r4u18n2
2981454 windfall add4 stepans R 2-18:43:19 1 r3u34n1
2931157 windfall urd8 stepans R 3-05:21:10 1 r1u35n1
2897796 windfall add5 stepans R 3-16:38:10 1 r2u07n2
2849309 windfall be_p_t_e ludwik R 6-11:03:42 1 r4u17n2
2931155 windfall adn8 stepans R 7-03:52:30 1 r1u14n2
2406328 windfall B_10 klshark R 7-15:24:36 1 r1u07n1
2769086 windfall be_p_t_e ludwik R 7-15:24:36 1 r1u13n2
2849311 windfall C7 ludwik R 7-15:24:36 1 r2u09n1
2875690 windfall be_s_d_6 fleonars R 7-15:24:36 1 r1u18n2
2886931 windfall C1 ludwik R 7-15:24:36 1 r2u13n1
2886963 windfall B_6 klshark R 7-15:24:36 1 r2u15n1
2897145 windfall be_s_d_1 fleonars R 7-15:24:36 1 r1u29n2
2899420 windfall be_s_d_9 fleonars R 7-15:24:36 1 r1u18n2
2899455 windfall li_d_11_ teodar R 7-15:24:36 1 r1u27n1
2916338 windfall cp_p_1a klshark R 7-15:24:36 1 r2u25n2
2916340 windfall B_9 klshark R 7-15:24:36 1 r2u14n2
2924858 windfall be_s_d_5 fleonars R 7-15:24:36 1 r2u36n1
2929702 windfall be_p_t_e ludwik R 7-15:24:36 1 r2u15n2
2897798 windfall adn5 stepans R 7-15:50:40 1 r4u11n2
2931159 windfall adn9 stepans R 7-15:50:40 1 r4u26n2
2886932 windfall C6 ludwik R 7-15:50:42 1 r4u10n2
2873914 windfall B_4 klshark R 7-16:01:03 1 r3u35n1
2916706 windfall add2 stepans R 8-06:12:51 1 r2u25n2
2909169 windfall adn1 stepans R 8-06:38:09 1 r4u36n1
2909630 windfall adn0 stepans R 8-06:38:09 1 r2u08n1
2912417 windfall li_3 bubin R 8-06:38:09 1 r2u08n1
Thanks!
Isn't the job submitted to a different partition?
>Partition=high_priority[...]
I don't see this name in the config you shared with us before. Could you please share the current slurm.conf?
cheers,
Marcin
Created attachment 23277 [details]
slurm config
Yep, we added that partition recently to keep non-preemptible jobs from running on nodes that individual faculty purchased. The only difference from the "standard" partition is the associated node list.
Thanks!
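For reference, the partition layout described above would look roughly like this in slurm.conf. This is only a sketch: the node lists and QOS assignments are assumptions inferred from this ticket, and the real definitions live in the attached config (attachment 23277).

# Sketch only -- node lists and QOS assignments are assumptions, not the site's actual config.
PartitionName=windfall      Nodes=<all compute nodes>  QOS=part_qos_windfall PreemptMode=REQUEUE
PartitionName=standard      Nodes=<all compute nodes>  QOS=part_qos_standard
PartitionName=high_priority Nodes=<shared nodes only>  QOS=part_qos_standard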
Hi Todd,

Has preemption worked since you made the change you mention (adding the high_priority partition)? Is this user the only one affected? Can you send the output of 'scontrol show job <jobid>' for one of the running windfall jobs?

I would also like to see the different QOS's that you have defined. Can I have you send the output of this command as well?

sacctmgr show qos format=name,preempt,preemptmode,flags

Thanks,
Ben

Presumably it's been working. Our users are generally quick to let us know when it isn't, and that partition has been in place since November. The other more recent change is that we upgraded from Slurm 20 to 21 at the end of January. It looks like there are a number of other jobs that I would expect to start that are also blocked:
root@ericidle:~ # squeue -p high_priority
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3000708 high_prio perftime hwzhang0 PD 0:00 1 (Priority)
3000709 high_prio perftime hwzhang0 PD 0:00 1 (Priority)
3000349 high_prio Sample_M fgoeltl PD 0:00 1 (Resources)
3000352 high_prio Sample_M fgoeltl PD 0:00 1 (Priority)
3000365 high_prio Sample_M fgoeltl PD 0:00 1 (Priority)
3000697 high_prio perftime hwzhang0 PD 0:00 1 (Priority)
3005975_[738-785] high_prio LSSTxSO_ xfang PD 0:00 1 (Priority)
root@ericidle:~ # sacctmgr --parsable2 show qos format=name,preempt,preemptmode,flags
Name|Preempt|PreemptMode|Flags
normal||cluster|
part_qos_windfall|user_qos_idlecycles|cluster|
part_qos_standard|part_qos_windfall,user_qos_idlecycles|cluster|
user_qos_tmerritt|part_qos_windfall|cluster|OverPartQOS
user_qos_idlecycles||cluster|OverPartQOS
user_qos_nkchen|part_qos_windfall|cluster|OverPartQOS
user_qos_timeifler|part_qos_windfall|cluster|OverPartQOS
user_qos_josh|part_qos_windfall|cluster|OverPartQOS
user_qos_jlbredas|part_qos_windfall|cluster|OverPartQOS
user_qos_denard|part_qos_windfall|cluster|OverPartQOS
user_qos_kgklein|part_qos_windfall|cluster|OverPartQOS
user_qos_xytang|part_qos_windfall|cluster|OverPartQOS
user_qos_jrussell|part_qos_windfall|cluster|OverPartQOS
user_qos_chertkov|part_qos_windfall|cluster|OverPartQOS
user_qos_amainzer|part_qos_windfall|cluster|OverPartQOS
user_qos_douglase|part_qos_windfall|cluster|OverPartQOS
user_qos_sprinkjm|part_qos_windfall|cluster|OverPartQOS
user_qos_hanquist|part_qos_windfall|cluster|OverPartQOS
user_qos_cbender|part_qos_windfall|cluster|OverPartQOS
user_qos_gbesla|part_qos_windfall|cluster|OverPartQOS
user_qos_rgutenk|part_qos_windfall|cluster|OverPartQOS
user_qos_fgoeltl|part_qos_windfall|cluster|OverPartQOS
user_qos_sschwartz|part_qos_windfall|cluster|OverPartQOS
user_qos_hamden|part_qos_windfall|cluster|OverPartQOS
user_qos_yshirley|part_qos_windfall|cluster|OverPartQOS
user_qos_behroozi|part_qos_windfall|cluster|OverPartQOS
user_qos_|part_qos_windfall|cluster|OverPartQOS
user_qos_ludwik|part_qos_windfall|cluster|OverPartQOS
qual_qos_tzega|part_qos_windfall|cluster|OverPartQOS
qual_qos_latmarat|part_qos_windfall|cluster|OverPartQOS
qual_qos_ericlyons|part_qos_windfall|cluster|OverPartQOS
qual_qos_dukepauli|part_qos_windfall|cluster|OverPartQOS
qual_qos_faselh|part_qos_windfall|cluster|OverPartQOS
qual_qos_ruichang|part_qos_windfall|cluster|OverPartQOS
Here are a couple of jobs that I'd expect to be preempted
root@ericidle:~ # scontrol show job 2984636
JobId=2984636 JobName=Job1
UserId=jeongpilsong(43439) GroupId=mazumdar(30580) MCS_label=N/A
Priority=15 Nice=0 Account=windfall QOS=part_qos_windfall
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=3-11:10:52 TimeLimit=10-00:00:00 TimeMin=N/A
SubmitTime=2022-01-31T23:18:52 EligibleTime=2022-01-31T23:18:52
AccrueTime=2022-01-31T23:18:52
StartTime=2022-01-31T23:31:31 EndTime=2022-02-10T23:31:31 Deadline=N/A
PreemptEligibleTime=2022-01-31T23:31:31 PreemptTime=None
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-01-31T23:31:31 Scheduler=Main
Partition=windfall AllocNode:Sid=wentletrap:25825
ReqNodeList=(null) ExcNodeList=r1u07n[1-2],r1u08n[1-2],r1u09n[1-2],r1u10n[1-2],r1u11n[1-2],r1u12n[1-2],r1u13n[1-2],r1u14n[1-2],r1u15n[1-2],r1u16n[1-2],r1u17n[1-2],r1u18n[1-2],r1u25n[1-2],r1u26n[1-2],r1u27n[1-2],r1u28n[1-2],r1u29n[1-2],r1u30n[1-2],r1u31n[1-2],r1u32n[1-2],r1u33n[1-2],r1u34n[1-2],r1u35n[1-2],r1u36n[1-2],r2u07n[1-2],r2u08n[1-2],r2u09n[1-2],r2u10n[1-2],r2u11n[1-2],r2u12n[1-2],r2u13n[1-2],r2u14n[1-2],r2u15n[1-2],r2u16n[1-2],r2u17n[1-2],r2u18n[1-2],r2u25n[1-2],r2u26n[1-2],r2u27n[1-2],r2u28n[1-2],r2u29n[1-2],r2u30n[1-2],r2u31n[1-2],r2u32n[1-2],r2u33n[1-2],r2u34n[1-2],r2u35n[1-2],r2u36n[1-2],r3u07n[1-2],r3u08n[1-2],r3u09n[1-2],r3u10n[1-2],r3u11n[1-2],r3u12n[1-2],r3u13n[1-2],r3u14n[1-2],r3u15n[1-2],r3u16n[1-2],r3u17n[1-2],r3u18n[1-2],r3u25n[1-2],r3u26n[1-2],r3u27n[1-2],r3u28n[1-2],r3u29n[1-2],r3u30n[1-2],r3u31n[1-2],r3u32n[1-2],r3u33n[1-2],r3u34n[1-2],r3u35n[1-2],r3u36n[1-2],r4u07n[1-2],r4u08n[1-2],r4u09n[1-2],r4u10n[1-2],r4u11n[1-2],r4u12n[1-2],r4u13n[1-2],r4u14n[1-2],r4u15n[1-2],r4u16n[1-2],r4u17n[1-2],r4u18n[1-2],r4u25n[1-2],r4u26n[1-2],r4u27n[1-2],r4u28n[1-2],r4u29n[1-2],r4u30n[1-2],r4u31n[1-2],r4u32n[1-2],r4u33n[1-2],r4u34n[1-2],r4u35n[1-2],r4u36n[1-2],r5u13n1,r5u15n1,r5u17n1,r5u19n1,r5u24n1,r5u25n1,r5u27n1,r5u29n1
NodeList=r4u37n[1-2],r4u38n[1-2]
BatchHost=r4u37n1
NumNodes=4 NumCPUs=376 NumTasks=376 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=376,mem=1504G,node=4,billing=376
Socks/Node=* NtasksPerN:B:S:C=94:0:*:* CoreSpec=*
MinCPUsNode=94 MinMemoryCPU=4G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=USER Contiguous=0 Licenses=(null) Network=(null)
Command=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/pbs
WorkDir=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4
StdErr=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/output.out
StdIn=/dev/null
StdOut=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.4/output.out
Power=
root@ericidle:~ # scontrol show job 2984635
JobId=2984635 JobName=Job1
UserId=jeongpilsong(43439) GroupId=mazumdar(30580) MCS_label=N/A
Priority=15 Nice=0 Account=windfall QOS=part_qos_windfall
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=3-11:12:00 TimeLimit=10-00:00:00 TimeMin=N/A
SubmitTime=2022-01-31T23:18:50 EligibleTime=2022-01-31T23:18:50
AccrueTime=2022-01-31T23:18:50
StartTime=2022-01-31T23:30:31 EndTime=2022-02-10T23:30:31 Deadline=N/A
PreemptEligibleTime=2022-01-31T23:30:31 PreemptTime=None
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-01-31T23:30:31 Scheduler=Main
Partition=windfall AllocNode:Sid=wentletrap:25825
ReqNodeList=(null) ExcNodeList=r1u07n[1-2],r1u08n[1-2],r1u09n[1-2],r1u10n[1-2],r1u11n[1-2],r1u12n[1-2],r1u13n[1-2],r1u14n[1-2],r1u15n[1-2],r1u16n[1-2],r1u17n[1-2],r1u18n[1-2],r1u25n[1-2],r1u26n[1-2],r1u27n[1-2],r1u28n[1-2],r1u29n[1-2],r1u30n[1-2],r1u31n[1-2],r1u32n[1-2],r1u33n[1-2],r1u34n[1-2],r1u35n[1-2],r1u36n[1-2],r2u07n[1-2],r2u08n[1-2],r2u09n[1-2],r2u10n[1-2],r2u11n[1-2],r2u12n[1-2],r2u13n[1-2],r2u14n[1-2],r2u15n[1-2],r2u16n[1-2],r2u17n[1-2],r2u18n[1-2],r2u25n[1-2],r2u26n[1-2],r2u27n[1-2],r2u28n[1-2],r2u29n[1-2],r2u30n[1-2],r2u31n[1-2],r2u32n[1-2],r2u33n[1-2],r2u34n[1-2],r2u35n[1-2],r2u36n[1-2],r3u07n[1-2],r3u08n[1-2],r3u09n[1-2],r3u10n[1-2],r3u11n[1-2],r3u12n[1-2],r3u13n[1-2],r3u14n[1-2],r3u15n[1-2],r3u16n[1-2],r3u17n[1-2],r3u18n[1-2],r3u25n[1-2],r3u26n[1-2],r3u27n[1-2],r3u28n[1-2],r3u29n[1-2],r3u30n[1-2],r3u31n[1-2],r3u32n[1-2],r3u33n[1-2],r3u34n[1-2],r3u35n[1-2],r3u36n[1-2],r4u07n[1-2],r4u08n[1-2],r4u09n[1-2],r4u10n[1-2],r4u11n[1-2],r4u12n[1-2],r4u13n[1-2],r4u14n[1-2],r4u15n[1-2],r4u16n[1-2],r4u17n[1-2],r4u18n[1-2],r4u25n[1-2],r4u26n[1-2],r4u27n[1-2],r4u28n[1-2],r4u29n[1-2],r4u30n[1-2],r4u31n[1-2],r4u32n[1-2],r4u33n[1-2],r4u34n[1-2],r4u35n[1-2],r4u36n[1-2],r5u13n1,r5u15n1,r5u17n1,r5u19n1,r5u24n1,r5u25n1,r5u27n1,r5u29n1
NodeList=r4u05n[1-2],r4u06n[1-2]
BatchHost=r4u05n1
NumNodes=4 NumCPUs=376 NumTasks=376 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=376,mem=1504G,node=4,billing=376
Socks/Node=* NtasksPerN:B:S:C=94:0:*:* CoreSpec=*
MinCPUsNode=94 MinMemoryCPU=4G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=USER Contiguous=0 Licenses=(null) Network=(null)
Command=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/pbs
WorkDir=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3
StdErr=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/output.out
StdIn=/dev/null
StdOut=/home/u19/jeongpilsong/mps/mps-hubbard-triangle/calc/9x6/nup18/u9/x2/v2.3/output.out
Power=
Thanks!
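A preemption chain like the one in the sacctmgr output above is typically built with commands along these lines; this is a minimal sketch using the QOS names from this ticket, not the exact commands run on this cluster.

# Minimal sketch: a partition QOS that preempts the windfall and idle-cycles QOS's.
sacctmgr add qos part_qos_standard
sacctmgr modify qos part_qos_standard set preempt=part_qos_windfall,user_qos_idlecycles
# A per-user QOS that preempts windfall and overrides the partition QOS:
sacctmgr modify qos user_qos_tmerritt set preempt=part_qos_windfall flags=OverPartQOS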
I thought there might be an issue with the jobs not having the preemptable QOS listed on the job itself, but only having it associated through the preemptable partition. For the jobs you're showing, that doesn't look like the case. Can I have you get squeue output that shows the QOS on each job as well as the partition and the time the job has been running?

squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos

Thanks,
Ben

I've still been looking into this on my side. The reason I asked for that squeue output was that I wondered if there might be some interaction between the preempt_youngest_first parameter that I see you have set and the possibility that some of the windfall jobs might have another QOS associated with them. I set up a few scenarios where I thought it might cause a problem, but I haven't been able to reproduce the behavior you're describing yet. I would still like to see the squeue output I asked for to eliminate that as a possibility (squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos).

In addition, I wonder if you could temporarily increase the log level for a few minutes. If job 3000709 is still queued, you can just let it run for 5 minutes. If that job has already run, can you submit another job like it and wait a few minutes before setting the log level back where it was? I see you have 'debug' logs enabled now, so the commands to do this would be:

scontrol setdebug debug2
scontrol setdebug debug

Thanks,
Ben

Hi Ben,
Sorry I missed that request. Here's the output:

root@ericidle:~ # squeue -pwindfall --state=running -O jobid,partition,username,timeused,qos
JOBID               PARTITION           USER                TIME                QOS
2984636             windfall            jeongpilsong        3-16:01:44          part_qos_windfall
2984635             windfall            jeongpilsong        3-16:02:44          part_qos_windfall
2984634             windfall            jeongpilsong        3-16:03:16          part_qos_windfall
2984633             windfall            jeongpilsong        3-16:03:44          part_qos_windfall
2984632             windfall            jeongpilsong        3-16:04:44          part_qos_windfall
2984631             windfall            jeongpilsong        3-16:05:44          part_qos_windfall
2984630             windfall            jeongpilsong        3-16:06:44          part_qos_windfall
2984629             windfall            jeongpilsong        3-16:07:44          part_qos_windfall
2984628             windfall            jeongpilsong        3-16:08:44          part_qos_windfall
2984627             windfall            jeongpilsong        3-16:09:44          part_qos_windfall
2984626             windfall            jeongpilsong        3-16:10:44          part_qos_windfall
2984625             windfall            jeongpilsong        3-16:11:44          part_qos_windfall
2984624             windfall            jeongpilsong        3-16:12:44          part_qos_windfall
2984637             windfall            jeongpilsong        3-16:00:44          part_qos_windfall
2965316             windfall            epalikot            8:28                part_qos_windfall
3004298             windfall            epalikot            8:28                part_qos_windfall
2983446             windfall            jeongpilsong        4-00:09:51          part_qos_windfall
2983450             windfall            jeongpilsong        4-00:20:51          part_qos_windfall
2983449             windfall            jeongpilsong        4-00:21:51          part_qos_windfall
2983448             windfall            jeongpilsong        4-00:22:51          part_qos_windfall
2983447             windfall            jeongpilsong        4-00:23:35          part_qos_windfall
2991057             windfall            aponsero            5:21                part_qos_windfall
2998473             windfall            jeongpilsong        14:58               part_qos_windfall
3002781             windfall            jeongpilsong        14:58               part_qos_windfall
3006949             windfall            jeongpilsong        2:29:35             part_qos_windfall
3006716             windfall            jeongpilsong        2:44:40             part_qos_windfall
3006698             windfall            jeongpilsong        2:52:54             part_qos_windfall
3006699             windfall            jeongpilsong        2:52:54             part_qos_windfall
2998304             windfall            jeongpilsong        3:00:52             part_qos_windfall
2998665             windfall            klshark             4:18:38             part_qos_windfall
3002836             windfall            jeongpilsong        12:48:07            part_qos_windfall
3002837             windfall            jeongpilsong        12:48:07            part_qos_windfall
2986182             windfall            ludwik              12:48:27            part_qos_windfall
3002842             windfall            jeongpilsong        16:54:10            part_qos_windfall
3002843             windfall            jeongpilsong        16:54:10            part_qos_windfall
3002844             windfall            jeongpilsong        16:54:10            part_qos_windfall
3002845             windfall            jeongpilsong        16:54:10            part_qos_windfall
3002846             windfall            jeongpilsong        16:54:10            part_qos_windfall
3002785             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002833             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002834             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002835             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002838             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002839             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002840             windfall            jeongpilsong        17:13:57            part_qos_windfall
3002841             windfall            jeongpilsong        17:13:57            part_qos_windfall
2998472             windfall            jeongpilsong        18:13:51            part_qos_windfall
3002782             windfall            jeongpilsong        18:13:51            part_qos_windfall
3002783             windfall            jeongpilsong        18:13:51            part_qos_windfall
3002784             windfall            jeongpilsong        18:13:51            part_qos_windfall
2998327             windfall            jeongpilsong        22:13:05            part_qos_windfall
2998328             windfall            jeongpilsong        22:13:05            part_qos_windfall
2998329             windfall            jeongpilsong        22:13:05            part_qos_windfall
2998305             windfall            jeongpilsong        1-02:08:32          part_qos_windfall
2998474             windfall            jeongpilsong        1-15:13:33          part_qos_windfall
2998476             windfall            jeongpilsong        1-15:13:33          part_qos_windfall
2998302             windfall            jeongpilsong        1-15:20:17          part_qos_windfall
2998303             windfall            jeongpilsong        1-15:27:25          part_qos_windfall
2998301             windfall            jeongpilsong        1-15:49:54          part_qos_windfall
2998325             windfall            jeongpilsong        1-15:49:54          part_qos_windfall
2998326             windfall            jeongpilsong        1-15:49:54          part_qos_windfall
2973272             windfall            jeongpilsong        2-19:04:15          part_qos_windfall
2849309             windfall            ludwik              6-21:03:32          part_qos_windfall
2406328             windfall            klshark             8-01:24:26          part_qos_windfall
2769086             windfall            ludwik              8-01:24:26          part_qos_windfall
2849311             windfall            ludwik              8-01:24:26          part_qos_windfall
2875690             windfall            fleonarski          8-01:24:26          part_qos_windfall
2886931             windfall            ludwik              8-01:24:26          part_qos_windfall
2886963             windfall            klshark             8-01:24:26          part_qos_windfall
2897145             windfall            fleonarski          8-01:24:26          part_qos_windfall
2899420             windfall            fleonarski          8-01:24:26          part_qos_windfall
2899455             windfall            teodar              8-01:24:26          part_qos_windfall
2916338             windfall            klshark             8-01:24:26          part_qos_windfall
2916340             windfall            klshark             8-01:24:26          part_qos_windfall
2924858             windfall            fleonarski          8-01:24:26          part_qos_windfall
2929702             windfall            ludwik              8-01:24:26          part_qos_windfall
2886932             windfall            ludwik              8-01:50:32          part_qos_windfall
2873914             windfall            klshark             8-02:00:53          part_qos_windfall
2912417             windfall            bubin               8-16:37:59          part_qos_windfall
3008082             windfall            haydenfoote         23:17               part_qos_windfall

The previous log was at level debug. I'll bump it to level debug5 for a minute and send you the log.
Thanks!
Created attachment 23303 [details]
debug5 slurmctld log
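The temporary log-level bump described above amounts to a short sequence like the following sketch; the five-minute wait is the interval suggested in the comment.

scontrol setdebug debug2   # raise slurmctld verbosity temporarily
sleep 300                  # give the scheduler a few minutes to run
scontrol setdebug debug    # restore the normal log level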
Thanks for sending that output. The squeue output confirms that my theory about the youngest jobs possibly having a different QOS is incorrect. I've been looking through the logs and I can see that a job (3008144) was able to preempt other jobs to get resources and start:

[Feb 04 15:39:48.78004 328 sched_agent 0x7f8d166ec700] preempted JobId=2981461 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.78752 328 sched_agent 0x7f8d166ec700] preempted JobId=3004275 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.79438 328 sched_agent 0x7f8d166ec700] preempted JobId=3004298 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.80094 328 sched_agent 0x7f8d166ec700] preempted JobId=2965316 has been requeued to reclaim resources for JobId=3008144
[Feb 04 15:39:48.80107 328 sched_agent 0x7f8d166ec700] debug3: sched: JobId=3008144. State=PENDING. Reason=Resources. Priority=25. Partition=standard.
...
[Feb 04 15:39:52.949298 328 sched_agent 0x7f8d166ec700] debug3: sched: JobId=3008144 initiated
[Feb 04 15:39:52.949314 328 sched_agent 0x7f8d166ec700] sched: Allocate JobId=3008144 NodeList=r3u13n1,r3u16n[1-2],r3u30n2,r3u36n1,r4u18n1,r4u30n[1-2],r4u34n2,r4u36n[1-2] #CPUs=220 Partition=standard

This job looks like it's in a different account though (acct=mazumdar). I do see some log entries showing that the acct_policy information is being updated for the behroozi account, so jobs from that account are starting:

[Feb 04 15:35:19.82608 328 sigmgr 0x7f8d1560e700] debug2: acct_policy_job_begin: after adding JobId=2970860, assoc 10785(behroozi/hwzhang0595/standard) grp_used_tres_run_secs(cpu) is 13824000

Since I haven't been able to find a smoking gun from generic cases, I think we need to create a specific test case where we can see the specifics of what's happening. Can you either create a reservation on a node that we can test with, or find a node that has a preemptible job running on it that we can focus on? Once you have a target node selected, I would like to see the node details and the job details for the preemptible job running on that node. Then increase the log level temporarily for slurmctld and have the user submit a job that targets that node specifically by adding the "-w <node name>" flag to sbatch. Collect details about the preemptor job as well while it's pending. Allow a few minutes to pass and then set the log level back to what it was. To recap, the steps should look like this:

1. Identify a node that currently has a preemptible job running, or create a reservation for a node we can use to test with.
2. scontrol show node <node name>
3. scontrol show job <job id> (for the preemptible job on that node)
4. scontrol setdebug debug3 (debug3 should be plenty)
5. Submit a job that requests that node and can preempt the existing job. Use the 'sbatch -w <node name>' flag to target that node.
6. scontrol show job <job id> (for the preemptor job)
7. Wait a few minutes.
8. scontrol setdebug debug

Then if you could send the collected output along with the logs for this time period, I'll look at what's happening.

Thanks,
Ben

root@ericidle:~ # scontrol show node r2u07n2
NodeName=r2u07n2 Arch=x86_64 CoresPerSocket=48
CPUAlloc=91 CPUTot=96 CPULoad=52.76
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=r2u07n2 NodeHostName=r2u07n2 Version=21.08.5
OS=Linux 3.10.0-1160.53.1.el7.x86_64 #1 SMP Fri Jan 14 13:59:45 UTC 2022
RealMemory=515830 AllocMem=414720 FreeMem=435116 Sockets=2 Boards=1
CoreSpecCount=2 CPUSpecList=0-1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=20 Owner=N/A MCS_label=N/A
Partitions=windfall,standard,high_priority
BootTime=2022-01-26T14:03:06 SlurmdStartTime=2022-01-26T14:03:36
LastBusyTime=2022-02-07T03:31:17
CfgTRES=cpu=96,mem=515830M,billing=96
AllocTRES=cpu=91,mem=405G
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

root@ericidle:~ # scontrol show job 3022522
JobId=3022522 JobName=adn1
UserId=stepans(90770) GroupId=ludwik(30004) MCS_label=N/A
Priority=2 Nice=0 Account=windfall QOS=part_qos_windfall
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=04:42:36 TimeLimit=10-00:00:00 TimeMin=N/A
SubmitTime=2022-02-07T05:51:50 EligibleTime=2022-02-07T05:51:50
AccrueTime=2022-02-07T05:51:50
StartTime=2022-02-07T05:55:15 EndTime=2022-02-17T05:55:15 Deadline=N/A
PreemptEligibleTime=2022-02-07T05:55:15 PreemptTime=None
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-07T05:55:15 Scheduler=Backfill
Partition=windfall AllocNode:Sid=wentletrap:10475
ReqNodeList=(null) ExcNodeList=(null)
NodeList=r2u07n2
BatchHost=r2u07n2
NumNodes=1 NumCPUs=30 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=30,mem=150G,node=1,billing=30
Socks/Node=* NtasksPerN:B:S:C=30:0:*:* CoreSpec=*
MinCPUsNode=30 MinMemoryCPU=5G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/xdisk/ludwik/stepans/puma/gnso1/adn1
WorkDir=/xdisk/ludwik/stepans/puma/gnso1
StdErr=/xdisk/ludwik/stepans/puma/gnso1/slurm-3022522.out
StdIn=/dev/null
StdOut=/xdisk/ludwik/stepans/puma/gnso1/slurm-3022522.out
Power=

root@ericidle:~ # scontrol show job 3024254
JobId=3024254 JobName=slurm-standard-test
UserId=tmerritt(7862) GroupId=hpcteam(30001) MCS_label=N/A
Priority=2 Nice=0 Account=hpcteam QOS=part_qos_standard
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:10:00 TimeMin=N/A
SubmitTime=2022-02-07T10:38:58 EligibleTime=2022-02-07T10:38:58
AccrueTime=2022-02-07T10:38:58
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-02-07T10:39:06 Scheduler=Main
Partition=standard AllocNode:Sid=wentletrap:6769
ReqNodeList=r2u07n2 ExcNodeList=(null)
NodeList=(null)
NumNodes=1-1 NumCPUs=30 NumTasks=30 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=30,mem=150G,node=1,billing=30
Socks/Node=* NtasksPerN:B:S:C=30:0:*:* CoreSpec=*
MinCPUsNode=30 MinMemoryCPU=5G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/u11/tmerritt/puma/puma-standard.scr
WorkDir=/home/u11/tmerritt/puma
StdErr=/home/u11/tmerritt/puma/slurm-standard-test.out
StdIn=/dev/null
StdOut=/home/u11/tmerritt/puma/slurm-standard-test.out
Power=

Created attachment 23318 [details]
slurmctld log from 20220207
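Scripted end to end, the test recapped above looks something like this sketch; the node name is the one Todd chose, while the batch script and job ids are placeholders.

scontrol show node r2u07n2                # node details before the test
scontrol show job <preemptible_jobid>     # the running windfall job on that node
scontrol setdebug debug3                  # bump the log level
sbatch -p standard -w r2u07n2 <script>    # submit the would-be preemptor
scontrol show job <preemptor_jobid>       # preemptor details while it is pending
sleep 300                                 # wait a few minutes
scontrol setdebug debug                   # restore the log level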
Thanks for gathering that information. I think I have a better idea of what might be happening now. I can see in the logs that this job is submitted correctly, but when it's time to schedule the job it doesn't get fully evaluated. The log entry looks like this:

[Feb 07 10:39:06.133238 27580 sched_agent 0x7f2c20714700] debug2: sched: JobId=3024254. unable to schedule in Partition=standard (per _failed_partition()). Retaining previous scheduling Reason=Priority. Desc=(null). Priority=2.

This message doesn't make it explicitly clear what the problem is, but if you look at the comments for the _failed_partition() function you can see that it checks whether the nodes have already been reserved by higher-priority jobs:

https://github.com/SchedMD/slurm/blob/cedf4cf35b1ac85a4e3af098a245083dab7c43ec/src/slurmctld/job_scheduler.c#L739-L753

To confirm that having a higher priority does cause this job to be scheduled on this node ahead of other jobs that might be trying to go there first, can I have you try increasing the priority for this job? You can do this by running the following command:

scontrol update jobid=2844 priority=10000

You will need to be an administrator to set the priority like that. Can I also have you send the output of 'sprio' so I can see the priority of this job relative to other jobs on the system?

Thanks,
Ben

I think you may be on to something. When I increased the priority to max, the job changed from pending on Priority to pending on Resources. After sitting in that state for a few seconds, it did preempt the jobs that I had expected it to. I had to submit a new test job; the job id is 3027192. I'll attach the slurmctld log as well.

Created attachment 23336 [details]
slurmctld log
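The check Todd describes corresponds to something like the following sketch; 3027192 is the new test job mentioned above, and the priority value is the kind of large bump Ben suggested.

scontrol update jobid=3027192 priority=10000   # requires administrator privileges
squeue -j 3027192 -O jobid,reason              # Reason changes from Priority to Resources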
Also, the sprio output is here. The only really notable thing is the one job that's asking for 20 full nodes with a priority of 75. Is it possible that a job like that was in the queue when this issue appeared last Friday and blocked all of the other standard jobs from starting? Is there a way I can assess what sprio might have looked like retroactively?
root@ericidle:~ # sprio
JOBID PARTITION PRIORITY SITE JOBSIZE
1469937 standard 3 0 4
2406218 windfall 2 0 3
2416861 windfall 2 0 3
2430705 windfall 2 0 3
2650279 windfall 2 0 3
2742693 windfall 2 0 3
2746074 windfall 2 0 3
2751961 windfall 2 0 3
2765536 standard 3 0 4
2795925 windfall 2 0 3
2823954 windfall 2 0 3
2824059 windfall 2 0 3
2845470 windfall 2 0 3
2845533 windfall 2 0 3
2872954 standard 3 0 4
2873914 windfall 2 0 3
2873923 windfall 2 0 3
2886928 windfall 2 0 3
2886931 windfall 2 0 3
2886932 windfall 2 0 3
2886962 windfall 2 0 3
2886963 windfall 2 0 3
2899416 windfall 2 0 3
2899455 windfall 2 0 3
2905643 windfall 2 0 3
2909238 windfall 2 0 3
2909240 windfall 2 0 3
2912326 windfall 2 0 3
2916340 windfall 2 0 3
2916374 windfall 2 0 3
2924811 windfall 2 0 3
2924858 windfall 2 0 3
2924861 windfall 2 0 3
2929768 windfall 2 0 3
2965661 standard 7 0 8
2968040 windfall 2 0 3
2968901 windfall 2 0 3
2971028 windfall 2 0 3
2981014 windfall 2 0 3
2985914 windfall 2 0 3
2985915 windfall 2 0 3
2985969 windfall 2 0 3
2985971 windfall 2 0 3
2985972 windfall 2 0 3
2986182 windfall 2 0 3
2986186 windfall 2 0 3
2986199 windfall 2 0 3
2986411 windfall 2 0 3
2986445 windfall 2 0 3
2986457 windfall 2 0 3
2991074 windfall 2 0 3
2997124 windfall 3 0 4
2997127 windfall 3 0 4
2997130 windfall 3 0 4
2997131 windfall 3 0 4
2997472 windfall 3 0 4
2997570 windfall 3 0 4
2997627 windfall 3 0 4
2998619 windfall 2 0 3
2998620 windfall 2 0 3
2998652 windfall 2 0 3
2998653 windfall 2 0 3
3003793 windfall 2 0 3
3003817 windfall 2 0 3
3003819 windfall 2 0 3
3003821 windfall 2 0 3
3004102 standard 1 0 2
3004276 windfall 2 0 3
3008100 standard 2 0 3
3008354 standard 3 0 4
3017817 standard 2 0 2
3017819 standard 2 0 2
3017820 standard 2 0 2
3017821 standard 2 0 2
3017863 windfall 2 0 3
3017864 windfall 2 0 3
3017866 windfall 2 0 3
3017884 windfall 2 0 3
3017886 windfall 2 0 3
3017962 windfall 2 0 3
3019039 standard 1 0 2
3019040 standard 1 0 2
3019041 standard 1 0 2
3019042 standard 1 0 2
3019473 windfall 3 0 4
3019474 windfall 3 0 4
3019475 windfall 3 0 4
3019476 windfall 3 0 4
3019477 windfall 3 0 4
3019478 windfall 3 0 4
3019479 windfall 3 0 4
3019480 windfall 3 0 4
3019481 windfall 3 0 4
3019482 windfall 3 0 4
3019483 windfall 3 0 4
3019484 windfall 3 0 4
3019485 windfall 3 0 4
3019486 windfall 3 0 4
3019487 windfall 3 0 4
3019488 windfall 3 0 4
3019489 windfall 3 0 4
3019490 windfall 3 0 4
3019491 windfall 3 0 4
3019492 windfall 3 0 4
3019493 windfall 3 0 4
3019494 windfall 3 0 4
3019495 windfall 3 0 4
3019496 windfall 3 0 4
3019497 windfall 3 0 4
3019498 windfall 3 0 4
3019499 windfall 3 0 4
3019500 windfall 3 0 4
3019501 windfall 3 0 4
3019502 windfall 3 0 4
3019503 windfall 3 0 4
3019504 windfall 3 0 4
3019505 windfall 3 0 4
3019506 windfall 3 0 4
3019507 windfall 3 0 4
3019508 windfall 3 0 4
3019509 windfall 3 0 4
3019510 windfall 3 0 4
3019511 windfall 3 0 4
3019675 standard 3 0 4
3019676 standard 3 0 4
3020274 windfall 3 0 4
3020279 windfall 3 0 4
3020671 standard 1 0 2
3020672 standard 1 0 2
3020673 standard 1 0 2
3020674 standard 1 0 2
3021604 windfall 1 0 2
3021611 windfall 1 0 2
3021612 windfall 1 0 2
3021615 windfall 1 0 2
3021616 windfall 1 0 2
3021617 windfall 1 0 2
3021687 standard 2 0 2
3021690 standard 2 0 2
3021691 standard 2 0 2
3021692 standard 2 0 2
3021693 standard 2 0 2
3021694 standard 2 0 2
3021695 standard 2 0 2
3021713 windfall 1 0 2
3021714 windfall 1 0 2
3021718 standard 1 0 2
3021754 windfall 1 0 2
3021794 windfall 1 0 2
3021795 windfall 1 0 2
3021797 windfall 1 0 2
3021854 windfall 2 0 2
3021865 windfall 2 0 2
3021983 windfall 2 0 3
3022019 windfall 2 0 3
3022021 windfall 2 0 3
3022033 windfall 2 0 3
3022035 windfall 1 0 2
3022077 windfall 2 0 3
3022203 windfall 1 0 2
3022225 windfall 1 0 2
3022367 windfall 2 0 3
3022389 standard 1 0 2
3022478 windfall 2 0 3
3022479 windfall 2 0 3
3022522 windfall 2 0 3
3022523 windfall 2 0 3
3022524 windfall 2 0 3
3022730 windfall 2 0 3
3022747 windfall 2 0 3
3022749 windfall 2 0 3
3022766 windfall 2 0 3
3022784 windfall 2 0 3
3022787 windfall 2 0 3
3022805 windfall 2 0 3
3022822 windfall 2 0 3
3023026 windfall 2 0 3
3023040 windfall 1 0 2
3023041 windfall 1 0 2
3023043 windfall 1 0 2
3023046 windfall 2 0 3
3023088 windfall 2 0 3
3023382 windfall 1 0 2
3023383 windfall 1 0 2
3023384 windfall 1 0 2
3023989 standard 2 0 2
3024328 standard 2 0 2
3024340 standard 2 0 2
3024357 windfall 1 0 2
3024358 windfall 1 0 2
3024359 windfall 1 0 2
3024438 standard 1 0 2
3024440 standard 3 0 3
3024443 windfall 1 0 2
3024444 windfall 1 0 2
3024445 windfall 1 0 2
3024451 standard 2 0 3
3024506 windfall 1 0 2
3024507 windfall 1 0 2
3024508 windfall 1 0 2
3024603 windfall 1 0 2
3024604 windfall 1 0 2
3024605 windfall 1 0 2
3024654 standard 2 0 2
3025201 windfall 2 0 2
3025771 standard 1 0 2
3025772 standard 75 0 76
3025782 standard 2 0 2
3025784 standard 2 0 2
3026360 standard 3 0 4
3026438 standard 2 0 2
3026460 standard 2 0 2
3026514 standard 3 0 4
3026515 standard 3 0 4
3026516 standard 3 0 4
3026517 standard 3 0 4
3026518 standard 3 0 4
3026519 standard 3 0 4
3026520 standard 3 0 4
3026521 standard 3 0 4
3026522 standard 3 0 4
3026523 standard 3 0 4
3026524 standard 3 0 4
3026527 standard 3 0 4
3026528 standard 3 0 4
3026529 standard 3 0 4
3026530 standard 3 0 4
3026531 standard 3 0 4
3026532 standard 3 0 4
3026533 standard 3 0 4
3026537 standard 2 0 2
3026873 standard 3 0 4
3026885 standard 2 0 3
3026948 standard 1 0 2
3027060 high_prio 2 0 2
3027061 high_prio 2 0 2
3027062 high_prio 2 0 2
3027063 high_prio 2 0 2
3027064 high_prio 2 0 2
3027065 high_prio 2 0 2
3027066 high_prio 2 0 2
3027067 high_prio 2 0 2
3027068 high_prio 2 0 2
3027069 high_prio 2 0 2
3027070 high_prio 2 0 2
3027071 high_prio 2 0 2
3027072 high_prio 2 0 2
3027073 high_prio 2 0 2
3027074 high_prio 2 0 2
3027075 high_prio 2 0 2
3027076 high_prio 2 0 2
3027077 high_prio 2 0 2
3027078 high_prio 2 0 2
3027079 high_prio 2 0 2
3027080 high_prio 2 0 2
3027081 high_prio 2 0 2
3027082 high_prio 2 0 2
3027083 high_prio 2 0 2
3027084 high_prio 2 0 2
3027085 high_prio 2 0 2
3027086 high_prio 2 0 2
3027087 high_prio 2 0 2
3027088 high_prio 2 0 2
3027089 high_prio 2 0 2
3027090 high_prio 2 0 2
3027091 high_prio 2 0 2
3027092 high_prio 2 0 2
3027093 high_prio 2 0 2
3027094 high_prio 2 0 2
3027095 high_prio 2 0 2
3027096 high_prio 2 0 2
3027097 high_prio 2 0 2
3027098 high_prio 2 0 2
3027099 high_prio 2 0 2
3027100 high_prio 2 0 2
3027101 high_prio 2 0 2
3027102 high_prio 2 0 2
3027103 high_prio 2 0 2
3027104 high_prio 2 0 2
3027105 high_prio 2 0 2
3027106 high_prio 2 0 2
3027107 high_prio 2 0 2
3027108 high_prio 2 0 2
3027109 high_prio 2 0 2
3027110 high_prio 2 0 2
3027111 high_prio 2 0 2
3027112 high_prio 2 0 2
3027113 high_prio 2 0 2
3027114 high_prio 2 0 2
3027115 high_prio 2 0 2
3027116 high_prio 2 0 2
3027117 high_prio 2 0 2
3027118 high_prio 2 0 2
3027119 high_prio 2 0 2
3027120 high_prio 2 0 2
3027121 high_prio 2 0 2
3027122 high_prio 2 0 2
3027123 high_prio 2 0 2
3027124 high_prio 2 0 2
3027125 high_prio 2 0 2
3027126 high_prio 2 0 2
3027127 high_prio 2 0 2
3027128 high_prio 2 0 2
3027129 high_prio 2 0 2
3027130 high_prio 2 0 2
3027131 high_prio 2 0 2
3027132 high_prio 2 0 2
3027133 high_prio 2 0 2
3027134 high_prio 2 0 2
3027135 high_prio 2 0 2
3027136 high_prio 2 0 2
3027137 high_prio 2 0 2
3027138 high_prio 2 0 2
3027139 high_prio 2 0 2
3027140 high_prio 2 0 2
3027141 high_prio 2 0 2
3027142 high_prio 2 0 2
3027143 high_prio 2 0 2
3027144 high_prio 2 0 2
3027145 high_prio 2 0 2
3027146 high_prio 2 0 2
3027147 high_prio 2 0 2
3027148 high_prio 2 0 2
3027149 high_prio 2 0 2
3027150 high_prio 2 0 2
3027151 high_prio 2 0 2
3027152 high_prio 2 0 2
3027153 high_prio 2 0 2
3027154 high_prio 2 0 2
3027155 high_prio 2 0 2
3027156 high_prio 2 0 2
3027157 high_prio 2 0 2
3027158 high_prio 2 0 2
3027159 high_prio 2 0 2
3027160 high_prio 2 0 2
3027161 high_prio 2 0 2
3027162 standard 2 0 3
3027194 high_prio 3 0 4
3027195 standard 2 0 3
3027196 high_prio 3 0 4
3027197 high_prio 3 0 4
3027212 standard 3 0 3
3027224 high_prio 2 0 3
3027225 standard 1 0 2
3027226 high_prio 2 0 3
3027227 high_prio 2 0 3
3027228 high_prio 2 0 3
3027229 high_prio 2 0 3
3027230 high_prio 2 0 3
3027231 standard 3 0 4
3027232 high_prio 2 0 3
Thanks!
It is possible that there was a large job requesting a lot of resources that created a priority reservation and blocked other jobs from starting. If you know the time frame you want to look at, you can check for high-priority jobs by running a command like this:

sacct -X --starttime=<start_time> --endtime=<end_time> --format=jobid,priority

You would obviously want to replace <start_time> and <end_time> with the correct date and time values.

I see that the only priority factor you're taking into consideration is the job size. Is that still your intention? Adding something like the job age might be something to consider. You would need to evaluate the relationship of the age-based priority with the size priority, but with the proper balance it could help prevent jobs from getting stuck for too long. If you're interested in something like that and would like to see some simple examples, let me know.

Thanks,
Ben

Thanks Ben,
It looks like there are a few contenders in the window where we were seeing the delays:

       JobID   Priority   NNodes  Partition
------------ ---------- -------- ----------
3002151_[0-+         41       11 high_prio+
3001840              42       17   standard
3001842              42       17   standard
3003374              42       17   standard
3001816              46       20   standard
3002338              46       20   standard
2998983              47       19   standard
2998413              48       17   standard
3002144              55       24   standard
2998917              60       24   standard
3001815              64       26   standard
3001794              65       26   standard
3001797              65       26   standard
3001798              65       26   standard
3003013              65       23   standard
2993839              75       20   standard
2993501             193      100   standard

We do still want to favor large jobs, but if we could balance job age into the equation I think that would help us avoid the case we're running into.
Thanks!

That does seem like the cause of the behavior you observed then. There are a few things to consider when adding age-based priority to the cluster. You need to determine the longest you want a job to be queued before it reaches its maximum age-based priority boost; this value is configured as PriorityMaxAge. Then you need to determine the maximum priority a job will gain from being queued for that amount of time; this value is configured as PriorityWeightAge. Another relevant parameter is PriorityCalcPeriod, which determines how often the priority is re-calculated for jobs on the cluster.
I'll use the following parameters as an example:
PriorityWeightAge = 8000
PriorityWeightJobSize = 1000
PriorityMaxAge = 08:00:00
PriorityCalcPeriod = 00:05:00
This means that jobs get a maximum of 8000 additional priority after being queued for 8 hours. Before the full 8 hours is up, jobs get a proportional fraction of that total as they sit in the queue: after a job has been queued for 1 hour, it gets 1/8 of the overall PriorityWeightAge, or 1000 priority from age. The age calculation is re-run every PriorityCalcPeriod (5 minutes here), so in practice you would see the priority climb in correspondingly smaller increments.
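Put as a formula, the age contribution described above is, to a first approximation:

age_priority = PriorityWeightAge * min(queued_time / PriorityMaxAge, 1.0)
             = 8000 * (1 hour / 8 hours)
             = 1000 after one hour in the queue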
Big jobs accrue priority from age as well, so this effectively lets you determine how long a job can be queued before jobs of a certain size no longer have more priority than it. As an example, here I have a job that requests a single node and has been queued for about 30 minutes. If I submit a new 5-node job, you can see that it gets more JobSize-based priority, but the amount of time the 1-node job has been queued is enough to still give it greater overall priority.
$ squeue -t pending
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1108 debug wrap ben PD 0:00 1 (Resources)
1113 debug wrap ben PD 0:00 5 (Priority)
$ sprio
JOBID PARTITION PRIORITY SITE AGE JOBSIZE
1108 debug 481 0 426 56
1113 debug 321 0 44 278
If a job was submitted that was large enough to have greater priority than existing jobs at submission time then it would always have more priority than those jobs because all jobs would accrue age-based priority at the same rate.
The relationship between age and size is always site specific and you will have to figure out the right balance for your environment. Hopefully this helps explain how to find that balance though. Let me know if you have any additional questions about this.
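For reference, the example values above map directly to slurm.conf entries like these; this sketch assumes the multifactor priority plugin, which the JOBSIZE factor in the sprio output suggests is already in use.

PriorityType=priority/multifactor
PriorityWeightAge=8000
PriorityWeightJobSize=1000
PriorityMaxAge=08:00:00
PriorityCalcPeriod=00:05:00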
Thanks,
Ben
Thanks! I'll play with those settings and see what works best for us. You can close this out.

I'm glad to hear that helps. I'll close this ticket.
Thanks,
Ben