Split off from bug 5579 so that this issue is tracked in its own bug.
Hi Jülich colleagues,

this has been fixed in the following commit, available since 18.08.7:

https://github.com/SchedMD/slurm/commit/cb599ecfcc24706e

Behavior before:

alex@polaris:~/t$ sbatch --exclusive : --exclusive --wrap "sleep 9999"
Submitted batch job 20001
alex@polaris:~/t$ squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20001+0        p1 wrap alex PD 0:00     1 (None)
20001+1        p1 wrap alex PD 0:00     1 (None)
alex@polaris:~/t$ sbatch --exclusive --wrap "sleep 9999"
Submitted batch job 20003
alex@polaris:~/t$ squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20001+0        p1 wrap alex PD 0:00     1 (None)
20001+1        p1 wrap alex PD 0:00     1 (None)
  20003        p1 wrap alex  R 0:01     1 compute1
alex@polaris:~/t$

The hetjob has higher priority, but the regular job is allocated resources by the main scheduler while the hetjob waits for the backfill cycle.

Behavior after patch:

alex@polaris:~/t$ sbatch --exclusive : --exclusive --wrap "sleep 9999"
Submitted batch job 20010
alex@polaris:~/t$ sbatch --exclusive --wrap "sleep 9999"
Submitted batch job 20012
alex@polaris:~/t$ squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20010+0        p1 wrap alex PD 0:00     1 (None)
20010+1        p1 wrap alex PD 0:00     1 (None)
  20012        p1 wrap alex PD 0:00     1 (Priority)
alex@polaris:~/t$ squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20010+0        p1 wrap alex PD 0:00     1 (None)
20010+1        p1 wrap alex PD 0:00     1 (None)
  20012        p1 wrap alex PD 0:00     1 (Priority)
alex@polaris:~/t$ squeue
  JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
  20012        p1 wrap alex PD 0:00     1 (Priority)
20010+0        p1 wrap alex  R 0:04     1 compute1
20010+1        p1 wrap alex  R 0:04     1 compute2
alex@polaris:~/t$
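In case it helps when checking a site's queue for the same symptom, here is a minimal sketch that scans squeue text in the default column layout shown above and flags the "before" pattern. It is not part of Slurm: the function names are hypothetical, and it assumes that a regular job with a higher job ID than a pending hetjob was submitted later and has lower or equal priority, which holds in the test above but not in general. Job arrays and other ID formats are simply ignored.

```python
# Hypothetical helper, not part of Slurm. It only inspects squeue text
# output in the default column order:
# JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

def parse_squeue(text):
    """Return a list of (job_id, state) pairs from default squeue output."""
    jobs = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 5 or fields[0] == "JOBID":
            continue
        jobs.append((fields[0], fields[4]))
    return jobs


def hetjob_starved(text):
    """True if a regular job runs while an earlier hetjob component is pending.

    ASSUMPTION: a higher job ID means the regular job was submitted later
    and does not outrank the hetjob. Real priorities are not consulted.
    """
    jobs = parse_squeue(text)
    # Hetjob components appear as <jobid>+<offset>, e.g. 20001+0.
    pending_het = [
        int(job_id.split("+")[0])
        for job_id, state in jobs
        if "+" in job_id and state == "PD"
    ]
    # Only plain numeric IDs are treated as regular jobs.
    running_regular = [
        int(job_id)
        for job_id, state in jobs
        if job_id.isdigit() and state == "R"
    ]
    return any(r > h for h in pending_het for r in running_regular)


BEFORE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20001+0 p1 wrap alex PD 0:00 1 (None)
20001+1 p1 wrap alex PD 0:00 1 (None)
20003 p1 wrap alex R 0:01 1 compute1
"""

AFTER = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
20012 p1 wrap alex PD 0:00 1 (Priority)
20010+0 p1 wrap alex R 0:04 1 compute1
20010+1 p1 wrap alex R 0:04 1 compute2
"""

print(hetjob_starved(BEFORE))  # True: 20003 runs while 20001+N is pending
print(hetjob_starved(AFTER))   # False: the hetjob started first
```

The check is deliberately crude. A pending hetjob next to a running regular job can also be legitimate, for example when the regular job genuinely has higher priority, so a hit is only a hint to look closer.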
Hi Alejandro!

Great job, thanks! :)

Any possibility for a backport/patch for 17.11? Still it will take a few more months until we can actually update to 18.08 or jump to 19.05...

Best Regards,
Valantis
(In reply to Chrysovalantis Paschoulas from comment #6)
> Hi Alejandro!
>
> Great job, thanks! :)
>
> Any possibility for a backport/patch for 17.11? Still it will take a few
> more months until we can actually update to 18.08 or jump to 19.05...
>
> Best Regards,
> Valantis

I'd rather wait for more patches that are to come in bug 6710 and bug 6594. Once they are checked in to 18.08, I can prepare a single standalone backport for 17.11 that combines all the hetjob-related fixes, if that sounds good to you. Right now I'm not sure what you currently have backported, and all the different fixes change the same area of code, which is why I prefer to combine everything once it is all checked in.
(In reply to Alejandro Sanchez from comment #7)
> (In reply to Chrysovalantis Paschoulas from comment #6)
> > Hi Alejandro!
> >
> > Great job, thanks! :)
> >
> > Any possibility for a backport/patch for 17.11? Still it will take a few
> > more months until we can actually update to 18.08 or jump to 19.05...
> >
> > Best Regards,
> > Valantis
>
> I'd rather wait for more patches that are to come in bug 6710 and bug 6594,
> and once checked-in into 18.08 I can prepare a single patch for 17.11
> combining all the different fixes related to hetjobs into a single
> standalone backport, if that sounds good to you. Right now I'm not sure what
> you currently have backported, and all the different fixes change same area
> of code that's why I prefer to combine everything once all is checked-in.

I agree with you, that would be great! Thanks :)

We would also like to avoid any mess with the patches.