We are using cons_res and have LLN set on our RDTN partition.

[root@es-slurm ~]# scontrol show part rdtn | grep LLN
   MaxNodes=1 MaxTime=16:00:00 MinNodes=0 LLN=YES MaxCPUsPerNode=UNLIMITED

If someone submits with --ntasks-per-node=1, all of their jobs pile up on the same node:

Matthew.Ezell@gaea10:~/llnfail> sinfo --local -N -n dtn[01-16] -p rdtn -o '%10N %8T %C'
NODELIST   STATE    CPUS(A/I/O/T)
dtn01      mixed    3/125/0/128
dtn02      mixed    4/124/0/128
dtn03      drained  0/0/128/128
dtn04      mixed    3/125/0/128
dtn05      mixed    3/125/0/128
dtn06      mixed    4/124/0/128
dtn07      mixed    3/125/0/128
dtn08      mixed    3/125/0/128
dtn09      mixed    2/126/0/128
dtn10      mixed    3/125/0/128
dtn11      mixed    3/125/0/128
dtn12      mixed    2/126/0/128
dtn13      mixed    2/126/0/128
dtn14      mixed    3/125/0/128
dtn15      drained  0/0/128/128
dtn16      down*    0/0/128/128

Matthew.Ezell@gaea10:~/llnfail> for i in $(seq 1 20); do sbatch -p rdtn -Mes --ntasks-per-node=1 --wrap "hostname && sleep 600"; done
Submitted batch job 69430135 on cluster es
Submitted batch job 69430136 on cluster es
Submitted batch job 69430137 on cluster es
Submitted batch job 69430138 on cluster es
Submitted batch job 69430139 on cluster es
Submitted batch job 69430140 on cluster es
Submitted batch job 69430141 on cluster es
Submitted batch job 69430142 on cluster es
Submitted batch job 69430143 on cluster es
Submitted batch job 69430144 on cluster es
Submitted batch job 69430145 on cluster es
Submitted batch job 69430146 on cluster es
Submitted batch job 69430147 on cluster es
Submitted batch job 69430148 on cluster es
Submitted batch job 69430149 on cluster es
Submitted batch job 69430150 on cluster es
Submitted batch job 69430151 on cluster es
Submitted batch job 69430152 on cluster es
Submitted batch job 69430153 on cluster es
Submitted batch job 69430154 on cluster es

Matthew.Ezell@gaea10:~/llnfail> sinfo --local -N -n dtn[01-16] -p rdtn -o '%10N %8T %C'
NODELIST   STATE    CPUS(A/I/O/T)
dtn01      mixed    23/105/0/128
dtn02      mixed    4/124/0/128
dtn03      drained  0/0/128/128
dtn04      mixed    3/125/0/128
dtn05      mixed    2/126/0/128
dtn06      mixed    4/124/0/128
dtn07      mixed    3/125/0/128
dtn08      mixed    2/126/0/128
dtn09      mixed    1/127/0/128
dtn10      mixed    3/125/0/128
dtn11      mixed    3/125/0/128
dtn12      mixed    2/126/0/128
dtn13      mixed    2/126/0/128
dtn14      mixed    3/125/0/128
dtn15      drained  0/0/128/128
dtn16      down*    0/0/128/128

Matthew.Ezell@gaea10:~/llnfail> cat * | sort | uniq -c
     20 dtn01

If you don't specify --ntasks-per-node, the jobs are spread out as you might expect:

<snip>
Matthew.Ezell@gaea10:~/llnfail> cat * | sort | uniq -c
      2 dtn01
      1 dtn02
      1 dtn04
      2 dtn05
      1 dtn06
      1 dtn07
      2 dtn08
      3 dtn09
      1 dtn10
      1 dtn11
      2 dtn12
      2 dtn13
      1 dtn14

This is not the behavior I expected. Can you help me understand whether this is a misconfiguration on our part or a Slurm bug? Thanks.
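For intuition, LLN ("least loaded node") is documented to favor the node with the most idle CPUs when placing each job. The following is a minimal Python sketch of that expected placement, using the idle counts from the first sinfo snapshot above; it is an illustration of the documented behavior, not Slurm's actual selection code.

```python
# Hedged sketch of the expected LLN placement (illustration only, not
# Slurm source): each 1-CPU job should land on the usable node with the
# most idle CPUs, so consecutive jobs spread out across the partition.

# Idle-CPU counts taken from the first sinfo snapshot above
# (drained/down nodes omitted).
idle = {"dtn01": 125, "dtn02": 124, "dtn04": 125, "dtn05": 125,
        "dtn06": 124, "dtn07": 125, "dtn08": 125, "dtn09": 126}

placements = []
for _ in range(6):                  # place six 1-CPU jobs
    node = max(idle, key=idle.get)  # "least loaded" = most idle CPUs
    idle[node] -= 1                 # the job consumes one CPU
    placements.append(node)

print(placements)  # six distinct nodes, starting with dtn09
```

Under this model the twenty jobs above would never stack on a single node, which is what makes the observed 20-on-dtn01 result look like a selection bug rather than a configuration issue.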
Hi Matt,

We will need to look into this and get back to you once we have analyzed it a bit more.
Matt,

I reproduced your issue on both 19.05 and the master branch; it is not specific to cons_res (cons_tres behaves the same way). I see where the issue comes from, but I will need to work on the case a little longer to find an appropriate solution.

cheers,
Marcin
Matt,

I discussed the bug with our senior developer and we concluded that fixing it requires deep changes in the select plugin, which we cannot introduce in a released version. I'll continue working on a patch for the master branch.

cheers,
Marcin
(In reply to Marcin Stolarek from comment #5)
> Matt,
>
> I discussed the bug with our senior developer and we concluded that fixing
> it requires changes deeply in select plugin. We cannot introduce in a
> released version. I'll continue working on the patch for master branch.
>
> cheers,
> Marcin

Thanks for the update. Luckily this cluster is for data transfer (not MPI), so --ntasks-per-node > 1 doesn't make sense here. We've asked users to omit this parameter in the meantime.

What's the likelihood that you will be able to get a viable fix in before the 20.11 code cutoff? Are we looking at 21.08?
Matt,

I have a patch that I'm passing to review now. I think we should be able to fix this ahead of the 20.11 release; however, that's not something I can guarantee today. The patch can't be easily applied to the Slurm 19.05 cons_res plugin, though it can be adapted to cons_tres. Starting from version 20.02 the patch is compatible with both consumable-resources plugins.

cheers,
Marcin
Matt,

The issue is now fixed on our master branch by the following commits:

37d40e94ff Fix _eval_nodes_lln in cons_tres and cons_res
9339f36a00 No logic change - remove redundant line from _eval_nodes_lln
5ce271ee13 Use avail_res->max_cpus as number of CPUs available on node
b88992a6ac No logic change - use avail_cpus instead of max_cpus

These will be released in Slurm 20.11, since they touch a fundamental part of resource selection, but we didn't see any regression, and technically you can apply them on top of 20.02 (there is no protocol/compatibility change). (For 19.05, see the description in comment 8.)

I'm closing the bug report now. Should you have any questions, please don't hesitate to reopen.

cheers,
Marcin
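For readers following along: one plausible reading of the commit titles above — an assumption on our part, not a statement of the actual select-plugin internals — is that the least-loaded ranking in _eval_nodes_lln used a per-node CPU figure that was capped by the requested task count. With --ntasks-per-node=1, every node then reports the same value, the tie always resolves to the first node in the list, and all jobs pile onto it. A minimal Python sketch of that failure mode:

```python
# Hedged sketch (assumption based on the commit titles, not actual
# Slurm select-plugin code): if the per-node "available CPUs" figure
# used for the least-loaded ranking is capped by the requested task
# count, --ntasks-per-node=1 makes every node report the same value
# and the tie always resolves to the first node.

def pick_lln(nodes, cpus_key):
    """Pick the node with the most 'available' CPUs under cpus_key."""
    return max(nodes, key=lambda n: n[cpus_key])

# idle = CPUs actually free; capped = hypothetical figure limited by
# the --ntasks-per-node=1 request.
nodes = [
    {"name": "dtn01", "idle": 105, "capped": 1},
    {"name": "dtn02", "idle": 124, "capped": 1},
    {"name": "dtn04", "idle": 125, "capped": 1},
]

print(pick_lln(nodes, "idle")["name"])    # dtn04 -- truly least loaded
print(pick_lln(nodes, "capped")["name"])  # dtn01 -- tie, first node wins
```

The commit "use avail_cpus instead of max_cpus" is at least consistent with this reading, though the authoritative description of the bug is the commits themselves.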