Splitting this from Bug 9670.

In the case of CR_Core_Memory and --hint=nomultithread (--threads-per-core=1), the job memory required per node is calculated differently for a whole-node allocation than when --exclusive is explicitly added to srun. This doesn't happen for CR_CPU: there CPUs are the consumable resource, so we "charge" only for used threads, and because --threads-per-core=1 makes use of only one thread per core, the total number of CPUs is half the number in the CR_Core case.

Node configuration:
NodeName=DEFAULT RealMemory=900 CPUS=72 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2

1) SelectTypeParameters = CR_CPU_MEMORY

# srun --mem-per-cpu=10 -n2 -c18 --exclusive --hint=nomultithread /bin/bash -c "scontrol show job $SLURM_JOB_ID | grep mem; echo STEP; scontrol show step $SLURM_JOB_ID | grep mem"
TRES=cpu=72,mem=360M,node=1,billing=72
TRES=cpu=72,mem=360M,node=1,billing=72
STEP
STEP
TRES=cpu=36,mem=360M,node=1
TRES=cpu=36,mem=360M,node=1

# srun --mem-per-cpu=10 -n2 -c18 --hint=nomultithread /bin/bash -c "scontrol show job $SLURM_JOB_ID | grep mem; echo STEP; scontrol show step $SLURM_JOB_ID | grep mem"
TRES=cpu=36,mem=360M,node=1,billing=36
TRES=cpu=36,mem=360M,node=1,billing=36
STEP
STEP
TRES=cpu=36,mem=360M,node=1
TRES=cpu=36,mem=360M,node=1

2) SelectTypeParameters = CR_Core_Memory

***Job memory for non-exclusive allocation is higher than step***

# srun --mem-per-cpu=10 -n2 -c18 --hint=nomultithread /bin/bash -c "scontrol show job $SLURM_JOB_ID | grep mem; echo STEP; scontrol show step $SLURM_JOB_ID | grep mem"
TRES=cpu=72,mem=720M,node=1,billing=72
TRES=cpu=72,mem=720M,node=1,billing=72
STEP
STEP
TRES=cpu=36,mem=360M,node=1
TRES=cpu=36,mem=360M,node=1

# srun --mem-per-cpu=10 -n2 --exclusive -c18 --hint=nomultithread /bin/bash -c "scontrol show job $SLURM_JOB_ID | grep mem; echo STEP; scontrol show step $SLURM_JOB_ID | grep mem"
TRES=cpu=72,mem=360M,node=1,billing=72
STEP
TRES=cpu=72,mem=360M,node=1,billing=72
STEP
TRES=cpu=36,mem=360M,node=1
TRES=cpu=36,mem=360M,node=1
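The mem= values reported in this comment are consistent with --mem-per-cpu being multiplied by whichever CPU count is used: 72 (every thread on the node) gives 720M, while 36 (one thread per core) gives 360M. The sketch below only reproduces that arithmetic from the reported numbers. It is not Slurm's implementation, and the helper name mem_for_cpus is made up for illustration.

```shell
#!/bin/sh
# Illustrative arithmetic only; this is NOT how Slurm is implemented.
# mem_for_cpus is a hypothetical helper: CPUs * mem-per-cpu, in MB.
mem_for_cpus() {
    echo $(( $1 * $2 ))
}

# Node from this ticket: Sockets=2 CoresPerSocket=18 ThreadsPerCore=2
cores=$(( 2 * 18 ))
threads_per_core=2
mem_per_cpu=10        # from --mem-per-cpu=10

all_threads=$(( cores * threads_per_core ))   # every thread counted as a CPU
used_threads=$(( cores * 1 ))                 # --threads-per-core=1

echo "charged for all threads:  $(mem_for_cpus "$all_threads" "$mem_per_cpu")M"
echo "charged for used threads: $(mem_for_cpus "$used_threads" "$mem_per_cpu")M"
```

The first value matches the job-level mem=720M seen for the non-exclusive CR_Core_Memory run, and the second matches the step-level mem=360M.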
I have a patch that alters this behavior by adjusting the number of allocated CPUs to the specified --threads-per-core (which --hint=nomultithread uses behind the scenes to request 1 thread per core). I'm passing it to the review queue now. Let me know if you want to give it a try locally.

cheers,
Marcin
*** Ticket 9931 has been marked as a duplicate of this ticket. ***
*** Ticket 9153 has been marked as a duplicate of this ticket. ***
Handling of the case in comment 0 should be improved by 49a7d7f9fb, which will be part of the 20.11.0 release.

cheers,
Marcin
*** Ticket 10262 has been marked as a duplicate of this ticket. ***
Hi,

Could you commit this fix (49a7d7f9fb) to the 20.02 branch? I applied it to our 20.02.6 build and it solved the scheduling problem.
Tommi,

The change in behavior made in the related commit is probably reasonable for the majority of users, but since it's a change in behavior rather than a clear bug fix, we don't want to backport it to older major releases. If it's feasible for you, you can always keep the commit in your tree. It's a change in a single place that shouldn't conflict with other changes happening on 20.02.

cheers,
Marcin
*** Ticket 10434 has been marked as a duplicate of this ticket. ***