Dear Slurm Team,

we have a problem with different partitions and DefMemPerCPU. For example, we have two partitions, "medium-fmz" and "medium-fas":

  gwdu105:3 14:17:38 ~ # scontrol show partition medium-fas
  PartitionName=medium-fas
     AllowGroups=ALL AllowAccounts=ALL AllowQos=normal,long,short
     AllocNodes=ALL Default=NO QoS=N/A
     DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
     MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
     Nodes=dmp[011-082]
     PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
     OverTimeLimit=NONE PreemptMode=OFF
     State=UP TotalCPUs=1728 TotalNodes=72 SelectTypeParameters=NONE
     JobDefaults=(null)
     DefMemPerCPU=5333 MaxMemPerNode=UNLIMITED

  gwdu105:3 14:20:12 ~ # scontrol show partition medium-fmz
  PartitionName=medium-fmz
     AllowGroups=ALL AllowAccounts=ALL AllowQos=normal,long,short
     AllocNodes=ALL Default=NO QoS=N/A
     DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
     MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
     Nodes=gwdd[001-168],gwdd[173-176]
     PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
     OverTimeLimit=NONE PreemptMode=OFF
     State=UP TotalCPUs=3440 TotalNodes=172 SelectTypeParameters=NONE
     JobDefaults=(null)
     DefMemPerCPU=3200 MaxMemPerNode=UNLIMITED

The nodes have different amounts of memory and different numbers of cores, so we set DefMemPerCPU to the per-CPU memory of each node class: 5333 for "medium-fas" and 3200 for "medium-fmz".
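For reference, the partition definitions above boil down to something like the following sketch (values taken from the scontrol output; this is an illustration, not a verbatim copy of the actual partitions.conf, which is attached below):

```
# Sketch only: each partition carries its own per-CPU memory default
# matching the memory/core ratio of its node class.
PartitionName=medium-fas Nodes=dmp[011-082]                MaxTime=2-00:00:00 DefMemPerCPU=5333
PartitionName=medium-fmz Nodes=gwdd[001-168],gwdd[173-176] MaxTime=2-00:00:00 DefMemPerCPU=3200
```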
This works when submitting to a single one of these partitions:

  tehlers@gwdu101:..users/tehlers> srun --pty -p medium-fas -t 05:00 -n 1 -N 1 --qos=short bash
  tehlers@dmp014:..users/tehlers> scontrol show job $SLURM_JOB_ID | grep MinMemoryCPU
     MinCPUsNode=1 MinMemoryCPU=5333M MinTmpDiskNode=0
  tehlers@dmp014:..users/tehlers> cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID/memory.limit_in_bytes
  5592055808

  tehlers@gwdu101:..users/tehlers> srun --pty -p medium-fmz -t 05:00 -n 1 -N 1 --qos=short bash
  tehlers@gwdd106:..users/tehlers> scontrol show job $SLURM_JOB_ID | grep MinMemoryCPU
     MinCPUsNode=1 MinMemoryCPU=3200M MinTmpDiskNode=0
  tehlers@gwdd106:..users/tehlers> cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID/memory.limit_in_bytes
  3355443200

When I submit to both partitions, Slurm simply uses whichever partition has free nodes first. To be able to steer this, I again submit to both partitions, but request a specific node, "dmp014", which belongs to the "medium-fas" partition:

  tehlers@gwdu101:..users/tehlers> srun --pty -w dmp014 -p medium-fmz,medium-fas -t 05:00 -n 1 -N 1 --qos=short bash
  tehlers@dmp014:..users/tehlers> cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID/memory.limit_in_bytes
  3355443200
  tehlers@dmp014:..users/tehlers> scontrol show job $SLURM_JOB_ID | grep MinMemoryCPU
     MinCPUsNode=1 MinMemoryCPU=3200M MinTmpDiskNode=0

As you can see, the allocated amount of memory is *not* the default of "medium-fas" but that of "medium-fmz"!
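The cgroup limits above line up exactly with the per-partition defaults once MiB is converted to bytes; a quick sanity check with plain shell arithmetic (nothing site-specific):

```shell
# DefMemPerCPU is in MiB; the cgroup limit is in bytes (MiB * 1024 * 1024).
echo $((5333 * 1024 * 1024))   # medium-fas:  5592055808 bytes
echo $((3200 * 1024 * 1024))   # medium-fmz:  3355443200 bytes
```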
We already found out that this depends on the order of the partitions in the request string:

  tehlers@gwdu101:..users/tehlers> srun --pty -w dmp014 -p medium-fas,medium-fmz -t 05:00 -n 1 -N 1 --qos=short bash
  tehlers@dmp014:..users/tehlers> scontrol show job $SLURM_JOB_ID | grep MinMemoryCPU
     MinCPUsNode=1 MinMemoryCPU=5333M MinTmpDiskNode=0
  tehlers@dmp014:..users/tehlers> cat /sys/fs/cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID/memory.limit_in_bytes
  5592055808

This looks like a bug: DefMemPerCPU is not updated after the actual partition is chosen when the job starts. Slurm only ever uses the first definition found at submission time (the one from the first partition in the list), and this is never corrected later. Is this a known issue? Might it be fixed in 19.05?

Thanks,
Tim Ehlers
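Note that DefMemPerCPU is only a fallback for jobs that do not request memory themselves, so one possible interim workaround (a sketch, not something from the report) is to pin the memory explicitly so that no partition default is ever consulted:

```
# Hypothetical workaround: request memory explicitly; a per-partition
# DefMemPerCPU is only applied when the job itself asks for none.
# The value 5333M here is this site's medium-fas default.
srun --pty -w dmp014 -p medium-fmz,medium-fas --mem-per-cpu=5333M \
     -t 05:00 -n 1 -N 1 --qos=short bash
```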
Hi Tim, I can reproduce this in 19.05 as well. There has historically been a lot of discussion[1] around this problem (I guess that's why Kilian added himself to CC, since he was involved in some of it). Can you attach your current slurm.conf? I'm interested in a few options, such as EnforcePartLimits.

[1] Some related commits and bugs:
    17.11.7 https://github.com/SchedMD/slurm/commit/bf4cb0b1b0
    17.11.8 https://github.com/SchedMD/slurm/commit/f07f53fc13
    17.11.8 https://github.com/SchedMD/slurm/commit/d52d8f4f0c
Created attachment 10153 [details] slurm.conf
Created attachment 10154 [details] partitions.conf
Created attachment 10155 [details] nodes.conf

Sure, appended (3 files). To explain: we use "dummy" partitions such as "medium" that users are supposed to submit to, and we check for these partitions in job_submit.lua. If a job is submitted to "medium", the partition string is rewritten to "-p medium-fas,medium-fmz", as you advised us in the course in Goettingen. If we now can't honor DefMemPerCPU per partition, the whole mechanism would be kind of useless... :(

Best
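For clarity, the rewrite we do in job_submit.lua amounts to the following minimal sketch (assuming the standard Lua job-submit plugin interface with slurm_job_submit/slurm_job_modify callbacks; this is an illustration, not our production file):

```
-- Minimal job_submit.lua sketch: map the "dummy" partition onto the
-- real partition list. Illustrative only, not the attached config.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == "medium" then
        job_desc.partition = "medium-fas,medium-fmz"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```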
Hi. Just as an update: I've triggered the review process for a patch for 19.05.
(In reply to Alejandro Sanchez from comment #10)
> Hi. Just as an update I've triggered the review process for a patch for
> 19.05.

Thanks!
Tim, this has been fixed in the following commit, available since Slurm 19.05.0rc2:

https://github.com/SchedMD/slurm/commit/8a1e5a5250b3ce469c