Created attachment 27960 [details]
slurm config file

Hi,

When submitting a job to a GPU cluster, we're not quite understanding why the step_0 cgroup only gets 16 cores instead of the expected 32.

This is the job submission command:

/usr/bin/salloc --reservation=maintenance2022Q4 --cpus-per-gpu=32 --gres=gpu:1 --job-name=INTERACTIVE --mail-type=NONE --nodes=1 --ntasks-per-node=32 --ntasks=32 --time=3-00:00:00 /usr/bin/srun --chdir=/user/gent/400/vsc40003 --cpu-bind=none --export=USER,HOME,TERM --mem=0 --mpi=none --nodes=1 --ntasks=1 --pty /bin/bash -i -l
salloc: Granted job allocation 40271421
salloc: Waiting for resource configuration
salloc: Nodes node3303.joltik.os are ready for job

Which then yields:

[vsc40003@node3303 ~]$ nproc
16

Looking at the job's info I see:

TRES=cpu=32,mem=262080M,node=1,billing=33,gres/gpu=1

Looking at the cgroups, I see:

[root@node3303 job_40271421]# cat cpuset.cpus
0-31

Idem for step_extern. But:

[root@node3303 step_0]# cat cpuset.cpus
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30

Somehow this job step only gets the even cores. Is this expected, or do we need to configure something differently? When not asking for any GPUs, we do see that all 32 cores are assigned to the job step.

Our cgroup config is:

AllowedSwapSpace=0
CgroupAutomount=yes
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes

-- Andy
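For anyone hitting the same symptom, a quick way to confirm the effective binding from inside the step itself is to ask the kernel directly rather than reading the cgroup files as root. This is a generic Linux sketch, not a Slurm-specific tool; run it inside the `srun` shell:

```shell
# Generic Linux check (not Slurm-specific): compare the CPU list the
# kernel actually allows for this shell with what nproc reports.
# Inside an affected step_0, Cpus_allowed_list showed only the 16
# even cores, even though the job-level cgroup granted 0-31.
grep Cpus_allowed_list /proc/self/status
nproc
```

If `Cpus_allowed_list` already shows the reduced set, the restriction comes from the step's cpuset cgroup rather than from any per-process affinity mask set later.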
Andy, I can't easily reproduce the behavior. Could you please attach the output of `lstopo-no-graphics` and your gres.conf?

cheers,
Marcin
Could you please take a look at the last comment?

cheers,
Marcin
Hi,

We're in the process of changing the config, and will see if this gets fixed.

-- Andy
Any update from your side?
Hi,

We're trying the upstream/slurm-22.05 branch to see if this works out better, but so far, no luck as far as we can tell.

-- Andy
Is this ticket effectively a duplicate of Bug 15614?
Is there anything else I can help you with in the bug report?
This is OK now. You can close this ticket.