Note: this description was written by Matt Davis, adding on CC.

Config: Nodes have Sockets=8 ThreadsPerCore=2 CoresPerSocket=8 CPUs=128.

Description: As documented, when a step is invoked with the --gpus and --ntasks-per-gpu options set (without explicitly setting --ntasks), the number of tasks for the step is calculated. This appears to happen after the distribution for the job has already been determined (see Example 1), leading to inconsistencies.

It is also documented that the number of CPUs needed for the step will be automatically increased if necessary to accommodate the calculated task count. This no longer appears to be the case, possibly because --exact is implied when --cpus-per-task is set explicitly: the CPU allocation appears to be done before the task count is recalculated (see Example 2).

Example 1: The distribution across nodes is cyclic because of the implicit --nnodes=2 and --ntasks=2 when srun is invoked. The task count is recalculated to ntasks-per-gpu*gpus=16 only after the distribution is set. Block distribution is expected, since at that point ntasks > nnodes.

$ salloc -N2 --gpus=16
$ srun -l --ntasks-per-gpu=1 bash -c 'echo $(hostname): $(grep Cpus_allowed_list /proc/self/status)' | sort -nk1
0: borg001: Cpus_allowed_list: 0
1: borg002: Cpus_allowed_list: 0
2: borg001: Cpus_allowed_list: 8
3: borg002: Cpus_allowed_list: 8
4: borg001: Cpus_allowed_list: 16
5: borg002: Cpus_allowed_list: 16
6: borg001: Cpus_allowed_list: 24
7: borg002: Cpus_allowed_list: 24
8: borg001: Cpus_allowed_list: 32
9: borg002: Cpus_allowed_list: 32
10: borg001: Cpus_allowed_list: 40
11: borg002: Cpus_allowed_list: 40
12: borg001: Cpus_allowed_list: 48
13: borg002: Cpus_allowed_list: 48
14: borg001: Cpus_allowed_list: 56
15: borg002: Cpus_allowed_list: 56

Example 2: In the first srun, the total number of CPUs for the step is calculated using the implied value of --ntasks=1, leading to ntasks*cpus-per-task=2 CPUs allocated for the step. The recalculation of the step's task count to ntasks-per-gpu*gpus=2 happens after the CPU allocation, so the two tasks share the two allocated CPUs. In the second srun, --ntasks=2 is made explicit, so the total number of CPUs to allocate is computed correctly. The third srun shows that the CPUs are correctly allocated and bound when --exact is not implied.

$ salloc -N1 --gpus=8
$ srun -l -c2 --gpus=2 --ntasks-per-gpu=1 bash -c 'grep Cpus_allowed_list /proc/self/status' | sort -nk1
0: Cpus_allowed_list: 48,56
1: Cpus_allowed_list: 48,56
$ srun -l -n2 -c2 --gpus=2 --ntasks-per-gpu=1 bash -c 'grep Cpus_allowed_list /proc/self/status' | sort -nk1
0: Cpus_allowed_list: 48-49
1: Cpus_allowed_list: 56-57
$ srun -l --gpus=2 --ntasks-per-gpu=1 bash -c 'grep Cpus_allowed_list /proc/self/status' | sort -nk1
0: Cpus_allowed_list: 48
1: Cpus_allowed_list: 56
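A plain-shell sketch (not Slurm code) may help make both calculations concrete. The `node_for` helper and all constants below are illustrative only; node names and counts are taken from the two examples above.

```shell
#!/bin/bash
# Sketch only -- models the calculations discussed above, not Slurm internals.

# Example 1: where task t lands on the two nodes of the allocation.
#   cyclic (observed):  task t -> node (t mod 2)
#   block (expected once ntasks is recalculated to 16): tasks 0-7 on the
#   first node, tasks 8-15 on the second.
node_for() {   # usage: node_for <task> <cyclic|block>
  local t=$1 dist=$2 idx
  if [ "$dist" = cyclic ]; then idx=$((t % 2)); else idx=$((t / 8)); fi
  if [ "$idx" -eq 0 ]; then echo borg001; else echo borg002; fi
}

# Example 2: step CPU sizing done with the implicit --ntasks=1.
gpus=2
ntasks_per_gpu=1
cpus_per_task=2
implicit_ntasks=1                                          # srun's assumption
step_cpus_allocated=$((implicit_ntasks * cpus_per_task))   # 2 CPUs allocated
recalced_ntasks=$((gpus * ntasks_per_gpu))                 # later becomes 2 tasks
step_cpus_needed=$((recalced_ntasks * cpus_per_task))      # 4 CPUs actually needed

echo "cyclic task 1 -> $(node_for 1 cyclic)"   # borg002, as observed
echo "block  task 1 -> $(node_for 1 block)"    # borg001, as expected
echo "step CPUs: allocated=$step_cpus_allocated needed=$step_cpus_needed"
```

Run as-is, it shows the cyclic/block disagreement for the same task index, and the 2-versus-4 CPU mismatch that leaves the two tasks of Example 2 sharing CPUs.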
Hi

I can recreate some of the described behaviors. Could you send me your gres.conf and slurm.conf?

Dominik
Created attachment 25396 [details] slurm.conf
Created attachment 25397 [details] gres.conf
Hi

Did you have a chance to test the patch from bug 14229? If so, how does it change the behaviors described in the initial comment? I have a patch that fixes the first case from Example 2, and it is waiting for QA.

Dominik
(In reply to Dominik Bartkiewicz from comment #5)
> Hi
>
> Did you have a chance to test the patch from bug 14229?
> If yes, how does it change behaviors described in the initial comment?
> I have a patch that fixes the first cases from Example 2, and it is waiting
> for QA.
>
> Dominik

I have been out of the office since the 15th. We will test the patch from 14229 this week. Thanks!
(In reply to Dominik Bartkiewicz from comment #5)
> Did you have a chance to test the patch from bug 14229?
> If yes, how does it change behaviors described in the initial comment?
> I have a patch that fixes the first cases from Example 2, and it is waiting
> for QA.

With the patch from 14229, the behavior listed in this bug seems to be unchanged.
Hi

Sorry that this took so long. These commits fix the reported issues and will be included in the next 22.05 release:

https://github.com/SchedMD/slurm/commit/2eb61bb7bb
https://github.com/SchedMD/slurm/commit/cbdac16a19

Please let me know if you have any additional questions or if this ticket is ready to close.

Dominik