Hello,

take this job:

#!/bin/bash
#SBATCH -t 1:02:00
#SBATCH -N 1
#SBATCH -n 13
#SBATCH --gres=gpu:2
#SBATCH --tres-bind=gres/gpu:verbose,map:0,1

srun -l /home/plazonic/gputestranks.sh
sleep 200

and the gputestranks.sh is

#!/bin/bash
echo `hostname`,`nvidia-smi --query-gpu=pci.bus_id --format=csv`,$CUDA_VISIBLE_DEVICES

When this gets scheduled on GPUs 0,1 the output is

0: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
1: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
2: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
3: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
4: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
5: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
6: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
7: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
8: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
9: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
10: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
11: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
12: gpu-bind: usable_gres=0x1; bit_alloc=0x3; local_inx=2; global_list=0; local_list=0
7: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
12: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
0: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
3: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
1: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
2: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
11: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
9: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
8: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
5: mcmillan-r1g1,pci.bus_id 00000000:04:00.0,0
10: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
4: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0
6: mcmillan-r1g1,pci.bus_id 00000000:03:00.0,0

Exactly as expected. When it gets scheduled on GPUs #2,3 we get this

0: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 0. Binding to the first device in the allocation instead.
1: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 1. Binding to the first device in the allocation instead.
4: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 4. Binding to the first device in the allocation instead.
4: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
5: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 5. Binding to the first device in the allocation instead.
5: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
6: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 6. Binding to the first device in the allocation instead.
6: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
7: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 7. Binding to the first device in the allocation instead.
7: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
8: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 8. Binding to the first device in the allocation instead.
8: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
9: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 9. Binding to the first device in the allocation instead.
9: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
10: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 10. Binding to the first device in the allocation instead.
10: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
11: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 11. Binding to the first device in the allocation instead.
11: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
0: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
1: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
2: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 2. Binding to the first device in the allocation instead.
2: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
3: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 3. Binding to the first device in the allocation instead.
3: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
12: slurmstepd: error: Bind request gres/gpu:verbose,map:0,1 does not specify any devices within the allocation for task 12. Binding to the first device in the allocation instead.
12: gpu-bind: usable_gres=0x1; bit_alloc=0xC; local_inx=2; global_list=2; local_list=0
7: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0
11: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0
2: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
5: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0
4: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
6: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
1: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0
10: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
12: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
0: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
3: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0
8: mcmillan-r1g1,pci.bus_id 00000000:82:00.0,0
9: mcmillan-r1g1,pci.bus_id 00000000:83:00.0,0

The allocation clearly did what we wanted - alternate GPUs were given to the tasks - but why the errors from slurmstepd? Are these fake errors, or what is going on here?

BTW, this is the nvidia-smi on that machine:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 570.124.06     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-16GB           On  |   00000000:03:00.0 Off |                    0 |
| N/A   33C    P0             27W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla P100-PCIE-16GB           On  |   00000000:04:00.0 Off |                    0 |
| N/A   35C    P0             26W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla P100-PCIE-16GB           On  |   00000000:82:00.0 Off |                    0 |
| N/A   33C    P0             26W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla P100-PCIE-16GB           On  |   00000000:83:00.0 Off |                    0 |
| N/A   32C    P0             27W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Josko
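A note on the verbose gpu-bind lines above: the following is a minimal sketch for decoding the hex masks, under the assumption that bit_alloc is a bitmask over global GPU indices and usable_gres a bitmask over the job-local device(s) a task may use. The field names come straight from the log; the interpretation is an assumption, not something taken from Slurm documentation.

    #!/bin/bash
    # Decode the hex masks quoted in the gpu-bind output above.
    # Assumption: each set bit N means "GPU index N".
    decode_mask() {
        local mask=$(( $1 )) idx=0 out=""
        while (( mask > 0 )); do
            (( mask & 1 )) && out="$out $idx"
            mask=$(( mask >> 1 ))
            idx=$(( idx + 1 ))
        done
        echo "${out:- none}"
    }
    echo "bit_alloc=0x3   -> GPU indices:$(decode_mask 0x3)"   # first run:  0 1
    echo "bit_alloc=0xC   -> GPU indices:$(decode_mask 0xC)"   # second run: 2 3
    echo "usable_gres=0x1 -> GPU indices:$(decode_mask 0x1)"   # local index 0 in both runs

Under that reading, the second run's bit_alloc corresponds to global GPUs 2 and 3, which matches the 82:00.0/83:00.0 bus IDs the tasks report, while usable_gres stays job-relative.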
Hi Josko,

As per [1]:

"If the task/cgroup plugin is used and ConstrainDevices is set in cgroup.conf, then the gres IDs are zero-based indexes relative to the gres allocated to the job (e.g. the first gres is 0, even if the global ID is 3). Otherwise, the gres IDs are global IDs, and all gres on each node in the job should be allocated for predictable binding results."

I am assuming that ConstrainDevices is not set on the cluster you are testing on. If that is correct, the behaviour is expected: when the mapping is not possible, it falls back to the default behaviour.

Regards,
Carlos.

[1] https://slurm.schedmd.com/sbatch.html#OPT_map:%3Clist%3E
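A quick way to check which of the two indexing regimes applies is sketched below; the paths and commands assume a standard setup and are not taken from this thread.

    # Is the device cgroup constraint enabled on the nodes?
    grep -i ConstrainDevices /etc/slurm/cgroup.conf
    scontrol show config | grep TaskPlugin
    # With ConstrainDevices=yes the cgroup hides unallocated GPUs, so a 2-GPU job
    # step should list only two devices, and map:0,1 then addresses exactly those:
    srun -N 1 --gres=gpu:2 nvidia-smi -L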
Sorry, but we do have things configured correctly:

[root@mcmillan-r1g1 ~]# scontrol show config | grep TaskPlugin
TaskPlugin              = task/cgroup,task/affinity
TaskPluginParam         = (null type)
[root@mcmillan-r1g1 ~]# cat /etc/slurm/cgroup.conf
### Managed by puppet - do not change
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes

After all, it probably would not work at all with map=0,1.
What exactly do you mean by:

> After all, it probably would not work at all with map=0,1.

Thank you!
I meant that if we had not configured ConstrainDevices=yes with task/cgroup, then a job given GPUs #2,3 probably would not work at all with --tres-bind=gres/gpu:map:0,1, since the map indices would then be global GPU IDs outside the allocation - but it does work.
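That point can be illustrated with the same bitmask convention the verbose log uses (a sketch only; the convention is inferred from the log, not from the Slurm source):

    # map:0,1 read as *global* GPU IDs, i.e. the no-ConstrainDevices case:
    map_mask=$(( (1 << 0) | (1 << 1) ))    # map:0,1 -> 0x3
    alloc_mask=$(( (1 << 2) | (1 << 3) ))  # job holds GPUs 2,3 -> 0xC, as in the log
    printf 'map & alloc = 0x%X\n' $(( map_mask & alloc_mask ))   # 0x0: nothing to bind to
    # With ConstrainDevices=yes the map indices are job-relative, so map:0,1 should
    # always select the two allocated GPUs - which is why the errors are surprising.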
Josko,

I reproduced your issue and see the problem. I think it may be a bug. I am investigating a potential solution.

-Scott