It would be nice to have an option in slurm.conf to set a default auto-binding for the cases where auto-binding doesn't work. This would be separate from TaskPluginParam and would still allow the user to override the CPU binding.

brian@compy:~/slurm/14.11/compy$ srun -n3 ~/tools/whereami
0 compy1 - MASK:0xff
2 compy1 - MASK:0xff
1 compy1 - MASK:0xff
This will also help the case where auto-binding doesn't work when using the --exclusive flag.

brian@compy:~/slurm/14.11/compy$ srun -n2 --exclusive ~/tools/whereami
1 compy1 - MASK:0xff
0 compy1 - MASK:0xff
debug: binding tasks:2 to nodes:1 sockets:1:0 cores:4:0 threads:8
lllp_distribution jobid [73970] auto binding off: mask_cpu

brian@compy:~/slurm/14.11/compy$ srun -n2 ~/tools/whereami
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
debug: binding tasks:2 to nodes:0 sockets:0:1 cores:1:0 threads:2
lllp_distribution jobid [73971] implicit auto binding: threads, dist 2
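For readers comparing the MASK values, a small helper can turn each hex mask into the list of CPUs it covers. This is only a sketch: it assumes whereami prints a conventional affinity bitmask in which bit i stands for CPU i, and `cpus_in_mask` is a hypothetical name, not part of Slurm or of the whereami tool.

```python
def cpus_in_mask(mask):
    """Return the CPU indices whose bits are set in a hex mask such as '0x11'.

    Assumption: bit i of the mask corresponds to CPU i.
    """
    value = int(mask, 16)  # int() accepts the '0x' prefix when base is 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    # Masks taken from the output in this report.
    for mask in ("0xff", "0x1", "0x10"):
        print(mask, cpus_in_mask(mask))
```

Under that assumption, 0xff means the task is free to run on all eight CPUs (no effective binding), while 0x1 and 0x10 each pin a task to a single CPU.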
Added in the following commits:

14.11: Added TaskPluginParam=autobind=threads (threads only, because the protocol had to be changed to uint32_t to handle the extra bits).
https://github.com/SchedMD/slurm/commit/ea51f870c19b92f9e251cbdef54b1e9013da959a

15.08: Added sockets and cores to the autobind option.
https://github.com/SchedMD/slurm/commit/955ce4476fab0b26669d1710dc912b412194b709

E.g.

brian@compy:~/slurm/master2/compy$ srun -n1 ~/tools/whereami | sort -h
0 compy1 - MASK:0x11
[Mar 5 09:43:54.256386 32577 0x7fbe9bfa0700] debug: binding tasks:1 to nodes:0 sockets:0:1 cores:1:0 threads:2
[Mar 5 09:43:54.256398 32577 0x7fbe9bfa0700] lllp_distribution jobid [5208] implicit auto binding: cores, dist 1

brian@compy:~/slurm/master2/compy$ srun -n2 ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
[Mar 5 09:43:58.229396 32577 0x7fbe9bfa0700] debug: binding tasks:2 to nodes:0 sockets:0:1 cores:1:0 threads:2
[Mar 5 09:43:58.229403 32577 0x7fbe9bfa0700] lllp_distribution jobid [5209] implicit auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n2 --exclusive ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
[Mar 5 09:44:24.78206 32577 0x7fbe9bfa0700] debug: binding tasks:2 to nodes:1 sockets:1:0 cores:4:0 threads:8
[Mar 5 09:44:24.78225 32577 0x7fbe9bfa0700] lllp_distribution jobid [5210] default auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n3 ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
2 compy1 - MASK:0x2
[Mar 5 09:44:34.256716 32577 0x7fbe9bfa0700] debug: binding tasks:3 to nodes:0 sockets:0:1 cores:2:0 threads:4
[Mar 5 09:44:34.256736 32577 0x7fbe9bfa0700] lllp_distribution jobid [5211] default auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n4 ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
2 compy1 - MASK:0x2
3 compy1 - MASK:0x20
[Mar 5 09:45:14.246323 32577 0x7fbe9bfa0700] debug: binding tasks:4 to nodes:0 sockets:0:1 cores:2:0 threads:4
[Mar 5 09:45:14.246353 32577 0x7fbe9bfa0700] lllp_distribution jobid [5212] implicit auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n4 --exclusive ~/tools/whereami | sort -h
0 compy1 - MASK:0x11
1 compy1 - MASK:0x22
2 compy1 - MASK:0x44
3 compy1 - MASK:0x88
[Mar 5 09:45:21.189590 32577 0x7fbe9bfa0700] debug: binding tasks:4 to nodes:1 sockets:1:0 cores:4:0 threads:8
[Mar 5 09:45:21.189600 32577 0x7fbe9bfa0700] lllp_distribution jobid [5213] implicit auto binding: cores, dist 2

brian@compy:~/slurm/master2/compy$ srun -n5 ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
2 compy1 - MASK:0x2
3 compy1 - MASK:0x20
4 compy1 - MASK:0x4
[Mar 5 09:45:29.166387 32577 0x7fbe9bfa0700] debug: binding tasks:5 to nodes:0 sockets:0:1 cores:3:0 threads:6
[Mar 5 09:45:29.166408 32577 0x7fbe9bfa0700] lllp_distribution jobid [5214] default auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n6 ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
2 compy1 - MASK:0x2
3 compy1 - MASK:0x20
4 compy1 - MASK:0x4
5 compy1 - MASK:0x40
[Mar 5 09:45:41.702859 32577 0x7fbe9bfa0700] debug: binding tasks:6 to nodes:0 sockets:0:1 cores:3:0 threads:6
[Mar 5 09:45:41.702878 32577 0x7fbe9bfa0700] lllp_distribution jobid [5215] implicit auto binding: threads, dist 2

brian@compy:~/slurm/master2/compy$ srun -n6 --exclusive ~/tools/whereami | sort -h
0 compy1 - MASK:0x1
1 compy1 - MASK:0x10
2 compy1 - MASK:0x2
3 compy1 - MASK:0x20
4 compy1 - MASK:0x4
5 compy1 - MASK:0x40
[Mar 5 09:45:49.62847 32577 0x7fbe9bfa0700] debug: binding tasks:6 to nodes:1 sockets:1:0 cores:4:0 threads:8
[Mar 5 09:45:49.62877 32577 0x7fbe9bfa0700] lllp_distribution jobid [5216] default auto binding: threads, dist 2
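To see at a glance which jobs got an implicit binding and which fell back to the configured default, the lllp_distribution log lines can be picked apart with a regular expression. The following is a rough sketch that only recognizes the three message forms appearing in this report ("implicit auto binding", "default auto binding", "auto binding off"); other Slurm versions may word these differently, and `parse_binding_line` is a hypothetical helper, not a Slurm API.

```python
import re

# Matches only the three message forms seen in this report (an assumption).
PATTERN = re.compile(
    r"lllp_distribution jobid \[(?P<jobid>\d+)\] "
    r"(?P<kind>implicit auto binding|default auto binding|auto binding off): "
    r"(?P<detail>.+)"
)


def parse_binding_line(line):
    """Return jobid, binding kind, and detail for a log line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "jobid": int(match["jobid"]),
        "kind": match["kind"],
        "detail": match["detail"].strip(),
    }


if __name__ == "__main__":
    sample = (
        "[Mar 5 09:44:24.78225 32577 0x7fbe9bfa0700] lllp_distribution "
        "jobid [5210] default auto binding: threads, dist 2"
    )
    print(parse_binding_line(sample))
```

A line reporting "default auto binding" indicates that the implicit match failed and the autobind setting supplied the binding instead.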