Created attachment 26092
slurm.conf file

On an internal system, we're seeing an issue where srun by itself works fine, but specifying --threads-per-core=1 or --hint=nomultithread is failing with "Cannot request more threads per core than the job allocation". This happens both in an salloc and when running srun by itself.

dgloe@hotlum-login:~> srun hostname
x1000c0s7b1n1
dgloe@hotlum-login:~> srun --threads-per-core=1 hostname
srun: error: Unable to create step for job 6276: Cannot request more threads per core than the job allocation
dgloe@hotlum-login:~> srun --hint=nomultithread hostname
srun: error: Unable to create step for job 6277: Cannot request more threads per core than the job allocation
dgloe@hotlum-login:~> salloc --threads-per-core=1
salloc: Granted job allocation 6278
salloc: Waiting for resource configuration
salloc: Nodes x1000c0s7b1n1 are ready for job
dgloe@hotlum-login:~> srun --threads-per-core=1 hostname
srun: error: Unable to create step for job 6278: Cannot request more threads per core than the job allocation
I can reproduce this behavior. It looks to be an issue with select/linear specifically. I will keep you posted as I learn more.

Thanks,
Skyler
This is a regression introduced in 21.08.6. I have created a patch that is out for review.
Out of curiosity, why are you using `select/linear` instead of `select/cons_tres`? `select/cons_tres` can be used for whole node allocations too.

Also, I notice in your slurm.conf that you could simplify the node section with the meta node NodeName=DEFAULT.

```
# slurm.conf
NodeName=DEFAULT RealMemory=512000 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2 State=idle
NodeName=x1000c0s0b0n0
NodeName=x1000c0s0b0n1
NodeName=x1000c0s0b1n0
... (omitted) ...
NodeName=x1000c7s7b1n1
```
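
For reference, a minimal sketch of how whole-node allocation is commonly configured under `select/cons_tres`. The partition name `workq`, `Nodes=ALL`, and the choice of `CR_Core` are assumptions for illustration and are not taken from the attached slurm.conf; `OverSubscribe=EXCLUSIVE` on the partition is the setting that hands each job entire nodes.

```
# slurm.conf (sketch; partition name, Nodes=ALL, and CR_Core are assumed)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
# OverSubscribe=EXCLUSIVE allocates whole nodes to every job in this partition
PartitionName=workq Nodes=ALL Default=YES OverSubscribe=EXCLUSIVE State=UP
```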
This is how the system was set up by the admins; I'm not sure why they used select/linear. I've recommended that they use select/cons_res, which is what we typically use. Is there an advantage to using select/cons_tres instead of select/cons_res?
`select/cons_tres` is a superset of `select/cons_res` and has more features than `select/linear`.

https://slurm.schedmd.com/cons_res.html#using_cons_tres

> Slurm's default select/linear plugin is using a best fit algorithm based on
> number of consecutive nodes. The same node allocation approach is used with
> select/cons_res and select/cons_tres for consistency.
>
> Consumable Trackable Resources (cons_tres) plugin provides all the same
> functionality provided by the Consumable Resources (cons_res) plugin. It also
> includes additional functionality specifically related to GPUs.
>
> The --exclusive srun option allows users to request nodes in exclusive mode
> even when consumable resources is enabled. See the srun man page for details.
Commit c728da23f8 was merged for 21.08.9, 22.05.4, and 23.02.0pre1.

Please note that 21.08.9 does not have a planned release date and may not be released at all. Fixes are always propagated upward, so please consider 22.05.4 for the future should 21.08.9 not be released.

Cheers,
Skyler
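
The version list above can be turned into a quick check. The following is only a sketch: `version_ge` and `has_step_fix` are hypothetical helper names, the comparison relies on `sort -V` (available in GNU coreutils), and pre-release strings such as `23.02.0pre1` are not handled. It reports whether a given release is at or past the first release on its branch that carries the commit; it says nothing about whether an older release has the regression in the first place.

```shell
# version_ge A B: succeed when version A >= version B under `sort -V` ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_step_fix VERSION: hypothetical helper, prints "yes" or "no".
# Minimum versions are taken from the merge note above.
has_step_fix() {
    case "$1" in
        21.08.*) min=21.08.9 ;;   # may never be released, per the note above
        22.05.*) min=22.05.4 ;;
        *)       min=23.02.0 ;;   # any other branch: needs 23.02.0 or later
    esac
    if version_ge "$1" "$min"; then
        echo yes
    else
        echo no
    fi
}
```

A typical call would pass the installed version, for example `has_step_fix 22.05.3`, which prints `no`.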