We have a number of users seeing unexpected resource scheduling when they use "--gpus=4 --ntasks=1", resulting in wasted GPU cycles. If they don't specify "-N 1", there is a reasonable chance that Slurm will have their job span multiple nodes and multiple tasks despite the --ntasks=1 request. We have plenty of nodes with 4 GPUs, so that is not a limitation. Is this working as expected? Is --ntasks just a suggestion, so that we need to explicitly specify -N 1? Thanks, Kaylea
Kaylea, Slurm will allocate multiple nodes to satisfy the GPU requirement, even if you asked for only 1 task. If you want it to use only 1 node, you should specify -N1. -Scott
Understood. We will update our documentation accordingly.
Kaylea, After talking with some others, it seems that this is a bug, and the behavior will change in 22.05.5. In 22.05.5, Slurm should properly limit jobs with --ntasks=1 to 1 node. If no such node is available, this error will be given: "srun: error: Unable to allocate resources: Requested node configuration is not available" -Scott
That's great news! I look forward to the fix. In the meantime, we will encourage the users to use -N 1.
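For anyone applying the same workaround, here is a minimal sketch of the single-node request discussed above. It assumes 4 GPUs and uses nvidia-smi -L as a placeholder payload; build_srun_args is a hypothetical helper name for illustration, not part of Slurm. The script only prints the command, so it can be tried on a machine without Slurm installed.

```shell
#!/bin/sh
# Workaround sketch: pin the job to a single node explicitly.
# build_srun_args is a hypothetical helper name, not part of Slurm.
build_srun_args() {
    gpus="$1"
    # --nodes=1 (the long form of -N 1) keeps the allocation on one node.
    # Per this thread, --ntasks=1 alone may not do so before 22.05.5.
    echo "--nodes=1 --ntasks=1 --gpus=${gpus}"
}

# Print the command instead of running it (dry run).
# nvidia-smi -L is only a placeholder payload for illustration.
echo "srun $(build_srun_args 4) nvidia-smi -L"
```

The same three options can also be given as #SBATCH lines at the top of a batch script instead of on the srun command line.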
Glad I could help.