We have at least two scenarios where the allocation node count and the step node count differ, but we have to explicitly list the step node count to get the correct behavior. We are using cons_tres with CR_PACK_NODES.

Scenario 1: an ensemble job where a group of nodes is allocated, but multiple parallel steps are intended to share them.

ezy@login1:~> salloc -N4 -A STF002 -t 30:00
salloc: Granted job allocation 45957
salloc: Waiting for resource configuration
salloc: Nodes spock[13-16] are ready for job
ezy@spock13:~> srun -n1 hostname
srun: Warning: can't run 1 processes on 4 nodes, setting nnodes to 1
spock13

Additionally, every step must be allocated onto each node, and with CR_PACK_NODES this results in a skewed distribution:

ezy@spock13:~> srun -n10 hostname | sort | uniq -c
      7 spock13
      1 spock14
      1 spock15
      1 spock16

It packed the tasks on the first node, but still had to put one task on each node; ideally all 10 processes would have landed on the first node.

Scenario 2: users allocate "extra" nodes so they can survive node failures and restart within the same allocation (avoiding another wait in the queue). Without an explicitly listed node count, Slurm spreads the step across all the nodes. Even if I unset SLURM_NNODES and SLURM_JOB_NUM_NODES, srun still seems to attempt to use all the nodes (sketch below), and due to Bug #11494 I can't reset the node count to NO_VAL in cli_filter. Is there any way to avoid having the allocation node count "bleed over" to the step allocation?
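Here's roughly what the failed workaround for scenario 2 looks like - a sketch from memory, so the node names and exact output are illustrative:

ezy@spock13:~> unset SLURM_NNODES SLURM_JOB_NUM_NODES
ezy@spock13:~> srun -n4 hostname | sort | uniq -c
      1 spock13
      1 spock14
      1 spock15
      1 spock16

Even with both variables cleared, the step still spans all four nodes instead of packing onto spock13.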
Matt,

I wanted to follow up with you on the progress here. I have a behavior-changing patch that I'm passing to our QA now. The patch removes the srun-side enforcement of the minimum requested node count for a step inside an allocation when CR_PACK_NODES is set. The code only takes effect when SLURM_JOB_NUM_NODES is not set since, as you know, that variable is normally treated as an input option for srun. I'll keep you posted on the review progress.

cheers,
Marcin
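PS. For anyone following along, CR_PACK_NODES is enabled through SelectTypeParameters - a minimal slurm.conf sketch, where the CR_Core_Memory base flags are illustrative and depend on the site:

# slurm.conf excerpt (illustrative); CR_Pack_Nodes is combined with the
# site's existing consumable-resource flags
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes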
Matt,

The behavior-changing commit is now in our master branch and will be released in Slurm 21.08 [1]. I'm closing the bug report as fixed now - should you have any questions, please reopen.

cheers,
Marcin

[1] https://github.com/SchedMD/slurm/commit/e942cadb345cc2acbfb2b7155f40eead93b64b43
So users will still need to manually unset SLURM_JOB_NUM_NODES *and* SLURM_NNODES to actually see a behavior change? I'm glad that it's now possible, but it's still going to cause some issues here when users forget to (or don't know to) unset those.

Based on the name, I would not expect SLURM_JOB_NUM_NODES to affect the node count of a step allocation (only of a job allocation). But I guess it was added as an alias to replace SLURM_NNODES, which does sound like it should affect both. I suppose there's no sane way to make srun ignore those environment variables.
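For context, both variables are exported inside every allocation, which is why I expect users to trip over this - a quick check from one of our allocations (values illustrative):

ezy@spock13:~> env | grep -E 'SLURM_(NNODES|JOB_NUM_NODES)='
SLURM_NNODES=4
SLURM_JOB_NUM_NODES=4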
Matt,

>So users will still need to manually unset SLURM_JOB_NUM_NODES *and* SLURM_NNODES to actually see a behavior change?

No - that's not the case after the patch. If CR_Pack_Nodes is enabled on 21.08 you'll see:

># salloc -N2 --exclusive
>salloc: Pending job allocation 7
>salloc: job 7 queued and waiting for resources
>salloc: job 7 has been allocated resources
>salloc: Granted job allocation 7
>[salloc] bash-4.2# srun -n3 /bin/bash -c 'echo $SLURMD_NODENAME'
>test01
>test01
>test01

or, if the user explicitly sets the number of nodes for the step:

>[salloc] bash-4.2# srun -n3 -N2 /bin/bash -c 'echo $SLURMD_NODENAME'
>test01
>test01
>test02

or if we need more CPUs (for instance, because of the default of 1 CPU per task):

>[salloc] bash-4.2# srun -n 64 /bin/bash -c 'echo $SLURMD_NODENAME' | sort | uniq -c
>     64 test01
>[salloc] bash-4.2# srun -n 67 /bin/bash -c 'echo $SLURMD_NODENAME' | sort | uniq -c
>     64 test01
>      3 test02

Let me know what you think.

cheers,
Marcin
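PS. The same spillover applies when per-task CPU requests no longer fit on one node. A sketch, assuming the 64-CPU nodes implied by the -n 64 example above (output illustrative):

>[salloc] bash-4.2# srun -n2 --cpus-per-task=40 /bin/bash -c 'echo $SLURMD_NODENAME' | sort | uniq -c
>      1 test01
>      1 test02

Two 40-CPU tasks need 80 CPUs in total, so the second task spills over to test02.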
(In reply to Marcin Stolarek from comment #15)
> Let me know what you think.

This sounds perfect. I misunderstood your previous comment:

> The code is only effective when SLURM_JOB_NUM_NODES is not set

and assumed it was being set by the batch job itself. Thanks!
>and assumed it was being set by the batch job itself.

Ah, sorry - that's on me for not letting you know that the final approach was different from what I originally had in mind.

cheers,
Marcin