Hello, the change to the exclusive behavior of srun has been causing some of our users grief (jobs fail with no easy way to fix them). While the srun --whole option would usually be the way to go, it is not always easy to apply in some workflows, e.g. with petsc-e, which uses mpiexec under the covers. Would it be possible to add --whole to sbatch and/or a SLURM_WHOLE input environment variable, so that we have an easy way to default to the old behavior? Thanks! Josko
Josko,

I checked the code and it looks like a documentation-only issue. In my simple test, using the SLURM_WHOLE input variable works as expected:

>(copy-paste from Bug 10383 comment 26)
>[salloc] bash-4.2# srun --ntasks-per-node=1 /bin/bash -c 'if [ $SLURM_NODEID -eq 0 ]; then scontrol show step; fi'
>StepId=58889.0 UserId=0 StartTime=2020-12-14T12:09:34 TimeLimit=UNLIMITED
>   State=RUNNING Partition=par1 NodeList=test[01,08]
>   Nodes=2 CPUs=2 Tasks=2 Name=bash Network=(null)
>   TRES=cpu=2,mem=0,node=2
>   ResvPorts=12043-12044
>   CPUFreqReq=Default Dist=Cyclic
>   SrunHost:Pid=slurmctl:28541
>[salloc] bash-4.2# export SLURM_WHOLE=1
>[salloc] bash-4.2# srun --ntasks-per-node=1 /bin/bash -c 'if [ $SLURM_NODEID -eq 0 ]; then scontrol show step; fi'
>StepId=58889.1 UserId=0 StartTime=2020-12-14T12:09:40 TimeLimit=UNLIMITED
>   State=RUNNING Partition=par1 NodeList=test[01,08]
>   Nodes=2 CPUs=64 Tasks=2 Name=bash Network=(null)
>   TRES=cpu=64,mem=0,node=2
>   ResvPorts=12045-12046
>   CPUFreqReq=Default Dist=Cyclic
>   SrunHost:Pid=slurmctl:28616

With SLURM_WHOLE=1 exported, the step gets all 64 CPUs of the two allocated nodes instead of 2. I'll prepare a documentation fix and keep you posted on the progress. Let me know if you notice any issue with the input variable functionality, though.

cheers,
Marcin
Great, I verified that it does work. It still might be nice to have it also in sbatch but this is good enough for us. Thanks!
Josko, the documentation fix is now merged [1]. We prefer not to add the option to sbatch, since it is not really related to the allocation or the batch step, and we think that exclusive steps with isolated resources are more intuitive and the better way to go in the future. We may reconsider if we see more customers interested. Should you have any questions, please reopen. cheers, Marcin [1] https://github.com/SchedMD/slurm/commit/4c36b604451172bb6bea9c5e931273efe80275b3
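For workflows like the one in the original report, where srun is invoked indirectly and --whole cannot easily be passed, a batch script can export SLURM_WHOLE before launching anything. The following is only a minimal sketch: the #SBATCH values are placeholders, the step command is the same check used in the test above, and it assumes the launcher in question ultimately calls srun and inherits the job's environment.

```shell
#!/bin/bash
#SBATCH --nodes=2              # placeholder value, adjust to the real job
#SBATCH --ntasks-per-node=1    # placeholder value, adjust to the real job

# Input environment variable read by srun; has the same effect as
# passing --whole on every srun command line in this job.
export SLURM_WHOLE=1

# Launch a step only when running inside a Slurm job with srun available,
# so the script is harmless when executed elsewhere.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    # Node 0 prints the step record; CPUs= should show the whole nodes.
    srun /bin/bash -c 'if [ "$SLURM_NODEID" -eq 0 ]; then scontrol show step; fi'
fi
```

Exporting the variable in the batch script rather than in a user's shell profile keeps the old whole-node behavior limited to the jobs that need it.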