Dear team,

We have the following issue: we submit a job, and the number of tasks seems to "fill" the node. In this case, this is a 16-CPU node, and the task was executed 8 times. I would expect the number of tasks to be 1.

**************
[sagon@master carrac] $ cat test.sh
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --partition=shared-EL7

srun hostname
**************

**************
[sagon@master carrac] $ sbatch test.sh
Submitted batch job 40617634

[sagon@master carrac] $ scontrol show job 40617634
JobId=40617634 JobName=test.sh
   UserId=sagon(240477) GroupId=unige(1000) MCS_label=N/A
   Priority=33528 Nice=0 Account=rossigno QOS=normal
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:02 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2020-11-23T18:12:57 EligibleTime=2020-11-23T18:12:57
   AccrueTime=2020-11-23T18:12:57
   StartTime=2020-11-23T18:13:20 EndTime=2020-11-23T18:13:22 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-11-23T18:13:20
   Partition=shared-EL7 AllocNode:Sid=master:232090
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node007
   BatchHost=node007
   NumNodes=1 NumCPUs=16 NumTasks=0 CPUs/Task=2 ReqB:S:C:T=0:0:*:*
   TRES=cpu=16,mem=48000M,node=1,billing=16
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryCPU=3000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/sagon/tests/carrac/test.sh
   WorkDir=/home/sagon/tests/carrac
   StdErr=/home/sagon/tests/carrac/slurm-40617634.out
   StdIn=/dev/null
   StdOut=/home/sagon/tests/carrac/slurm-40617634.out
   Power=
**************

**************
[sagon@master carrac] $ cat /etc/slurm/slurm.d/nodes.conf | grep PartitionName=shared-EL7
PartitionName=shared-EL7 Nodes=node[005[...],247-264] Shared=EXCLUSIVE Priority=1 Default=NO DefaultTime=00:01:00 MaxTime=12:00:00 State=UP
**************

We are using SelectType=select/cons_tres in slurm.conf.

Thanks for the help
Yann
Hello Yann,

This is expected behavior. From https://slurm.schedmd.com/srun.html#OPT_cpus-per-task:

"If -c (--cpus-per-task) is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction."

So just specify -n to limit this default behavior.

Thanks,
-Michael
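For instance, here is a minimal sketch of the submission script from the report with the task count pinned explicitly (same partition as above; only --ntasks is added):

**************
#!/bin/bash
#SBATCH --ntasks=1            # request exactly one task
#SBATCH --cpus-per-task=2     # two CPUs for that single task
#SBATCH --partition=shared-EL7

# With -n fixed, srun launches one task instead of filling the node
srun hostname
**************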
Hello,

Many thanks for your prompt reply. Indeed, I wasn't specifying the number of tasks and I thought it defaulted to 1. Is this a new behavior?

If I submit the same job on a partition which doesn't contain the flag "Shared=EXCLUSIVE", only one task is created. Is this expected?

Thanks
(In reply to Yann from comment #3)
> Indeed, I wasn't specifying the number of tasks and I thought it defaulted
> to 1. Is this a new behavior?

The default is n=1, but when you specify -c, it can override that default.

> If I submit the same job on a partition which doesn't contain the flag
> "Shared=EXCLUSIVE", only one task is created. Is this expected?

Yes. With EXCLUSIVE, you get the whole node, so the number of CPUs allocated is 16, and -c2 means that the number of tasks increases to 8. Without EXCLUSIVE, you will probably only get 1 task, leading to 1 CPU allocated by default. (On my system, -c2 makes it fail because there are not enough CPUs.)

Note that `scontrol show job` does not show the number of tasks increase, but `scontrol show steps` does.
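For reference, a quick way to confirm the actual task count yourself (a sketch; the job ID is the one from the report, and the sacct field names assume a reasonably recent Slurm):

**************
# While the step is running, it reports its own task count:
[sagon@master carrac] $ scontrol show steps

# After the job has finished, accounting shows it as well:
[sagon@master carrac] $ sacct -j 40617634 --format=JobID,AllocCPUS,NTasks,Elapsed
**************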
I'll go ahead and mark this as resolved. Feel free to reopen if you have further questions.

Thanks!
-Michael
Thanks for your help.

One comment about this functionality: for us it would have been more useful if the number of CPUs per task increased to fill the node, rather than the number of tasks. In our use case the user was not running an MPI job, so their multithreaded job was launched 8 times with 2 cores each instead of once with 16 cores.

Best
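For anyone with the same use case, a sketch of a script that gives a single multithreaded task the whole node (assuming a 16-core node on the same partition and an OpenMP-style application; the binary name is just an illustration):

**************
#!/bin/bash
#SBATCH --ntasks=1             # a single task...
#SBATCH --cpus-per-task=16     # ...with all 16 CPUs of the node
#SBATCH --partition=shared-EL7

# Hand the allocated CPU count to the threaded application
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./my_multithreaded_app    # hypothetical binary name
**************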