Ticket 10332

Summary: Job just needs 1 core but won't run unless node is empty
Product: Slurm Reporter: Torkil Svensgaard <torkil>
Component: Configuration    Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: bart, rkv
Version: 20.11.0   
Hardware: Linux   
OS: Linux   
Site: DRCMR
Attachments: slurm.conf

Description Torkil Svensgaard 2020-12-02 00:40:55 MST
From slurm.conf:

"
NodeName=bigger9 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=257552 Gres=gpu:1

PartitionName=debug Nodes=ALL MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=HPC Nodes=ALL Default=YES MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE Shared=YES LLN=YES State=UP
"

I then start a terminal on the application partition and submit a batch job to the HPC partition, but the batch job won't run until I clear the application queue, even though it only needs a single core. What am I missing?

"
torkil@joe:/home/torkil$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug          up   infinite      1    mix bigger9
HPC*           up   infinite      1    mix bigger9
application    up   infinite      1    mix bigger9
torkil@joe:/home/torkil$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
               746       HPC test_gpu   torkil PD       0:00      1 (Resources) 
               745 applicati xfce4-te   torkil  R      57:51      1 bigger9 
"

Batch job submitted by sbatch:

"
torkil@joe:/home/torkil/slurm$ cat test_gpu.sh 
#!/bin/bash
#SBATCH --partition=HPC
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1 
#SBATCH --ntasks=1

nvidia-smi
"

Best regards,

Torkil
Comment 1 Torkil Svensgaard 2020-12-02 00:42:14 MST
*** Ticket 10333 has been marked as a duplicate of this ticket. ***
Comment 2 Torkil Svensgaard 2020-12-02 00:52:04 MST
Created attachment 16912 [details]
slurm.conf
Comment 3 Dominik Bartkiewicz 2020-12-02 02:03:02 MST
Hi

I suspect those nodes do not have enough free memory.
Could you send me the output of:
sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"

Setting DefMemPerCPU is a good way to protect against such issues.

Dominik
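Dominik's DefMemPerCPU suggestion goes in slurm.conf. A minimal sketch (the value below is only illustrative, derived from bigger9's 257552 MB RealMemory divided by 48 CPUs; it is not taken from this ticket):

```
# Default memory per allocated CPU (in MB) for jobs that do not request
# memory explicitly. Note that it only enforces a limit when memory is a
# consumable resource (e.g. SelectTypeParameters=CR_Core_Memory).
DefMemPerCPU=5365
```

Jobs can still override this with --mem or --mem-per-cpu at submit time.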
Comment 4 Torkil Svensgaard 2020-12-02 02:19:16 MST
(In reply to Dominik Bartkiewicz from comment #3)
> Hi
> 
> I suspect those nodes do not have enough free memory.

There should be plenty of memory, the node is just running a single terminal and has 256GB RAM.

> Could you send me the output of:
> sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"

"
# sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"
ALLOCMEM            MEMORY              NODELIST            STATE               AVAIL_FEATURES      CPUS(A/I/O/T)       
0                   257552              bigger9             mixed               (null)              2/46/0/48           
"
Comment 5 Dominik Bartkiewicz 2020-12-02 02:45:44 MST
Hi

I just realized that you are using 'SelectTypeParameters=CR_Core'.
Could you send me the output of:
scontrol -d show job
scontrol -d show node

Dominik
Comment 6 Torkil Svensgaard 2020-12-02 03:07:52 MST
(In reply to Dominik Bartkiewicz from comment #5)
 
> I just realized that you are using 'SelectTypeParameters=CR_Core'.
> Could you send me the output of:
> scontrol -d show job

"
# scontrol -d show job
JobId=745 JobName=xfce4-terminal
   UserId=torkil(1018) GroupId=torkil(1018) MCS_label=N/A
   Priority=4294901711 Nice=0 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=03:30:17 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-12-02T07:36:12 EligibleTime=2020-12-02T07:36:12
   AccrueTime=Unknown
   StartTime=2020-12-02T07:36:12 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-02T07:36:12
   Partition=application AllocNode:Sid=0.0.0.0:6132
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=bigger9
   BatchHost=bigger9
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=(null)
     Nodes=bigger9 CPU_IDs=0-1 Mem=0 GRES=
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=xfce4-terminal
   WorkDir=/home/torkil
   Power=
   NtasksPerTRES:0

JobId=746 JobName=test_gpu.sh
   UserId=torkil(1018) GroupId=torkil(1018) MCS_label=N/A
   Priority=4294901710 Nice=0 Account=(null) QOS=(null)
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-12-02T07:36:22 EligibleTime=2020-12-02T07:36:22
   AccrueTime=2020-12-02T07:36:22
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-02T11:06:26
   Partition=HPC AllocNode:Sid=0.0.0.0:933761
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/torkil/slurm/test_gpu.sh
   WorkDir=/home/torkil/slurm
   StdErr=/home/torkil/slurm/slurm-746.out
   StdIn=/dev/null
   StdOut=/home/torkil/slurm/slurm-746.out
   Power=
   TresPerNode=gpu:1
   NtasksPerTRES:0
"
> scontrol -d show node

"
# scontrol -d show node
NodeName=bigger9 Arch=x86_64 CoresPerSocket=24 
   CPUAlloc=2 CPUTot=48 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:1
   GresDrain=N/A
   GresUsed=gpu:0
   NodeAddr=bigger9 NodeHostName=bigger9 Version=20.11.0
   OS=Linux 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 
   RealMemory=257552 AllocMem=0 FreeMem=254680 Sockets=1 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug,HPC,application 
   BootTime=2020-12-01T08:55:01 SlurmdStartTime=2020-12-01T11:43:02
   CfgTRES=cpu=48,mem=257552M,billing=48
   AllocTRES=cpu=2
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Comment=(null)
"

Best regards,

Torkil
Comment 7 Torkil Svensgaard 2020-12-02 03:33:53 MST
If I submit the same sbatch job to the application partition instead of the HPC partition, it runs right away. So the issue seems to be related to the partitions.

Best regards,

Torkil
Comment 8 Dominik Bartkiewicz 2020-12-02 04:18:00 MST
Hi

Jobs from the 'HPC' partition cannot use this node while it is running a job from a sharing partition. This is known and documented behavior.

Check https://slurm.schedmd.com/cons_res_share.html
Section "Nodes in Multiple Partitions" describes all possible combinations.

Dominik
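Per the "Nodes in Multiple Partitions" section referenced above, one way to avoid this conflict is to give every partition covering the same nodes an identical oversubscription setting. A sketch based on the partition lines quoted in the description (the current slurm.conf parameter is OverSubscribe; Shared is the older spelling):

```
# All three partitions cover the same node, so they should agree on
# oversubscription. Mixing OverSubscribe=NO with OverSubscribe=YES keeps
# exclusive-partition jobs off nodes already used by sharing partitions.
PartitionName=debug       Nodes=ALL MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
PartitionName=HPC         Nodes=ALL Default=YES MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
```

Alternatively, if sharing is desired (as for the interactive terminal here), all three could use OverSubscribe=YES instead; the point is consistency across partitions that share nodes.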
Comment 9 Torkil Svensgaard 2020-12-03 05:21:24 MST
(In reply to Dominik Bartkiewicz from comment #8)
 
> Jobs from the 'HPC' partition cannot use this node while it is running a job
> from a sharing partition.

Thanks, feel free to close the ticket.

Best regards,

Torkil