Ticket 10332

Summary: Job just needs 1 core but won't run unless node is empty
Product: Slurm Reporter: Torkil Svensgaard <torkil>
Component: Configuration    Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: bart, rkv
Version: 20.11.0   
Hardware: Linux   
OS: Linux   
Site: DRCMR
Attachments: slurm.conf

Description Torkil Svensgaard 2020-12-02 00:40:55 MST
From slurm.conf:

"
NodeName=bigger9 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=257552 Gres=gpu:1

PartitionName=debug Nodes=ALL MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=HPC Nodes=ALL Default=YES MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE Shared=YES LLN=YES State=UP
"

I then start a terminal on the application partition and submit a batch job to the HPC partition, but the batch job won't run until I clear the application queue, even though it only needs a single core. What am I missing?

"
torkil@joe:/home/torkil$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug          up   infinite      1    mix bigger9
HPC*           up   infinite      1    mix bigger9
application    up   infinite      1    mix bigger9
torkil@joe:/home/torkil$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
               746       HPC test_gpu   torkil PD       0:00      1 (Resources) 
               745 applicati xfce4-te   torkil  R      57:51      1 bigger9 
"

Batch job submitted by sbatch:

"
torkil@joe:/home/torkil/slurm$ cat test_gpu.sh 
#!/bin/bash
#SBATCH --partition=HPC
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1 
#SBATCH --ntasks=1

nvidia-smi
"

Best regards,

Torkil
Comment 1 Torkil Svensgaard 2020-12-02 00:42:14 MST
*** Ticket 10333 has been marked as a duplicate of this ticket. ***
Comment 2 Torkil Svensgaard 2020-12-02 00:52:04 MST
Created attachment 16912 [details]
slurm.conf
Comment 3 Dominik Bartkiewicz 2020-12-02 02:03:02 MST
Hi

I suspect those nodes do not have enough free memory.
Could you send me the output of:
sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"

Setting DefMemPerCPU is a good way to protect against such issues.

Dominik
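Dominik's DefMemPerCPU suggestion goes in slurm.conf. A minimal sketch (the value below is only illustrative, derived from bigger9's 257552 MB RealMemory divided by 48 CPUs; it is not taken from this ticket):

```
# Default memory per allocated CPU (in MB) for jobs that do not request
# memory explicitly. Note that it only enforces a limit when memory is a
# consumable resource (e.g. SelectTypeParameters=CR_Core_Memory).
DefMemPerCPU=5365
```

Jobs can still override this with --mem or --mem-per-cpu at submit time.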
Comment 4 Torkil Svensgaard 2020-12-02 02:19:16 MST
(In reply to Dominik Bartkiewicz from comment #3)
> Hi
> 
> I suspect those nodes do not have enough free memory.

There should be plenty of memory, the node is just running a single terminal and has 256GB RAM.

> Could you send me the output of:
> sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"

"
# sinfo -p HPC -O "AllocMem,Memory,NodeList,StateLong,Features,CPUsState"
ALLOCMEM            MEMORY              NODELIST            STATE               AVAIL_FEATURES      CPUS(A/I/O/T)       
0                   257552              bigger9             mixed               (null)              2/46/0/48           
"
Comment 5 Dominik Bartkiewicz 2020-12-02 02:45:44 MST
Hi

I just realized that you are using 'SelectTypeParameters=CR_Core'.
Could you send me the output of:
scontrol -d show job
scontrol -d show node

Dominik
Comment 6 Torkil Svensgaard 2020-12-02 03:07:52 MST
(In reply to Dominik Bartkiewicz from comment #5)
 
> I just realized that you are using 'SelectTypeParameters=CR_Core'.
> Could you send me the output of:
> scontrol -d show job

"
# scontrol -d show job
JobId=745 JobName=xfce4-terminal
   UserId=torkil(1018) GroupId=torkil(1018) MCS_label=N/A
   Priority=4294901711 Nice=0 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=03:30:17 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-12-02T07:36:12 EligibleTime=2020-12-02T07:36:12
   AccrueTime=Unknown
   StartTime=2020-12-02T07:36:12 EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-02T07:36:12
   Partition=application AllocNode:Sid=0.0.0.0:6132
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=bigger9
   BatchHost=bigger9
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=(null)
     Nodes=bigger9 CPU_IDs=0-1 Mem=0 GRES=
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=xfce4-terminal
   WorkDir=/home/torkil
   Power=
   NtasksPerTRES:0

JobId=746 JobName=test_gpu.sh
   UserId=torkil(1018) GroupId=torkil(1018) MCS_label=N/A
   Priority=4294901710 Nice=0 Account=(null) QOS=(null)
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2020-12-02T07:36:22 EligibleTime=2020-12-02T07:36:22
   AccrueTime=2020-12-02T07:36:22
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-12-02T11:06:26
   Partition=HPC AllocNode:Sid=0.0.0.0:933761
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/torkil/slurm/test_gpu.sh
   WorkDir=/home/torkil/slurm
   StdErr=/home/torkil/slurm/slurm-746.out
   StdIn=/dev/null
   StdOut=/home/torkil/slurm/slurm-746.out
   Power=
   TresPerNode=gpu:1
   NtasksPerTRES:0
"
> scontrol -d show node

"
# scontrol -d show node
NodeName=bigger9 Arch=x86_64 CoresPerSocket=24 
   CPUAlloc=2 CPUTot=48 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:1
   GresDrain=N/A
   GresUsed=gpu:0
   NodeAddr=bigger9 NodeHostName=bigger9 Version=20.11.0
   OS=Linux 4.18.0-193.28.1.el8_2.x86_64 #1 SMP Thu Oct 22 00:20:22 UTC 2020 
   RealMemory=257552 AllocMem=0 FreeMem=254680 Sockets=1 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug,HPC,application 
   BootTime=2020-12-01T08:55:01 SlurmdStartTime=2020-12-01T11:43:02
   CfgTRES=cpu=48,mem=257552M,billing=48
   AllocTRES=cpu=2
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Comment=(null)
"

Best regards,

Torkil
Comment 7 Torkil Svensgaard 2020-12-02 03:33:53 MST
If I submit the same sbatch job to the application partition instead of the HPC partition, it runs right away. So the issue seems to be related to the partitions.

Best regards,

Torkil
Comment 8 Dominik Bartkiewicz 2020-12-02 04:18:00 MST
Hi

Jobs from the 'HPC' partition cannot use this node while it is running a job from a sharing partition. This is known and documented behavior.

Check https://slurm.schedmd.com/cons_res_share.html
Section "Nodes in Multiple Partitions" describes all possible combinations.

Dominik
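Per the "Nodes in Multiple Partitions" section referenced above, one way to avoid this conflict is to give every partition covering the same nodes an identical oversubscription setting. A sketch based on the partition lines quoted in the description (the current slurm.conf parameter is OverSubscribe; Shared is the older spelling):

```
# All three partitions cover the same node, so they should agree on
# oversubscription. Mixing OverSubscribe=NO with OverSubscribe=YES keeps
# exclusive-partition jobs off nodes already used by sharing partitions.
PartitionName=debug       Nodes=ALL MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
PartitionName=HPC         Nodes=ALL Default=YES MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE OverSubscribe=NO LLN=YES State=UP
```

Alternatively, if sharing is desired (as for the interactive terminal here), all three could use OverSubscribe=YES instead; the point is consistency across partitions that share nodes.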
Comment 9 Torkil Svensgaard 2020-12-03 05:21:24 MST
(In reply to Dominik Bartkiewicz from comment #8)
 
> Jobs from the 'HPC' partition cannot use this node while it is running a job
> from a sharing partition.

Thanks, feel free to close the ticket.

Best regards,

Torkil