Ticket 10333

Summary: Job just needs 1 core but won't run unless node is empty
Product: Slurm
Reporter: Torkil Svensgaard <torkil>
Component: Configuration
Assignee: Director of Support <support>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: rkv
Version: 20.11.0
Hardware: Linux
OS: Linux
Site: DRCMR

Description Torkil Svensgaard 2020-12-02 00:40:55 MST
From slurm.conf:

"
NodeName=bigger9 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=257552 Gres=gpu:1

PartitionName=debug Nodes=ALL MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=HPC Nodes=ALL Default=YES MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE Shared=YES LLN=YES State=UP
"
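
As an aid to reading the configuration above, the following is a minimal shell sketch that lists each partition with its Nodes and Shared values. It only restates what the quoted lines say and makes no claim about why the job is pending. The function name summarize_partitions is hypothetical, and the config text is inlined from this ticket rather than read from a real slurm.conf.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): reads slurm.conf-style text on
# stdin and prints "<partition> <Nodes value> <Shared value>" per partition.
summarize_partitions() {
    awk '
    /^PartitionName=/ {
        name = ""; nodes = "-"; shared = "-"
        # Each field is a Key=Value pair; pick out the three of interest.
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "PartitionName") name = kv[2]
            else if (kv[1] == "Nodes") nodes = kv[2]
            else if (kv[1] == "Shared") shared = kv[2]
        }
        print name, nodes, shared
    }'
}

# Partition lines copied from the ticket description.
conf='PartitionName=debug Nodes=ALL MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=HPC Nodes=ALL Default=YES MaxTime=INFINITE Shared=NO LLN=YES State=UP
PartitionName=application Nodes=ALL MaxTime=INFINITE Shared=YES LLN=YES State=UP'

printf '%s\n' "$conf" | summarize_partitions
```

With the lines from this ticket, all three partitions report Nodes=ALL, so they overlap on the single node bigger9, and only the application partition has Shared=YES.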

I then start a terminal on the application partition and submit a batch job to the HPC partition. The batch job won't run until I clear the application queue, even though it only needs a single core. What am I missing?

"
torkil@joe:/home/torkil$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug          up   infinite      1    mix bigger9
HPC*           up   infinite      1    mix bigger9
application    up   infinite      1    mix bigger9
torkil@joe:/home/torkil$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
               746       HPC test_gpu   torkil PD       0:00      1 (Resources) 
               745 applicati xfce4-te   torkil  R      57:51      1 bigger9 
"

Batch job submitted by sbatch:

"
torkil@joe:/home/torkil/slurm$ cat test_gpu.sh 
#!/bin/bash
#SBATCH --partition=HPC
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1 
#SBATCH --ntasks=1

nvidia-smi
"

Best regards,

Torkil
Comment 1 Torkil Svensgaard 2020-12-02 00:42:14 MST
The submit for ticket 10332 never returned, but it seems the ticket was submitted after all.

*** This ticket has been marked as a duplicate of ticket 10332 ***