Ticket 11083 - Erratic GPU allocation
Summary: Erratic GPU allocation
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 20.11.2
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Dominik Bartkiewicz
 
Reported: 2021-03-15 01:46 MDT by Greg Wickham
Modified: 2021-04-28 11:28 MDT
CC: 5 users

See Also:
Site: KAUST
Version Fixed: 20.11.6


Attachments
slurm.conf (4.37 KB, text/plain), 2021-03-15 02:36 MDT, Ahmed Essam ElMazaty
gres.conf (9.00 KB, text/plain), 2021-03-15 02:36 MDT, Ahmed Essam ElMazaty
Fragments related to two jobs from Slurmctld (9.18 KB, text/plain), 2021-03-16 07:53 MDT, Greg Wickham
Testing submission - slurmctld log (28.75 KB, text/plain), 2021-03-16 08:02 MDT, Greg Wickham
Partitions.conf (6.28 KB, text/plain), 2021-03-16 08:11 MDT, Greg Wickham

Description Greg Wickham 2021-03-15 01:46:13 MDT
Allocating resources with:

    srun --gpus-per-task=1 --ntasks=2 --nodes=2 --time 00:10:00 --pty /bin/bash -i

has resulted in different allocations across otherwise identical submissions.

JobID|AllocTRES
14726565|billing=8,cpu=8,gres/gpu=3,mem=16G,node=2
14726565.extern|billing=8,cpu=8,gres/gpu=3,mem=16G,node=2
14726565.0|cpu=2,gres/gpu:gtx1080ti=2,gres/gpu=2,mem=0,node=2
14733058|billing=8,cpu=8,gres/gpu=5,mem=16G,node=2
14733058.extern|billing=8,cpu=8,gres/gpu=5,mem=16G,node=2
14733058.0|cpu=2,gres/gpu:gtx1080ti=2,gres/gpu=2,mem=0,node=2


Job # 14726565 was allocated 3 GPUs
Job # 14733058 was allocated 5 GPUs

Expected behavior is 1 task on each node, with each task being allocated 1 GPU.

   -Greg
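The mismatch above can be spotted mechanically from `sacct` output. Below is a minimal sketch (a hypothetical helper, not part of Slurm) that parses the pipe-delimited `AllocTRES` records and flags job-level entries whose `gres/gpu` count differs from the expected `ntasks * gpus_per_task`:

```python
# Parse "sacct -P --format=jobid,alloctres" output and flag jobs whose
# job-level gres/gpu count differs from ntasks * gpus_per_task.
# Hypothetical helper for illustration only; not part of Slurm.

def parse_tres(tres: str) -> dict:
    """Turn 'billing=8,cpu=8,gres/gpu=3,mem=16G,node=2' into a dict."""
    out = {}
    for field in tres.split(","):
        key, _, value = field.partition("=")
        out[key] = value
    return out

def gpu_mismatches(sacct_lines, ntasks, gpus_per_task):
    """Yield (jobid, allocated_gpus) for job-level records with unexpected counts."""
    expected = ntasks * gpus_per_task
    for line in sacct_lines:
        jobid, tres = line.split("|", 1)
        if "." in jobid:          # skip .extern / .0 step records
            continue
        gpus = int(parse_tres(tres).get("gres/gpu", 0))
        if gpus != expected:
            yield jobid, gpus

sample = [
    "14726565|billing=8,cpu=8,gres/gpu=3,mem=16G,node=2",
    "14733058|billing=8,cpu=8,gres/gpu=5,mem=16G,node=2",
]
print(list(gpu_mismatches(sample, ntasks=2, gpus_per_task=1)))
# [('14726565', 3), ('14733058', 5)]
```

Run against the two jobs quoted above, both are flagged, since each should have received exactly 2 GPUs.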
Comment 1 Ahmed Essam ElMazaty 2021-03-15 02:36:12 MDT
Created attachment 18426 [details]
slurm.conf
Comment 2 Ahmed Essam ElMazaty 2021-03-15 02:36:40 MDT
Created attachment 18427 [details]
gres.conf
Comment 5 Marcin Stolarek 2021-03-16 02:38:37 MDT
Could you please set SlurmctldDebug to at least verbose, enable the Gres debug flag, and share slurmctld logs from the time the jobs are submitted and started?

cheers,
Marcin
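For reference, the settings requested above would look roughly like this in slurm.conf (illustrative values; adjust to the site's existing configuration):

```
# slurm.conf: raise controller log verbosity and enable GRES debugging
SlurmctldDebug=verbose
DebugFlags=Gres
```

Both can also be changed on a running controller without a restart, e.g. `scontrol setdebug verbose` and `scontrol setdebugflags +gres`.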
Comment 6 Greg Wickham 2021-03-16 07:53:29 MDT
Created attachment 18468 [details]
Fragments related to two jobs from Slurmctld
Comment 7 Greg Wickham 2021-03-16 08:02:03 MDT
Created attachment 18469 [details]
Testing submission - slurmctld log

$ srun --gpus-per-task=1 --ntasks=2 --nodes=2 --time 00:10:00 --pty /bin/bash -i
srun: job 590 queued and waiting for resources
srun: job 590 has been allocated resources


$ scontrol show -d job=590
JobId=590 JobName=bash
   UserId=wickhagj(100302) GroupId=g-wickhagj(1100302) MCS_label=N/A
   Priority=889 Nice=0 Account=root QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=00:00:13 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2021-03-16T16:55:24 EligibleTime=2021-03-16T16:55:24
   AccrueTime=2021-03-16T16:55:24
   StartTime=2021-03-16T16:55:24 EndTime=2021-03-16T17:05:24 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-16T16:55:24
   Partition=batch AllocNode:Sid=slurm-02:2418
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=dgpu502-[29,33]
   BatchHost=dgpu502-29
   NumNodes=2 NumCPUs=8 NumTasks=2 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=8,mem=16G,node=2,billing=8,gres/gpu=8
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=gpu:8
     Nodes=dgpu502-[29,33] CPU_IDs=0-3 Mem=8192 GRES=gpu:4(IDX:0-3)
   MinCPUsNode=1 MinMemoryCPU=2G MinTmpDiskNode=0
   Features=nolmem DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/bash
   WorkDir=/home/wickhagj
   Power=
   CpusPerTres=gpu:4
   TresPerTask=gpu:1
   NtasksPerTRES:0

$ sacct -j 590 -P --format=jobid,alloctres
JobID|AllocTRES
590|billing=8,cpu=8,gres/gpu=8,mem=16G,node=2
590.extern|billing=8,cpu=8,gres/gpu=8,mem=16G,node=2
590.0|cpu=2,gres/gpu:gtx1080ti=2,gres/gpu=2,mem=0,node=2
Comment 8 Dominik Bartkiewicz 2021-03-16 08:08:58 MDT
Hi

Could you send us partitions.conf?

Dominik
Comment 9 Greg Wickham 2021-03-16 08:10:50 MDT
The full debug logs will be uploaded tomorrow.
Comment 10 Greg Wickham 2021-03-16 08:11:24 MDT
Created attachment 18470 [details]
Partitions.conf
Comment 11 Dominik Bartkiewicz 2021-03-16 09:04:10 MDT
Hi

I can reproduce this issue.
I will let you know when a fix is available.

Dominik
Comment 18 Dominik Bartkiewicz 2021-03-31 09:41:49 MDT
Hi

This commit should fix the issue. It will be included in Slurm 20.11.6 and later.
https://github.com/SchedMD/slurm/commit/bdf66674f9e0f03

Dominik
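Since the fix landed in 20.11.6, confirming that a cluster has it reduces to a version comparison. A minimal sketch (hypothetical helper) that parses a version string of the form `slurm 20.11.6`, as printed by `slurmctld -V`:

```python
# Check whether a Slurm version string includes the fix (>= 20.11.6).
# Hypothetical helper for illustration only.

def has_fix(version_output: str, fixed=(20, 11, 6)) -> bool:
    """version_output is e.g. 'slurm 20.11.2'; compare numerically, not as text."""
    version = tuple(int(p) for p in version_output.split()[-1].split("."))
    return version >= fixed

print(has_fix("slurm 20.11.2"))  # False
print(has_fix("slurm 20.11.6"))  # True
```

Tuple comparison avoids the string-comparison pitfall where "20.11.10" would sort before "20.11.6".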
Comment 19 Dominik Bartkiewicz 2021-04-02 04:35:50 MDT
Hi

Is there anything else I can do to help, or is it OK to close this ticket?

Dominik
Comment 20 Greg Wickham 2021-04-04 00:37:43 MDT
Hi Dominik,

If the bug has been resolved, the ticket can be closed.

thanks,

   -greg
Comment 22 Greg Wickham 2021-04-28 11:28:13 MDT
We upgraded to 20.11.6 today and it's working great.

Thanks Dominik.

   -Greg