| Summary: | Question about --gpus-per-task | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Jonas Stare <jonst> |
| Component: | Configuration | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SNIC | Slinky Site: | --- |
| SNIC sites: | NSC | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | sigma |
| Attachments: | slurm.conf for sigma | ||
|
Description
Jonas Stare
2020-10-30 09:19:43 MDT
Jonas,
I tried to reproduce it with:
># sbatch --mem=10 --gpus-per-task=2 -n1 -w test01 --wrap="sleep 100"
>Submitted batch job 54114
># sbatch --mem=10 --gpus-per-task=2 -n1 -w test01 --wrap="sleep 100"
>Submitted batch job 54115
># squeue
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 54114 AllNodes wrap root R 0:01 1 test01
> 54115 AllNodes wrap root R 0:01 1 test01
># scontrol show node test01
>NodeName=test01 Arch=x86_64 CoresPerSocket=16
> CPUAlloc=8 CPUTot=128 CPULoad=0.01
> AvailableFeatures=(null)
> ActiveFeatures=(null)
> Gres=gpu:4(S:0-1)
> NodeAddr=slurmctl NodeHostName=slurmctl Port=30001 Version=20.02.5
> OS=Linux 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019
> RealMemory=900 AllocMem=20 FreeMem=42 Sockets=2 Boards=1
> State=MIXED ThreadsPerCore=4 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
> Partitions=AllNodes
> BootTime=2020-11-02T09:32:25 SlurmdStartTime=2020-11-02T14:43:21
> CfgTRES=cpu=128,mem=900M,billing=128,gres/gpu=4
> AllocTRES=cpu=8,mem=20M,gres/gpu=4
> CapWatts=n/a
> CurrentWatts=0 AveWatts=0
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Did I understand your description correctly? Could you please share some commands results with reproducer?
cheers,
Marcin
Yes. Here is an example where we try to start two jobs on the same node (which has 4 GPUs), each asking for what I would expect to be 2 GPUs.

First job:

> [jonst@sign ~]$ srun -n1 -t10 --gpus-per-task=v100:2 -Ansc --reservation=gpu --pty -E /bin/bash -l
> [jonst@n2017 ~]$ scontrol show job $SLURM_JOBID
> JobId=1075339 JobName=bash
>    UserId=jonst(1041) GroupId=jonst(1041) MCS_label=N/A
>    Priority=1000164420 Nice=0 Account=nsc QOS=nsc
>    JobState=RUNNING Reason=None Dependency=(null)
>    Requeue=0 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
>    RunTime=00:00:53 TimeLimit=00:10:00 TimeMin=N/A
>    SubmitTime=2020-11-02T17:05:42 EligibleTime=2020-11-02T17:05:42
>    AccrueTime=Unknown
>    StartTime=2020-11-02T17:05:42 EndTime=2020-11-02T17:15:46 Deadline=N/A
>    PreemptEligibleTime=2020-11-02T17:05:42 PreemptTime=None
>    SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-11-02T17:05:42
>    Partition=sigma AllocNode:Sid=sign:215598
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=n2017
>    BatchHost=n2017
>    NumNodes=1 NumCPUs=18 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>    TRES=cpu=18,mem=52272M,node=1,billing=18
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
>    MinCPUsNode=1 MinMemoryCPU=2904M MinTmpDiskNode=0
>    Features=(null) DelayBoot=00:00:00
>    Reservation=gpu
>    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
>    Command=/bin/bash
>    WorkDir=/home/jonst
>    Power=
>    TresPerTask=gpu:v100:2
>    MailUser=(null) MailType=NONE

nvidia-smi shows 2 GPUs:

> [jonst@n2017 ~]$ nvidia-smi
> Mon Nov  2 17:09:55 2020
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla V100-SXM2...  On   | 00000000:61:00.0 Off |                    0 |
> | N/A   40C    P0    41W / 300W |      0MiB / 32510MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla V100-SXM2...  On   | 00000000:62:00.0 Off |                    0 |
> | N/A   40C    P0    41W / 300W |      0MiB / 32510MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                       GPU Memory |
> |  GPU       PID   Type   Process name                             Usage      |
> |=============================================================================|
> |  No running processes found                                                 |
> +-----------------------------------------------------------------------------+

But the job has been allocated 4?

> [jonst@n2017 ~]$ sacct -j $SLURM_JOBID --format=JobID,Start,END,ReqGRES%20,ReqTRES%40,AllocGRES,AllocTRES%40
> JobID        Start               End                 ReqGRES              ReqTRES                           AllocGRES  AllocTRES
> ------------ ------------------- ------------------- -------------------- --------------------------------- ---------- -----------------------------------
> 1075339      2020-11-02T17:05:42 Unknown             PER_TASK:gpu:v100:2  billing=1,cpu=1,mem=2904M,node=1  gpu:4      billing=18,cpu=18,mem=52272M,node=1
> 1075339.ext+ 2020-11-02T17:05:42 Unknown             PER_TASK:gpu:v100:2                                    gpu:4      billing=18,cpu=18,mem=52272M,node=1
> 1075339.0    2020-11-02T17:05:46 Unknown             PER_TASK:gpu:v100:2                                    gpu:4      cpu=1,mem=0,node=1

Trying to start a second job (using -w to place it on the same node):

> [jonst@sign ~]$ srun -n1 -t10 --gpus-per-task=v100:2 -Ansc --reservation=gpu --pty -E -w n2017 /bin/bash -l
> srun: job 1075340 queued and waiting for resources

The second job gets stuck pending, but starts as soon as the first job exits.

> [jonst@n2017 ~]$ squeue -u jonst
>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
> 1075340     sigma     bash    jonst PD       0:00      1 (Resources)
> 1075339     sigma     bash    jonst  R       3:55      1 n2017

Jonas,

Could you please attach your configuration files?
I took a look at other bugs, but I can't find a config with a sigma partition. Is this for "tetralith" or another machine? Does it look reservation-related, or does it happen on an empty node as well? Could you please share the command used to create the reservation?

cheers,
Marcin

Created attachment 16471 [details]
slurm.conf for sigma
This is what the reservation looks like.
> ReservationName=gpu StartTime=2020-09-14T12:54:44 EndTime=2030-07-24T12:54:44 Duration=3600-00:00:00
> Nodes=n[2017-2018] NodeCnt=2 CoreCnt=72 Features=(null) PartitionName=(null) Flags=SPEC_NODES
> TRES=cpu=72
> Users=(null) Accounts=nsc,liu-gpu-2020-1,liu-gpu-2020-2,liu-gpu-2020-3 Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
> MaxStartDelay=(null)
We are probably misusing reservations a bit. The GPU nodes should probably have been in their own partition, but the way we create users only supports one partition per cluster (at the moment).
Jonas,

I think I reproduced the issue, and it looks like it's related to --cpus-per-gpu being set from the slurm.conf default. This may be a duplicate of Bug 9947, where we already have a patch in the QA process. Are you able to apply it and verify whether it fixes the issue for you? (attachment 16578 [details])

Alternatively, can you add an explicit --cpus-per-gpu=9 to your srun calls and check if it works correctly?

cheers,
Marcin

(In reply to Marcin Stolarek from comment #6)
> Jonas,
>
> I think I reproduced the issue and it looks like it's related to
> --cpus-per-gpu being set from slurm.conf default. This may be a duplicate of
> a Bug 9947 where we already have a patch in QA process, are you able to
> apply it and verify if it will fix the issue for you?(attachment 16578 [details])
>
> Alternatively, Can you add direct --cpus-per-gpu=9 to your srun calls and
> check if it works correctly?
>
> cheers,
> Marcin

My colleague tested it, and it seems like this was the problem. I saw that there was a patch for DefCpuPerGPU on the 20.02 branch; is that the one that fixes this?

https://github.com/SchedMD/slurm/commit/0b6faf691c6fb5445fdb01c74daf81ecb87e05db

Yes - the commit you're asking about is exactly the same one I shared in the comment 6 attachment. You should be able to apply it manually, or just wait and upgrade - it will be part of the Slurm 20.02.7 release.

I'm marking the case as a duplicate of the original Bug 9947 now.

cheers,
Marcin

*** This ticket has been marked as a duplicate of ticket 9947 ***
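Editor's note: the allocation numbers in the ticket are self-consistent once a per-GPU CPU default is in play. The sketch below shows the arithmetic, assuming a DefCpuPerGPU=9 setting in slurm.conf; the value 9 is inferred from the job showing NumCPUs=18 for a 2-GPU request (and from the suggested --cpus-per-gpu=9 workaround), not stated verbatim in this ticket.

```shell
# Sketch (assumption, not from the ticket): with DefCpuPerGPU=9 in slurm.conf,
# an `srun -n1 --gpus-per-task=v100:2` request implies 2 GPUs * 9 CPUs/GPU,
# matching the NumCPUs=18 reported by `scontrol show job` above.
gpus=2            # from --gpus-per-task=v100:2
def_cpu_per_gpu=9 # assumed DefCpuPerGPU value
echo "implied CPU allocation: $(( gpus * def_cpu_per_gpu ))"
```

Until the DefCpuPerGPU fix lands (Slurm 20.02.7 per the thread), passing --cpus-per-gpu=9 explicitly on each srun line, as Marcin suggests, sidesteps the buggy default handling on a per-job basis.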