| Summary: | Requesting multiple GPUs via --gpus=2 rejected despite being valid request | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Trey Dockendorf <tdockendorf> |
| Component: | GPU | Assignee: | Director of Support <support> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek, felip.moll, tdockendorf, troy |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=9716, https://bugs.schedmd.com/show_bug.cgi?id=10569, https://bugs.schedmd.com/show_bug.cgi?id=10623 | | |
| Site: | Ohio State OSC | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 20.11.3 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | slurm.conf, gres.conf | | |
Created attachment 15540 [details]
gres.conf

Hi Trey,

I'm able to reproduce this, it appears. It looks like MaxNodes=1 is messing things up: when I submit to a partition without a MaxNodes limit, I can get a one-node job with 2 GPUs. I will try to get to the bottom of this and get back to you.

Have you observed this behavior in 19.05?

Thanks,
-Michael

We have not installed or tested 19.05 because our SLURM install is new and we started on 20.02.

I'm able to reproduce this on 19.05. As a workaround, I noticed that if you replace --gpus=2 with --gres=gpu:2, it works as expected with MaxNodes=1. The difference between --gpus=2 and --gres=gpu:2 is that the first requests 2 GPUs per job, while the second requests 2 GPUs per node in the job. When the job is limited to 1 node, these are effectively the same, so this looks like an edge case in the logic. Hopefully it turns out to be an easy fix.

(In reply to Trey Dockendorf from comment #0)
> What I've noticed is if I submit to a partition with MinNodes=2 on
> the same host with --gpus=2, the request is accepted but I am allocated 2
> nodes with 1 GPU per node rather than the expected 1 node with 2 GPUs.

Because MinNodes=2, I would expect no fewer than 2 nodes, so this case appears to be working as expected.

Switching to --gres=gpu:2 isn't really a good solution for us: if this is a bug, I don't want to have to retrain our thousands of users once it is fixed. We are still in the testing phase of our SLURM install; we begin letting early users test SLURM next week and then go into production on October 1st, so ideally a patch would come sooner rather than later.

Hi Trey,

Thanks for the report. This has finally been fixed with commit ba353b8c13 and will make it into 20.11.3. See https://github.com/SchedMD/slurm/commit/ba353b8c13d0f523a84ef9fa522ff2980929e01c.

I'll go ahead and close this out. Feel free to reopen if this does not fix things for you.

Thanks!
-Michael
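The per-job versus per-node distinction described in the comments above can be sketched outside of Slurm. The toy model below is illustrative only (it is not SchedMD code, and the function names are invented); it assumes nodes with 2 GPUs each, matching the p03xx hardware, and shows why --gpus=2 and --gres=gpu:2 coincide when the partition forces the job onto a single node:

```python
import math

# Toy model of the two GPU request styles discussed above.
# NOT Slurm source code; it only illustrates why --gpus=2 (a per-job
# count) and --gres=gpu:2 (a per-node count) should be equivalent
# when MaxNodes=1 limits the job to a single node.

def alloc_gpus_per_job(total_gpus, gpus_per_node, max_nodes):
    """--gpus=N style: spread N GPUs across as few nodes as needed."""
    nodes_needed = math.ceil(total_gpus / gpus_per_node)
    if nodes_needed > max_nodes:
        raise ValueError("Requested partition configuration not available")
    # GPUs on each allocated node (the last node may carry fewer).
    return [min(gpus_per_node, total_gpus - i * gpus_per_node)
            for i in range(nodes_needed)]

def alloc_gres_per_node(gpus_each_node, num_nodes, max_nodes):
    """--gres=gpu:N style: every allocated node carries N GPUs."""
    if num_nodes > max_nodes:
        raise ValueError("Requested partition configuration not available")
    return [gpus_each_node] * num_nodes

# On 2-GPU nodes with MaxNodes=1 (the gpuserial-48core case), both
# request styles should yield one node carrying 2 GPUs:
print(alloc_gpus_per_job(total_gpus=2, gpus_per_node=2, max_nodes=1))   # [2]
print(alloc_gres_per_node(gpus_each_node=2, num_nodes=1, max_nodes=1))  # [2]
```

In this model both calls return the same one-node allocation, which is the behavior the reporter expected from --gpus=2 and which the 20.11.3 fix restores.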
Created attachment 15539 [details]
slurm.conf

I am unable to submit a job with --gpus=2 to a partition that has MaxNodes=1. What I've noticed is that if I submit to a partition with MinNodes=2 on the same host with --gpus=2, the request is accepted, but I am allocated 2 nodes with 1 GPU per node rather than the expected 1 node with 2 GPUs.

$ sbatch -w p0302 --gpus=2 -p gpuserial-48core hostname.sbatch
sbatch: error: Batch job submission failed: Requested partition configuration not available now

$ sbatch -w p0302 --gpus=1 -p gpuserial-48core hostname.sbatch
Submitted batch job 14116

$ sbatch --gpus=2 -p gpuparallel-48core hostname.sbatch
Submitted batch job 14117

$ scontrol show job=14117
JobId=14117 JobName=hostname.sbatch
   UserId=tdockendorf(20821) GroupId=PZS0708(5509) MCS_label=N/A
   Priority=100047874 Nice=0 Account=pzs0708 QOS=pitzer-all
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:05:00 TimeMin=N/A
   SubmitTime=2020-08-21T12:33:30 EligibleTime=2020-08-21T12:33:30
   AccrueTime=2020-08-21T12:33:30
   StartTime=2020-08-21T12:33:32 EndTime=2020-08-21T12:33:32 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-08-21T12:33:32
   Partition=gpuparallel-48core AllocNode:Sid=pitzer-rw01:128863
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=p[0301-0302]
   BatchHost=p0301
   NumNodes=2 NumCPUs=96 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=96,node=2,billing=96,gres/gpfs:ess=0,gres/gpfs:project=0,gres/gpfs:scratch=0,gres/gpu=4,gres/gpu:v100-32g=4,gres/ime=0,gres/pfsdir=0,gres/pfsdir:ess=0,gres/pfsdir:scratch=0
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=4556M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=/users/sysp/tdockendorf/slurm-tests/hostname.sbatch
   WorkDir=/users/sysp/tdockendorf/slurm-tests
   Comment=stdout=/users/sysp/tdockendorf/slurm-tests/output/hostname-14117.out
   StdErr=/users/sysp/tdockendorf/slurm-tests/output/hostname-14117.out
   StdIn=/dev/null
   StdOut=/users/sysp/tdockendorf/slurm-tests/output/hostname-14117.out
   Power=
   TresPerJob=gpu:2
   MailUser=(null) MailType=NONE

$ cat hostname.sbatch
#!/bin/bash
#SBATCH -t 00:05:00
#SBATCH -o output/hostname-%j.out
env | sort
echo "SLURM_NODELIST"
echo $SLURM_NODELIST
echo "hostname"
hostname

$ scontrol show partition=gpuserial-48core
PartitionName=gpuserial-48core
   AllowGroups=ALL DenyAccounts=pcon0060,pcon0003,pcon0014,pcon0015,pcon0016,pcon0008,pcon0010,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041 AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=pitzer-gpuserial-partition
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=1 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=48
   Nodes=p03[01-42]
   PriorityJobFactor=1000 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2016 TotalNodes=42 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerCPU=9293

$ scontrol show partition=gpuparallel-48core
PartitionName=gpuparallel-48core
   AllowGroups=ALL DenyAccounts=pcon0060,pcon0003,pcon0014,pcon0015,pcon0016,pcon0008,pcon0010,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041,pcon0009,pcon0020,pcon0022,pcon0023,pcon0024,pcon0025,pcon0026,pcon0040,pcon0041 AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=pitzer-gpuparallel-partition
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=4-00:00:00 MinNodes=2 LLN=NO MaxCPUsPerNode=48
   Nodes=p03[01-42]
   PriorityJobFactor=1000 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=2016 TotalNodes=42 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerCPU=9293

$ scontrol show node=p0302
NodeName=p0302 Arch=x86_64 CoresPerSocket=24
   CPUAlloc=0 CPUTot=48 CPULoad=0.19
   AvailableFeatures=48core,expansion,exp,r740,gpu,eth-pitzer-rack09h1,ib-i4l1s12,ib-i4,pitzer-rack08,v100-32g
   ActiveFeatures=48core,expansion,exp,r740,gpu,eth-pitzer-rack09h1,ib-i4l1s12,ib-i4,pitzer-rack08,v100-32g
   Gres=gpu:v100-32g:2(S:0-1),pfsdir:scratch:1,pfsdir:ess:1,ime:1,gpfs:project:1,gpfs:scratch:1,gpfs:ess:1
   NodeAddr=10.4.8.2 NodeHostName=p0302 Version=20.02.4
   OS=Linux 3.10.0-1062.18.1.el7.x86_64 #1 SMP Wed Feb 12 14:08:31 UTC 2020
   RealMemory=371712 AllocMem=0 FreeMem=376018 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=4 Owner=N/A MCS_label=N/A
   Partitions=batch,gpubackfill-parallel,gpubackfill-serial,gpudebug,gpuparallel,gpuparallel-48core,gpuserial,gpuserial-48core,systems
   BootTime=2020-08-19T16:22:31 SlurmdStartTime=2020-08-19T16:23:51
   CfgTRES=cpu=48,mem=363G,billing=48,gres/gpfs:ess=1,gres/gpfs:project=1,gres/gpfs:scratch=1,gres/gpu=2,gres/gpu:v100-32g=2,gres/ime=1,gres/pfsdir=2,gres/pfsdir:ess=1,gres/pfsdir:scratch=1
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

# sacctmgr show qos --parsable
Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxJobsPA|MaxSubmitPA|MinTRES|
pitzer-all|0|00:00:00|||cluster|||1.000000|cpu=29856||||||||||||1400||665|||
pitzer-default|0|00:00:00|||cluster|||1.000000|||||||||||||1000|cpu=2040|384|||
pitzer-override-tres|0|00:00:00|||cluster|||1.000000|||||||||||||1000||384|||
pitzer-gpuserial-partition|0|00:00:00|||cluster|DenyOnLimit||1.000000|||||||gres/gpu=4||||||||||gres/gpu=1|
pitzer-gpuparallel-partition|0|00:00:00|||cluster|DenyOnLimit||1.000000||||||||gres/gpu=4|||||||||gres/gpu=1|
pitzer-hugemem-partition|0|00:00:00|||cluster|DenyOnLimit||1.000000|||||||||||||||||mem=754G|
debug|0|00:00:00|||cluster|DenyOnLimit||1.000000||||||||||||1||||||
gpudebug|0|00:00:00|||cluster|DenyOnLimit||1.000000||||||||||||1|||||gres/gpu=1|
pitzer-datta|0|00:00:00|||cluster|||1.000000|node=44|||||||||||||||||
pitzer-largemem-partition|0|00:00:00|||cluster|DenyOnLimit||1.000000|||||||||||||||||mem=363G|