| Summary: | odd resource allocation issue | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Todd Merritt <tmerritt> |
| Component: | Limits | Assignee: | Scott Hilton <scott> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 21.08.7 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | U of AZ | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | Slurm config | | |

Description
Todd Merritt
2022-04-28 14:50:25 MDT

Jason Booth

Do the slurmd daemons report a different GRES when you run the following?
> $ slurmd -G
How are you currently managing configurations throughout your cluster (NFS/configless, etc.)?
Todd Merritt

Hi Jason,

We're running in a configless setup. Running that command on one of the nodes, I get no output:

> [root@i0n1 ~]# slurmd -G
> [root@i0n1 ~]#

Thanks,
Todd

Can I get the output of:
> sacct -Pl -j 690774
Can I also get your slurm.conf?
Todd Merritt

Sure thing.

# sacct -Pl -j 690774
JobID|JobIDRaw|JobName|Partition|MaxVMSize|MaxVMSizeNode|MaxVMSizeTask|AveVMSize|MaxRSS|MaxRSSNode|MaxRSSTask|AveRSS|MaxPages|MaxPagesNode|MaxPagesTask|AvePages|MinCPU|MinCPUNode|MinCPUTask|AveCPU|NTasks|AllocCPUS|Elapsed|State|ExitCode|AveCPUFreq|ReqCPUFreqMin|ReqCPUFreqMax|ReqCPUFreqGov|ReqMem|ConsumedEnergy|MaxDiskRead|MaxDiskReadNode|MaxDiskReadTask|AveDiskRead|MaxDiskWrite|MaxDiskWriteNode|MaxDiskWriteTask|AveDiskWrite|ReqTRES|AllocTRES|TRESUsageInAve|TRESUsageInMax|TRESUsageInMaxNode|TRESUsageInMaxTask|TRESUsageInMin|TRESUsageInMinNode|TRESUsageInMinTask|TRESUsageInTot|TRESUsageOutMax|TRESUsageOutMaxNode|TRESUsageOutMaxTask|TRESUsageOutAve|TRESUsageOutTot
690774|690774|run_wav|standard||||||||||||||||||700|1-22:45:05|COMPLETED|0:0||Unknown|Unknown|Unknown|4200G|0|||||||||billing=700,cpu=700,gres/gpu:kepler=25,mem=4200G,node=25|billing=700,cpu=700,gres/gpu:pascal=25,mem=4200G,node=25|||||||||||||
690774.batch|690774.batch|batch||313487760K|i16n0|0|313487760K|28671772K|i16n0|0|28671772K|304|i16n0|0|304|53-15:00:06|i16n0|0|53-15:00:06|1|28|1-22:45:05|COMPLETED|0:0|11K|0|0|0||0|754.96M|i16n0|0|754.96M|194632.63M|i16n0|0|194632.63M||cpu=28,gres/gpu:pascal=1,mem=168G,node=1|cpu=53-15:00:06,energy=0,fs/disk=791635013,mem=28671772K,pages=304,vmem=313487760K|cpu=53-15:00:06,energy=0,fs/disk=791635013,mem=28671772K,pages=304,vmem=313487760K|cpu=i16n0,energy=i16n0,fs/disk=i16n0,mem=i16n0,pages=i16n0,vmem=i16n0|cpu=0,fs/disk=0,mem=0,pages=0,vmem=0|cpu=53-15:00:06,energy=0,fs/disk=791635013,mem=28671772K,pages=304,vmem=313487760K|cpu=i16n0,energy=i16n0,fs/disk=i16n0,mem=i16n0,pages=i16n0,vmem=i16n0|cpu=0,fs/disk=0,mem=0,pages=0,vmem=0|cpu=53-15:00:06,energy=0,fs/disk=791635013,mem=28671772K,pages=304,vmem=313487760K|energy=0,fs/disk=204087109074|energy=i16n0,fs/disk=i16n0|fs/disk=0|energy=0,fs/disk=204087109074|energy=0,fs/disk=204087109074
690774.extern|690774.extern|extern||146524K|i16n20|20|146524K|1036K|i16n15|15|999915|0|i16n20|20|0|00:00:00|i16n20|20|00:00:00|25|700|1-22:45:10|COMPLETED|0:0|59.83G|0|0|0||0|0.00M|i16n20|20|0.00M|0|i16n20|20|0||billing=700,cpu=700,gres/gpu:pascal=25,mem=4200G,node=25|cpu=00:00:00,energy=0,fs/disk=2012,mem=999915,pages=0,vmem=146524K|cpu=00:00:00,energy=0,fs/disk=2012,mem=1036K,pages=0,vmem=146524K|cpu=i16n20,energy=i16n20,fs/disk=i16n20,mem=i16n15,pages=i16n20,vmem=i16n20|cpu=20,fs/disk=20,mem=15,pages=20,vmem=20|cpu=00:00:00,energy=0,fs/disk=2012,mem=836K,pages=0,vmem=146524K|cpu=i16n20,energy=i16n20,fs/disk=i16n20,mem=i16n20,pages=i16n20,vmem=i16n20|cpu=20,fs/disk=20,mem=20,pages=20,vmem=20|cpu=00:00:00,energy=0,fs/disk=50300,mem=24412K,pages=0,vmem=3663100K|energy=0,fs/disk=0|energy=i16n20,fs/disk=i16n20|fs/disk=20|energy=0,fs/disk=0|energy=0,fs/disk=0
690774.0|690774.0|orted||313404468K|i16n14|13|319716945578|29211664K|i16n17|16|28029274K|349|i16n8|7|240|53-10:19:29|i16n23|22|53-17:25:06|24|672|1-22:44:59|COMPLETED|0:0|144K|Unknown|Unknown|Unknown||0|5.26M|i16n3|2|5.25M|23535.27M|i16n17|16|5888.86M||cpu=672,gres/gpu:pascal=24,mem=4032G,node=24|cpu=53-17:25:06,energy=0,fs/disk=5500585,mem=28029274K,pages=240,vmem=319716945578|cpu=53-22:56:25,energy=0,fs/disk=5518298,mem=29211664K,pages=349,vmem=313404468K|cpu=i16n3,energy=i16n15,fs/disk=i16n3,mem=i16n17,pages=i16n8,vmem=i16n14|cpu=2,fs/disk=2,mem=16,pages=7,vmem=13|cpu=53-10:19:29,energy=0,fs/disk=5422759,mem=26884524K,pages=130,vmem=311087400K|cpu=i16n23,energy=i16n15,fs/disk=i16n23,mem=i18n0,pages=i16n16,vmem=i18n0|cpu=22,fs/disk=22,mem=23,pages=15,vmem=23|cpu=1289-10:02:34,energy=0,fs/disk=132014055,mem=672702576K,pages=5776,vmem=7493365912K|energy=0,fs/disk=24678517825|energy=i16n15,fs/disk=i16n17|fs/disk=16|energy=0,fs/disk=6174912701|energy=0,fs/disk=148197904828

Created attachment 24758 [details]
Slurm config
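(For reference, the requested-versus-allocated mismatch above is easiest to see by selecting just the TRES fields, e.g.:)

> sacct -j 690774 -P -o JobID,ReqTRES,AllocTRES

(For the data above, the job record shows gres/gpu:kepler=25 requested against gres/gpu:pascal=25 allocated.)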
Scott Hilton

Todd,

Are you able to reproduce this? Do you know what the submit line looked like, or which sbatch options were used? Is the job, or a job like it, still in the system? If so, send me the output of:

> scontrol show job <jobid>

-Scott

Todd Merritt

Hi Scott, yes, this is reproducible:

#!/bin/bash
#SBATCH --account=hpcteam
#SBATCH --partition=standard
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1
#SBATCH --nodes=20
#SBATCH --ntasks=20
wait 100

(ocelote) tmerritt@junonia:~/ocelote-test $ scontrol show job 721123
JobId=721123 JobName=gpu-fail.sh
   UserId=tmerritt(7862) GroupId=hpcteam(30001) MCS_label=N/A
   Priority=5100 Nice=0 Account=hpcteam QOS=part_qos_standard
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2022-05-02T10:25:50 EligibleTime=2022-05-02T10:25:50
   AccrueTime=2022-05-02T10:25:50
   StartTime=2022-05-02T19:41:08 EndTime=2022-05-02T20:41:08 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-05-02T11:34:24 Scheduler=Main
   Partition=standard AllocNode:Sid=junonia:12185
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=i16n[0,2-10,12,16,19,21,23],i18n[10,18-20,23]
   NumNodes=20-20 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=120G,node=20,billing=20,gres/gpu:kepler=20
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=6G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/u11/tmerritt/ocelote-test/gpu-fail.sh
   WorkDir=/home/u11/tmerritt/ocelote-test
   StdErr=/home/u11/tmerritt/ocelote-test/slurm-721123.out
   StdIn=/dev/null
   StdOut=/home/u11/tmerritt/ocelote-test/slurm-721123.out
   Power=
   TresPerNode=gres:gpu:1

The job hasn't started yet, though, due to some other large jobs using all of our GPUs.
Scott Hilton

Todd,

The requested TRES says gpu:kepler even though, under the hood, Slurm is just requesting any type of GPU. Because a non-typed gres/gpu entry doesn't exist in your AccountingStorageTRES, accounting defaults to the first GPU type alphabetically (kepler). This is so you can at least see that some GPUs were requested.
To avoid this ambiguity you could add gres/gpu to AccountingStorageTRES:
>AccountingStorageTRES=gres/gpu:volta,gres/gpu:pascal,gres/gpu:kepler,gres/gpu
Or, if you only wanted it to show when no type was specified, you could use something like gres/gpu:anytype instead; that entry would be picked because it starts with 'a' and therefore sorts first.
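(For reference, a sketch of the resulting slurm.conf fragment, using the GPU type names from this ticket; the commented-out variant is the "anytype" placeholder idea, where the name is arbitrary and only needs to sort first. Note the comments are editorial, not from the ticket:)

> # Track the generic gres/gpu TRES alongside the typed entries so that
> # untyped --gres=gpu:N requests are accounted as gres/gpu rather than
> # falling back to the alphabetically first type (kepler):
> AccountingStorageTRES=gres/gpu,gres/gpu:volta,gres/gpu:pascal,gres/gpu:kepler
> # Alternative, shown only when no type was requested:
> # AccountingStorageTRES=gres/gpu:anytype,gres/gpu:volta,gres/gpu:pascal,gres/gpu:kepler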
The actual issue of the limits being ignored looks like a bug to me. I am able to reproduce it and will look into the issue.
-Scott
Todd Merritt

Thanks Scott. I was reviewing the documentation after opening this ticket and saw that I was missing the generic gpu resource, but I thought I'd wait until the ticket was resolved before fiddling with the configuration. I'll go make that update, and it will probably take care of this in the short term. Thanks!

Scott Hilton
2022-05-05

Todd,

The limits issue is due to a design limitation. See:
https://slurm.schedmd.com/resource_limits.html#gres_limits

This is specific to limits with gres subtypes. Setting the limit to gres/gpu=10 rather than gres/gpu:pascal=10 would work just fine.

-Scott

Todd Merritt

Thanks Scott, I guess that should be fine now. It would have been problematic for us in the past because we had PIs buy in with a particular type of GPU, and we wanted to make sure we were keeping users on the right type of GPU. It sounds like it would be a difficult thing to change, but I just wanted to point out that there is a legitimate use case for being able to apply the limit more granularly. I'll go ahead and update our limits to apply the restriction at the parent type.

Todd

Scott Hilton

Glad I could help. Thanks for letting me know about your setup. If you have more questions about this issue in the future, let us know.

-Scott
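(For reference, a sketch of the parent-type workaround in sacctmgr terms, assuming the limit lives on the part_qos_standard QOS seen earlier in this ticket and using Scott's example cap of 10 GPUs per user; adjust the QOS name, limit option, and value for your site:)

> # Typed limit: per the gres_limits page above, not enforced when a job
> # requests an untyped gres/gpu:
> # sacctmgr modify qos part_qos_standard set MaxTRESPerUser=gres/gpu:pascal=10
>
> # Parent-type limit: enforced regardless of which GPU type is requested:
> sacctmgr modify qos part_qos_standard set MaxTRESPerUser=gres/gpu=10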