| Summary: | CPU jobs on GPU nodes | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Wei Feinstein <wfeinstein> |
| Component: | Heterogeneous Jobs | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 22.05.6 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | LBNL - Lawrence Berkeley National Laboratory | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |

Description
Wei Feinstein
2023-08-09 14:42:17 MDT
Do all nodes in the partition "es1" have GPUs or only a select few?

Hi Jason,
Only GPU nodes are in the es1 partition, including V100, A40, and others.
Thanks,
Wei

Hi Wei,
It sounds like you should be able to do what you want with the MaxTRESPerNode setting. If I understand correctly you are asking for a way to prevent users from requesting an entire node, but only want them to use some of the CPUs when they are going to use a GPU. If that's right then you can define a maximum number of CPUs per node that a job can request. This is something you can set on a QOS.
Here's a quick example that only allows up to 12 CPUs per node to be used in my GPU partition. I configured the 'gpu' partition to also use the 'gpu' QOS.
PartitionName=gpu Default=NO Nodes=node[07-08] MaxTime=5:00:00 State=UP QOS=gpu
I set the MaxTRESPerNode to 12 CPUs.
$ sacctmgr show qos gpu format=name,maxtrespernode%20
Name MaxTRESPerNode
---------- --------------------
gpu cpu=12
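For reference, a QOS like this can be set up with sacctmgr along these lines (a sketch of the configuration commands; the 'gpu' QOS name and the 12-CPU cap are just the values from this example):

```shell
# Create the QOS and cap the CPUs a job may use per node on it
sacctmgr add qos gpu
sacctmgr modify qos gpu set MaxTRESPerNode=cpu=12
```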
When I run a job that requests 12 CPUs it runs fine, but when I request exclusive access to a node the job doesn't run.
$ srun -pgpu --ntasks 1 --cpus-per-task=12 --gpus-per-task=1 hostname
kitt
$ srun -pgpu -N1 --exclusive hostname
srun: job 9176 queued and waiting for resources
You can see that the Reason for the job is QOSMaxCpuPerNode.
$ scontrol show jobs 9176 | grep Reason
JobState=PENDING Reason=QOSMaxCpuPerNode Dependency=(null)
Let me know if this looks like what you're looking for or if I'm not quite getting your use case.
Thanks,
Ben
Hi Ben,
Thank you for the suggestion, and sorry for the late response. This suggestion will sort of work. Below are the GPU nodes in slurm.conf.

[root@perceus-00 ~]# grep es1 /etc/slurm/slurm.conf
##es1 Nodes ##
NodeName=n00[00-11].es[1] NodeAddr=10.0.43.[0-11] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_1080ti,es1 Weight=1 Gres=gpu:GTX1080TI:4 RealMemory=64318
NodeName=n00[12-13].es[1] NodeAddr=10.0.44.[12-13] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_v100,es1 Weight=3 Gres=gpu:V100:2 RealMemory=64318
NodeName=n00[14-23].es[1] NodeAddr=10.0.43.[14-23] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_v100,es1 Weight=4 Gres=gpu:V100:2 RealMemory=192094
NodeName=n00[24-31].es[1] NodeAddr=10.0.43.[24-31] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=1 Gres=gpu:GRTX2080TI:4 RealMemory=96236
NodeName=n00[32].es[1] NodeAddr=10.0.43.[32] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_v100,es1 Weight=4 Gres=gpu:V100:2 RealMemory=192086
NodeName=n00[33-34].es[1] NodeAddr=10.0.43.[33-34] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=1 Gres=gpu:GRTX2080TI:4 RealMemory=95228
NodeName=n00[35-38].es[1] NodeAddr=10.0.43.[35-38] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=2 Gres=gpu:GRTX2080TI:4 RealMemory=191996
NodeName=n00[39-40].es[1] NodeAddr=10.0.43.[39-40] CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=1 Gres=gpu:GRTX2080TI:4 RealMemory=95228
NodeName=n0041.es[1] NodeAddr=10.0.43.41 CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=2 Gres=gpu:GRTX2080TI:4 RealMemory=191996
NodeName=n00[42].es[1] NodeAddr=10.0.43.42 CPUS=8 Sockets=2 CoresPerSocket=4 Feature=es1_2080ti,es1 Weight=1 Gres=gpu:GRTX2080TI:4 RealMemory=95228
NodeName=n00[43-44].es[1] NodeAddr=10.0.43.[43-44] CPUS=16 Sockets=2 CoresPerSocket=8 Feature=es1_v100,es1,c16 Weight=5 Gres=gpu:V100:2 RealMemory=192093
NodeName=n00[00-05].es[1] NodeAddr=10.0.43.[0-5] CPUS=64 Sockets=1 CoresPerSocket=64 Feature=es1_a40,es1 Weight=6 Gres=gpu:A40:4 RealMemory=515865
NodeName=n00[45-52].es[1] NodeAddr=10.0.43.[45-52] CPUS=64 Sockets=1 CoresPerSocket=64 Feature=es1_a40,es1 Weight=6 Gres=gpu:A40:4 RealMemory=515865
PartitionName=es1 Nodes=n00[00-05,12-52].es[1] Oversubscribe=FORCE DefMemPerCPU=8000 LLN=Yes

And the es_normal QOS:

es_normal|1000|00:00:00|es_lowprio||cluster|||1.000000|||||||node=64|||3-00:00:00|||||||cpu=2,gres/gpu=1|

With --exclusive, a user can get an entire node without using any GPU cards. Yet given the heterogeneity of the es1 GPU partition, MaxTRESPerNode is difficult to set. Is there any way to prevent CPU-only usage on these GPU nodes?
Thank you,
Wei

Hi Wei,
If you want to exclude CPU-only jobs from running on this partition then there are a couple options that should work. There is an option to set a minimum number of generic resources that must be requested on a QOS. This will allow you to prevent jobs that don't request a GPU from running. Here's a quick example of how that might look.
$ sacctmgr show qos member format=name,mintresperjob
Name MinTRES
---------- -------------
member gres/gpu=1
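If it helps, the MinTRES value shown above can be set with a sacctmgr command along these lines (a configuration sketch; 'member' is just the QOS name from this example):

```shell
# Require every job on this QOS to request at least one GPU
sacctmgr modify qos member set MinTRESPerJob=gres/gpu=1
```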
$ sbatch -qmember -n12 --wrap='srun sleep 120'
Submitted batch job 9240
$ sbatch -qmember -n12 --gpus=1 --wrap='srun sleep 120'
Submitted batch job 9241
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
9240 debug wrap ben PD 0:00 1 (QOSMinGRES)
9241 debug wrap ben R 0:01 1 node01
In this example I just requested the QOS directly, but you could tie the QOS to the partition so that the limit is enforced on the partition, regardless of which QOS the user specifies.
https://slurm.schedmd.com/resource_limits.html#qos_mintresperjob
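Attaching the QOS to the partition happens in slurm.conf. As a sketch, combining your existing es1 partition line with the 'member' QOS name from my example (the QOS name here is just an assumption for illustration):

```
PartitionName=es1 Nodes=n00[00-05,12-52].es[1] QOS=member Oversubscribe=FORCE DefMemPerCPU=8000 LLN=Yes
```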
Another possible approach would be to create a submit filter that looks for jobs requesting this partition. If the job doesn't request a GPU you could add a GPU request to the job for them or reject the job with a message of your choice to make it clear why the job is rejected. You can read more about submit filters here:
https://slurm.schedmd.com/job_submit_plugins.html
Let me know if these don't sound like they would work.
Thanks,
Ben
Hi Ben,
As you can see from the es_normal QOS below, MinTRES is already configured, and it works as expected until the --exclusive flag is used.

es_normal|1000|00:00:00|es_lowprio||cluster|||1.000000|||||||node=64|||3-00:00:00|||||||cpu=2,gres/gpu=1|

I know a Slurm submit plugin is another way to check jobs at submission time, though it involves considerably more coding. We do have several plugins in place that work with our account management portal.
Thanks,
Wei

Hi Wei,
Ah, I didn't realize that you had MinTRES configured. It's true that if a user specifies --exclusive the job will be allowed to run, because the exclusive flag allocates all of the resources on the node (with the exception of memory) to the job. Since that allocation includes a GPU, the job satisfies the limit. Here's an example of how this looks with the same QOS configuration I showed in my previous example.
$ sbatch -qmember -n12 --exclusive --wrap='srun sleep 30'
Submitted batch job 9243
$ scontrol show jobs 9243 | grep TRES
ReqTRES=cpu=12,mem=2400M,node=1,billing=13
AllocTRES=cpu=24,mem=4800M,node=1,billing=34,gres/gpu=4,gres/gpu:tesla=4
I don't have a working example of a submit filter that would reject a job that doesn't request a GPU, but here's an example of how you can look for one.
if (job_desc.tres_per_job ~= nil and string.find(job_desc.tres_per_job, "gpu")) then
    slurm.log_user("Matched gpu on job")
end
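As a rough, untested sketch of how a fuller filter might look: a GPU request can arrive through several job_desc fields (tres_per_job, tres_per_node, tres_per_socket, tres_per_task, or gres), so checking only one of them will miss some jobs. The partition name and the choice of error code below are assumptions for illustration:

```lua
-- job_submit.lua sketch (untested): reject jobs in the es1 partition
-- that do not request a GPU in any of their TRES/gres fields.
local function has_gpu(v)
    -- Guard against nil: these fields are unset when not requested.
    return v ~= nil and string.find(v, "gpu") ~= nil
end

local function requests_gpu(job_desc)
    return has_gpu(job_desc.tres_per_job)
        or has_gpu(job_desc.tres_per_node)
        or has_gpu(job_desc.tres_per_socket)
        or has_gpu(job_desc.tres_per_task)
        or has_gpu(job_desc.gres)
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.partition == "es1" and not requests_gpu(job_desc) then
        slurm.log_user("Jobs in the es1 partition must request a GPU")
        return slurm.ESLURM_INVALID_GRES  -- assumed error code choice
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```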
Let me know if you have any questions about this.
Thanks,
Ben
Hi Wei,
I wanted to check in with you to see if you have any additional questions about this or if we can close the ticket.
Thanks,
Ben

Thank you, Ben. No problem, closing now.