| Summary: | Set group GPU limits for an account | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Damien <damien.leong> |
| Component: | Limits | Assignee: | Marshall Garey <marshall> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 17.11.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Monash University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Damien 2018-05-09 02:48:35 MDT
(In reply to Damien from comment #0)
> For a particular group, I am trying to limit them more, but I am trying to
> do this on their account directly, not via another QOS, like:
>
> sacctmgr update Account Boo44 set GrpTRES=gres/gpu=8
>
> Is this possible? If not, what is the correct syntax, or a proper method to
> do this?

Yes, that's the correct way to do it.

Thanks. Yes, this is working.

That's good to hear. I'm closing this ticket as resolved/infogiven.

Hi Marshall,

Actually, this is not working as we expected:

--
sacctmgr update Account Boo44 set GrpTRES=gres/gpu=8
--

This command and feature works; it really is blocking users in Boo44 from getting more than 8 GPUs. BUT the same set of users are asking for additional CPU-only jobs, and those are being blocked because the group has already used 8 GPUs. Which, by the Slurm documentation, is correct:

- GrpTRES= The total count of TRES able to be used at any given time from jobs running from an association and its children or QOS. If this limit is reached new jobs will be queued but only allowed to run after resources have been relinquished from this group. -

So let me rephrase the question: is there a way to limit the number of GPUs a specific group can use (8 GPUs) without blocking them when they submit additional jobs that do not need GPUs (for example, CPU-only jobs)? Please let me know if you need clarification on this. Thanks.

Cheers,
Damien

(In reply to Damien from comment #4)
> --
> sacctmgr update Account Boo44 set GrpTRES=gres/gpu=8
> --
> This command and feature works; it really is blocking users in Boo44 from
> getting more than 8 GPUs. BUT the same set of users are asking for
> additional CPU-only jobs, and those are being blocked because the group has
> already used 8 GPUs.
>
> Which, by the Slurm documentation, is correct:
> GrpTRES= The total count of TRES able to be used at any given time from jobs
> running from an association and its children or QOS.
> If this limit is reached new jobs will be queued but only allowed to run
> after resources have been relinquished from this group.

This is only for the TRES you set a limit on - it doesn't limit any other TRES.

I have defined a limit of 4 gres/gpu on my account test:

$ sacctmgr show assoc where account=test format=account,user,grptres
   Account       User       GrpTRES
---------- ---------- -------------
      test             gres/gpu=4
      test   marshall

I request 4 GPUs in a job and hit the limit, so the second job pends:

$ srun --gres=gpu:4 sleep 789&
$ srun --gres=gpu:4 sleep 789&
srun: job 538119 queued and waiting for resources
marshall@voyager:~/slurm/17.11/byu$ squeue
 JOBID PARTITION  NAME     USER ST  TIME NODES NODELIST(REASON)
538119     debug sleep marshall PD  0:00     1 (AssocGrpGRES)
538118     debug sleep marshall  R  0:04     1 v1

But I can still run non-GPU jobs just fine:

$ srun sleep 78&
$ squeue
 JOBID PARTITION  NAME     USER ST  TIME NODES NODELIST(REASON)
538119     debug sleep marshall PD  0:00     1 (AssocGrpGRES)
538121     debug sleep marshall  R  0:01     1 v1
538118     debug sleep marshall  R  1:37     1 v1

So something else is going on here. Can you upload an example job submission that pends, as well as the output of squeue and scontrol show job <jobid> for the job that is pending?

Hi,

Thanks for this. In addition, we have 3 different types of GPU within our cluster. Using this mechanism, can we lock things down even further?

For example, we have K10, K20, K80:

--
sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K10=2,gres/gpu:K20=2,gres/gpu:K80=2
--

And if this is possible and logical: the above uses an 'and' operator. How can we use 'or' instead? Kindly advise. Thanks.

Cheers,
Damien

(In reply to Damien from comment #6)
> In addition, we have 3 different types of GPU within our cluster. Using
> this mechanism, can we lock things down even further?
> For example, we have K10, K20, K80:
>
> sacctmgr update Account Boo44 set
> GrpTRES=gres/gpu:K10=2,gres/gpu:K20=2,gres/gpu:K80=2

This is kind of possible, but with the caveat that if a user requests a generic gres/gpu (for example, srun --gres=gpu:4 <job>), they can exceed the limit on the specific types of gres. See bug 4767 and commit c2c06468, which is now live on our website:

https://slurm.schedmd.com/resource_limits.html

That commit specifically talks about QOS limits, but it also applies to association limits. I will update the documentation to clarify this. I recommend using the suggested approach on the resource_limits page - that is, use a job submit plugin to force the user to always request specific GPU types.

> And if this is possible and logical: the above uses an 'and' operator. How
> can we use 'or' instead?

There isn't a way to enforce one limit or the other - it will enforce all limits.

Damien,

We've updated the documentation to clarify that this isn't just a limitation on QOS limits. See commit 1e1cd45ee86c45c4c. Is there anything else we can help you with for this ticket?

- Marshall

Hi,

We are still keen to explore TRES limits on a selected account:

--
sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K10=2
sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K20=2
sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K30=10
--

We hope to implement these concurrently. Does this make sense, and is it logical?

Cheers,
Damien

Hi,

My testing of this does not work:

---
sacctmgr update Account boo6 set GrpTRES=gres/gpu:K80=3
 Unknown option: GrpTRES=gres/gpu:K80=3
 Use keyword 'where' to modify condition
---

If I choose just the generic GPU limit, it works:

---
sacctmgr update Account boo6 set GrpTRES=gres/gpu=7
 Modified account associations...
  C = m3         A = boo6 of p001
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
---

So the command does not allow me to restrict by GPU type? Is this correct? Kindly advise. Thanks.
Cheers,
Damien

(In reply to Damien from comment #14)
> My testing of this does not work:
> ---
> sacctmgr update Account boo6 set GrpTRES=gres/gpu:K80=3
> Unknown option: GrpTRES=gres/gpu:K80=3
> Use keyword 'where' to modify condition

What's the output of the following?

sacctmgr show tres

I suspect it doesn't include the specific types of TRES (gpu:K80) but does include the generic gres/gpu. You have to put the specific types of gres in AccountingStorageTRES in slurm.conf to make that work:

AccountingStorageTRES=gres/gpu,gres/gpu:K10,gres/gpu:K20,...

Then they should show up in sacctmgr show tres and you should be able to set the limits.

(In reply to Damien from comment #13)
> We are still keen to explore TRES limits on a selected account:
>
> sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K10=2
> sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K20=2
> sacctmgr update Account Boo44 set GrpTRES=gres/gpu:K30=10
>
> We hope to implement these concurrently.

This is fine. Just make sure you understand the limitation and workaround mentioned at the bottom of the resource limits page (and in comment 7):

https://slurm.schedmd.com/resource_limits.html

To sum it up again: the limitation is that jobs requesting generic GPUs will be able to exceed the limit imposed on specific GPUs. The recommended workaround is to use a job submit plugin that enforces the policy that all jobs must specify the type of GPU.

Hi Marshall,
These are our TRES:

sacctmgr: show tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
    gres             gpu   1001

scontrol show config | grep AccountingStorageTRES
AccountingStorageTRES   = cpu,mem,energy,node,billing,gres/gpu

We need to make another config change.
Cheers
Damien
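The change discussed here can be sketched as a slurm.conf fragment. This is an illustration, not a tested config: the K10/K20/K80 type names are taken from earlier in this thread, and should be replaced with the GPU types actually defined in the cluster's gres configuration.

```
# slurm.conf - track per-type GPU TRES in accounting so that per-type
# GrpTRES limits can be created with sacctmgr.
# Type names (K10, K20, K80) are assumptions from this thread; adjust
# to match the types defined in gres.conf on this cluster.
AccountingStorageTRES=cpu,mem,energy,node,billing,gres/gpu,gres/gpu:K10,gres/gpu:K20,gres/gpu:K80
```

After editing slurm.conf, `scontrol reconfigure` propagates the change, and the new TRES should then appear in `sacctmgr show tres`.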
Yes, that's what I thought. Go ahead and add the additional types of GPUs to AccountingStorageTRES. You can just use scontrol reconfigure to propagate that change to the cluster - you don't have to restart the controller. Then you should see the new TRES with sacctmgr show tres and should be able to create the limits. Let us know if you have any additional questions, or if you're able to do it successfully.

Hi Marshall,

We wanted to switch to fairtree priority as mentioned in the previous notes, but I don't see it under '/opt/slurm-17.11.4/lib/slurm'. Does it need a separate .so file, or is this built in? Please advise. Thanks.

Cheers,
Damien

(In reply to Damien from comment #18)
> We wanted to switch to fairtree priority as mentioned in the previous notes,
> but I don't see it under '/opt/slurm-17.11.4/lib/slurm'. Does it need a
> separate .so file, or is this built in?

I'm guessing the "previous notes" you mention are in a different ticket, perhaps bug 5176? Can you bring this up in a separate ticket (perhaps 5176)? I'd like to keep this bug focused on GPU limits. If you have no further questions about GPU limits, I'd like to close this ticket.

Hi Marshall,

Sorry, I am confused by the number of questions we have asked.

Cheers,
Damien

No worries. Closing as resolved/infogiven.
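The workaround recommended above - a job submit plugin forcing users to request specific GPU types - could be sketched as a `job_submit/lua` script along the following lines. This is a hedged sketch only, not a tested plugin: the exact `job_desc` field names available vary by Slurm version, so check the `job_submit/lua` example shipped with the site's 17.11 source before deploying anything like this.

```lua
-- job_submit.lua (sketch): reject jobs that request generic "gpu"
-- without a type, so per-type GrpTRES limits cannot be bypassed.
-- The job_desc.gres field name is an assumption for Slurm 17.11;
-- verify against the distributed job_submit/lua example.

function slurm_job_submit(job_desc, part_list, submit_uid)
    local gres = job_desc.gres
    if gres ~= nil then
        -- "gpu" or "gpu:4" (untyped) should be rejected;
        -- "gpu:K80" or "gpu:K80:4" (typed) is allowed.
        for spec in gres:gmatch("[^,]+") do
            local name, rest = spec:match("^([^:]+):?(.*)$")
            if name == "gpu" and (rest == "" or rest:match("^%d+$")) then
                slurm.log_user("Please request a specific GPU type, " ..
                               "e.g. --gres=gpu:K80:1")
                return slurm.ERROR
            end
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Enabling it would mean setting `JobSubmitPlugins=lua` in slurm.conf and placing the script next to slurm.conf as job_submit.lua.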