Ticket 13558 - MaxTRESRunMinsPerUser
Summary: MaxTRESRunMinsPerUser
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 21.08.5
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Jason Booth
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-03-03 13:31 MST by Rob Yelle
Modified: 2022-05-05 14:08 MDT

See Also:
Site: Oregon State Univ
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---



Description Rob Yelle 2022-03-03 13:31:23 MST
Hello,

I am currently using GrpTRESRunMins to help contain groups' use of certain resources and to help maximize the use of available resources (e.g. see ticket 10242). However, while GrpTRESRunMins is helpful, it is not really adequate, and I could really use a TRESRunMins limit implemented at the user level, as described very well in ticket #8148 (esp. comment #10). This would be a very helpful feature to have - it could be used to improve the flexibility and efficiency of resource usage - and I saw that some elements are already in place to implement MaxTRESRunMinsPerUser and MaxTRESRunMinsPerAccount (ticket #8148 comment #3). Are these features still on the roadmap, and if so, what is the current time frame?

Thanks!

Rob
Comment 1 Jason Booth 2022-03-03 16:35:04 MST
Hi Rob

We do not have anything on the development roadmap in the foreseeable future that would cover your use case.  

As you already know, there is no option for TRESRunMins at the user association level.

> supported options 
> https://slurm.schedmd.com/sacctmgr.html#SECTION_SPECIFICATIONS-FOR-USERS

We do have a long-standing request that this fits into, and that is bug#5368. I am marking this as a duplicate of that request.

*** This ticket has been marked as a duplicate of ticket 5368 ***
Comment 3 Jason Booth 2022-04-20 12:49:07 MDT
Rob - my apologies for marking this as a duplicate too quickly, without catching more of the details mentioned in bug#5368.

That bug is around TRESMins and not TRESRunMins.

> However, while GrpTRESRunMins is helpful, it is not really adequate and I could 
> really use a TRESRunMins implemented at the user level

GrpTRESRunMins can be set on individual associations.

For example:

> $ sacctmgr modify user <USER> set GrpTRESRunMin=<TRES=LIMIT>
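To make the placeholders concrete, here is a sketch (the user name "alice" and the 5760 GPU-minute budget are illustrative values, not from this ticket):

```shell
# Illustrative values: cap alice's association at 5760 GPU-minutes of
# concurrently running work - e.g. 4 GPUs for 24 hours, or 8 GPUs for 12 hours.
sacctmgr modify user alice set GrpTRESRunMin=gres/gpu=5760
```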


> it could be used to improve the flexibility and efficiency of resource usage

It is not clear to me why you would want to set this on a user. Can you share with us your use case and what you are experiencing in your cluster? Normally we see these TRES limits at the QOS level and a user would then select the QOS depending on the service level they need.
Comment 4 Rob Yelle 2022-04-20 15:39:24 MDT
Hi Jason,

Thank you for your response.

Ideally I want to limit TRESRunMins using a QOS applied to partitions, though having a TRESRunMins limit at the association level would also be useful. An example of where I would use them is limiting GPU usage - we have a number of V100 GPUs available (Nvidia DGX-2), with several users wanting to request up to 8, or even 16 or more, at a time. But the demand for them is so high that I need to restrict the amount per user to, say, 8 at once (using MaxTRESPerUser), with a limited time window for job turnover so that others can have their turn. That limits the scale of experiments and the number of jobs/experiments users can run. I would like to improve the flexibility by allowing up to 16 or even 32 GPUs at once per user for a short period, for those who can take advantage of them, or for users to capitalize on when those resources are available (e.g. when demand and load are lower than usual, users that are normally held in check by MaxTRESPU could use more of the system for a short period, resulting in increased efficiency). This would also allow me to increase the time limit for users who only need a few GPUs and would rather have longer run times than I currently allow. I have GrpTRESRunMins working, but it only applies the limit to groups and departments, which is still useful, but I really need it for individual users (per partition) for best effect. In graphical terms, each limit is a rectangle whose dimensions are time limit and TRES; TRESRunMins per user would allow the rectangle's dimensions to be changed for each user according to their needs.
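The rectangle trade-off can be put in numbers (illustrative figures, not the actual limits on this cluster): a running-minutes budget fixes the rectangle's area, and the user chooses its shape.

```shell
# A budget of 8 GPUs held for a 7-day window, expressed in GPU-minutes:
gpu_minutes=$(( 8 * 7 * 24 * 60 ))
echo "budget: ${gpu_minutes} GPU-minutes"   # budget: 80640 GPU-minutes
# The same budget allows a burst of 16 GPUs for half the window
# (integer division truncates 3.5 days down to 3):
echo "whole days at 16 GPUs: $(( gpu_minutes / (16 * 24 * 60) ))"
```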

I have considered applying GrpTRESRunMins per association - I believe that was suggested in a related ticket. At present I have users assigned to departments, groups, classes, etc., and there are many instances where users are in multiple accounts. The accounts are useful for tracking, but also required for accessing certain partitions. It seems that in order for this to work properly, every user would need to be assigned their own separate account. This could work for some partitions, except that I have multiple heterogeneous pools of CPU and GPU resources that are governed by different policies (e.g. because of different ownership), and applying this TRESRunMins limit at a global level rather than the partition level would not be as helpful as a per-partition QOS limit. In addition, if I have to go with this approach, then it seems I would lose the ability to easily track TRES usage by department, class, research group, etc.

Does this make sense? Perhaps I am missing something, and maybe there is another creative way to go about this? Is GrpTRESRunMins sufficient for my use case, and if so, how can I apply it in such a way that it limits TRESRunMins for individual users in a partition?

Thanks,

Rob
Comment 5 Jason Booth 2022-04-25 11:25:37 MDT
Rob,

I took some time to think about what you are trying to accomplish here, and I have some ideas.


Slurm partitions support a QOS option. 

https://slurm.schedmd.com/slurm.conf.html#OPT_QOS

> Used to extend the limits available to a QOS on a partition. Jobs will not be associated to this QOS outside of being associated to the partition. They will still be associated to their requested QOS. By default, no QOS is used. NOTE: If a limit is set in both the Partition's QOS and the Job's QOS the Partition QOS will be honored unless the Job's QOS has the OverPartQOS flag set in which the Job's QOS will have priority.


You could enable a limit for the partitions that need these types of limits. We recommend creating QOS'es that are not shared with users and are only used for partitions.
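As a sketch (the QOS name "part_dgx2", the partition name, and the node list are hypothetical, not from this ticket):

```shell
# Create a QOS that users never request directly; it exists only to carry
# the partition's limits:
sacctmgr add qos part_dgx2 set MaxTRESPerUser=gres/gpu=8
# Then bind it to the partition in slurm.conf:
#   PartitionName=dgx2 Nodes=dgx[01-16] QOS=part_dgx2
# Every job that runs in the dgx2 partition is now subject to part_dgx2's
# limits, regardless of which QOS the job requested.
```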

Secondly, QOS'es also support an OverPartQOS flag:


https://slurm.schedmd.com/sacctmgr.html#OPT_OverPartQOS

> If set jobs using this QOS will be able to override any limits used by the requested partition's QOS limits.



In order for this to work, the same limit needs to be defined on the QOS that will override the partition's QOS. For example:

QOS used for the partition:
> sacctmgr update qos limit set MaxTresPU=cpu=2

QOS used by users:
> sacctmgr update qos normal set MaxTresPU=cpu=18

A blank value will tell Slurm to enforce the partition limit.
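For completeness, the flag itself is set on the users' QOS; a sketch, assuming the QOS names from the example above:

```shell
# Give the "normal" QOS the OverPartQOS flag so its own MaxTresPU=cpu=18
# takes precedence over the partition QOS's MaxTresPU=cpu=2:
sacctmgr update qos normal set Flags=OverPartQOS
```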

You can go one step further and define a partition ACL that allows only certain QOS'es in a given partition, via "AllowQos".

https://slurm.schedmd.com/slurm.conf.html#OPT_AllowQos
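A minimal slurm.conf sketch (partition, node, and QOS names are hypothetical):

```shell
# slurm.conf: only jobs submitted under "normal" or "gpu_burst" may run in
# this partition, and the partition QOS "part_dgx2" supplies its limits.
PartitionName=dgx2 Nodes=dgx[01-16] QOS=part_dgx2 AllowQos=normal,gpu_burst
```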


As another way to approach this, Slurm does support per-partition associations, though that increases the number of associations a cluster needs, since an entry will be needed for every user + account + partition combination. If you have 4k users, this can become unmanageable quickly.
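If you did go the per-partition association route, it would look something like this (user, account, partition names and the limits are made up):

```shell
# One association per (user, account, partition) tuple, each with its own
# running-minutes budget:
sacctmgr add user alice account=physics partition=dgx2 GrpTRESRunMin=gres/gpu=11520
sacctmgr add user alice account=physics partition=cpu GrpTRESRunMin=cpu=1000000
```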
Comment 6 Rob Yelle 2022-04-27 15:58:42 MDT
Hi Jason,

Thank you for the suggestions, I will give these a try and provide feedback.

Rob
Comment 7 Jason Booth 2022-05-05 14:08:51 MDT
> Thank you for the suggestions, I will give these a try and provide feedback.

Feel free to follow up if you have further questions. For now, I am resolving this ticket.