Hi,

I'd like to apply (or modify) user resource throttling limits using GrpTRESRunMins for all users that already exist in the Slurm database. I already did so for single users with the following command:

$ sacctmgr -i modify user <username> set GrpTRESRunMins=cpu=9600,mem=76800g

This is to prevent every single user from allocating huge amounts of resources for very long times (the exact numbers above are just examples).

Now I wonder: can I apply this GrpTRESRunMins limit (or other limits as well) to all users with one single sacctmgr command, or do I have to loop over all users and run this command for every single one? Is there a better approach that I am not yet aware of? Is it possible to define default user limits that automatically apply to all existing and new users?

Thank you in advance.

Best regards
Jürgen
Hi Jürgen,

> Is it possible to define some default user limits that automatically
> apply to all existing and new users?

Yes, Slurm has the concept of default limits at the cluster level or at any account level. Let me try to explain it in some detail:

First of all, each cluster, account, and user also has what we call an "association". Associations are the key concept in Slurm for limits and accounting. Internally, clusters/accounts/users are basically just a list of names and ids (plus a little extra info, but no limits or accounting). Slurm then keeps an internal table of associations, where Slurm:

- "associates" that user/account/cluster with limits, and
- places that association into a hierarchy of associations, where every user association has an account association as parent, and every account association has the cluster or another account association as parent.

If you run "sacctmgr show assoc tree" you will be able to see your hierarchy of cluster, accounts, and users. Note that each cluster has a "root" association that represents the cluster level, the top/root level of the cluster's hierarchy.

If you set a limit on an association that is a parent of other associations, the general rule is that the limit becomes the default for all its children: children that don't have that limit set themselves inherit their parent's value as default. If you set the limit on the "root" association, it becomes the default for all associations of the cluster.

Note also that sacctmgr has a couple of options to show whether a limit was set directly or inherited from a parent:

WOPLimits
    Display information without hierarchical parent limits (i.e. only
    display limits where they are set instead of propagating them from
    the parent).

WOPInfo
    Display information without parent information (i.e. parent id and
    parent account name). This option also implicitly sets the
    WOPLimits option.

Some simple examples to see all of the above in action:

# The scenario is simple: 1 cluster, 1 account and 2 users on it
# (plus the root user), without a MaxJobs limit.
$ sacctmgr show association tree format=account,user,MaxJobs
             Account       User MaxJobs
-------------------- ---------- -------
root
 root                      root
 development
  development              agil
  development               bob

# If we update only one of the users, the limit is applied only to it:
$ sacctmgr update user bob set MaxJobs=30
$ sacctmgr show association tree format=account,user,MaxJobs
             Account       User MaxJobs
-------------------- ---------- -------
root
 root                      root
 development
  development              agil
  development               bob      30

# If we update the account, the limit is applied to all its users as default.
# The user that already had the limit set keeps its own value.
$ sacctmgr update account development set MaxJobs=40
$ sacctmgr show association tree format=account,user,MaxJobs
             Account       User MaxJobs
-------------------- ---------- -------
root
 root                      root
 development                              40
  development              agil      40
  development               bob      30

# If we remove the limit at user level, all users get the default:
$ sacctmgr update user bob set MaxJobs=-1
$ sacctmgr show association tree format=account,user,MaxJobs
             Account       User MaxJobs
-------------------- ---------- -------
root
 root                      root
 development                              40
  development              agil      40
  development               bob      40

# With WOPLimits we don't see the limits inherited from parents:
$ sacctmgr show association tree format=account,user,MaxJobs woplimits
             Account       User MaxJobs
-------------------- ---------- -------
root
 root                      root
 development                              40
  development              agil
  development               bob

# Finally, we can set limits at the top/cluster/root level, and remove any
# account or user limits to make them all use the cluster default:
$ sacctmgr update account root set MaxJobs=50
$ sacctmgr update account development set MaxJobs=-1
$ sacctmgr show association tree format=account,user,MaxJobs
             Account       User MaxJobs
-------------------- ---------- -------
root                                      50
 root                      root      50
 development                              50
  development              agil      50
  development               bob      50
$ sacctmgr show association tree format=account,user,MaxJobs woplimits
             Account       User MaxJobs
-------------------- ---------- -------
root                                      50
 root                      root
 development
  development              agil
  development               bob

That's the general rule for setting limits using the hierarchy of associations, but now let's focus on your case:

> sacctmgr -i modify user <username> set GrpTRESRunMins=cpu=9600,mem=76800g
>
> This is to prevent every single user from allocating huge amounts of
> resources for very long times (exact numbers above are just examples).

Please note that all Grp* limits are a bit special because they are "group limits". That means they are not applied to each association individually and inherited as a default (the general rule I explained above); instead, a group limit applies to the aggregated usage/requests of the association where it is set plus all of its children.

So, if you set GrpTRESRunMins at the account level, that limit is *shared* by all the users in the account. For example, if you set GrpTRESRunMins=cpu=9600 at the account level and two users of that account are each using 4800 cpu-minutes, that will prevent *any other user of the account* from running more minutes.

If I understand correctly, that's not what you want, right? You want limits applied to each individual association/user. I think what you are looking for is MaxTRESMins instead, which is a limit that can be set as a default at the account or cluster level and applied to all users by default. So I guess the command you are looking for is:

$ sacctmgr update account root set MaxTRESMins=cpu=9600,mem=76800g

Am I correct?

Regards,
Albert
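P.S. If you go with MaxTRESMins, you can verify that the default propagates the same way as in the MaxJobs examples above (format field name taken from the association format options; please double-check it against your sacctmgr version):

$ sacctmgr show association tree format=account,user,MaxTRESMins
$ sacctmgr show association tree format=account,user,MaxTRESMins woplimits

Without woplimits every user should show the inherited value; with woplimits, only the root association where the limit was actually set.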
Hi Albert,

first of all: thank you so much for your kind answer and especially for the very instructive examples you provided. That was very helpful.

There is one thing that I still don't fully understand: we chose GrpTRES*Run*Mins on purpose, because this resource limit factors in the *remaining* time that resources will be occupied by a user. The idea is that it's OK for users to occupy larger amounts of resources for a short time, but not so much for a longer time into the future. If we only considered CPU usage, this would count against

    sum(job_core_count * job_remaining_time)

over all running jobs of a user. Whereas, as far as I understood, MaxTRESMins factors in the requested walltime of the jobs, i.e. it would count against

    sum(job_core_count * job_wall_time)

over all running jobs of a user, even if the jobs are all close to the end of their walltime.

I guess what I'm looking for is a MaxTRESRunMins limit, which I could not find in the documentation. That's why I used GrpTRESRunMins at the user level, although this admittedly somewhat undermines the Grp semantics.

Best regards
Jürgen
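P.S. A small worked example of the difference, with purely illustrative numbers: a job on 96 cores that requested 24 hours of walltime but has only 1 hour left would count 96 * 60 = 5760 cpu-minutes against a remaining-time limit like GrpTRESRunMins, but 96 * 1440 = 138240 cpu-minutes against a requested-walltime accounting, right up until it finishes.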
Hi Jürgen,

You are basically right, but let me add some comments:

> We chose GrpTRES*Run*Mins on purpose, because this resource limit factors
> in the *remaining* time that resources will be occupied by a user. The
> idea is that it's OK for users to occupy larger amounts of resources for
> a short time, but not so much for a longer time into the future. If we
> only considered CPU usage, this would count against
> sum(job_core_count * job_remaining_time) over all running jobs of a user.

Correct. But don't forget that it also takes the association's children into account: GrpTRESRunMins limits sum(job_core_count * job_remaining_time) over all running jobs of an association *and its children*.

> Whereas, as far as I understood, MaxTRESMins factors in the requested
> walltime of the jobs, i.e. it would count against
> sum(job_core_count * job_wall_time) over all running jobs of a user,
> even if the jobs are all close to the end of their walltime.

Not totally correct. It actually works at a *per job* level: there is no sum() over all jobs; it only looks at the submitted job and its cpu*time. So it's even farther from what you are looking for.

> I guess what I'm looking for is a MaxTRESRunMins limit, which I could not
> find in the documentation. That's why I used GrpTRESRunMins at the user
> level, although this admittedly somewhat undermines the Grp semantics.

You are right. What you are looking for is the current GrpTRESRunMins BUT without taking the children into account (what you named MaxTRESRunMins), so you could set it at the top level as a default rather than an aggregated limit. This feature is not available, and the workarounds are:

- Set GrpTRESRunMins individually per user, as you were trying to avoid (see the sketch after my signature below).
- Use other, similar limits to obtain a similar behavior, as I proposed.

I'm sorry I don't have a perfect solution. But let me add some hope: the code already contains some incomplete support very close to what you are looking for, but for QOS:

MaxTRESRunMinsPerAccount
MaxTRESRunMinsPerUser

typedef struct {
        ...
        char *max_tres_run_mins_pa; /* max number of tres minutes this
                                     * qos can have running at one
                                     * time per account, currently
                                     * this doesn't do anything. */
        char *max_tres_run_mins_pu; /* max number of tres minutes this
                                     * qos can have running at one
                                     * time, currently this doesn't
                                     * do anything. */
        ...
} slurmdb_qos_rec_t;

As you can see in the comments above and in the code, these fields of a QOS are not used/enforced (that's why they are not in the documentation). But we could probably enforce them in an enhancement. Are you interested in converting this bug into a request for enhancement to make max_tres_run_mins_pu and/or max_tres_run_mins_pa a reality?

Regards,
Albert
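P.S. For the first workaround, a minimal shell sketch to apply the limit to every existing user (untested; it assumes plain user names without spaces, and the same example values as before):

$ sacctmgr -nP list user format=user | while read u; do
      sacctmgr -i modify user "$u" set GrpTRESRunMins=cpu=9600,mem=76800g
  done

Note that users added later would not get the limit automatically; you would have to re-run the loop, or set the limit when creating the user.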
Hi Jürgen,

If this is OK with you, I'm closing the issue as infogiven. Please don't hesitate to reopen it if you need further support, or even to reopen and convert it into a 5-Enhancement if you want to request such a "MaxTRESRunMins" enhancement.

Regards,
Albert
Hi Albert,

I have to apologize for my late response, and thank you once again for your excellent support. This was very helpful for me.

Yes, I'll convert this issue to an enhancement request, as I think this might be a useful feature for others as well ...

Best regards
Jürgen
Hi Albert,

actually, what I'd like to propose in this request for enhancement is a non-Grp counterpart of GrpTRESRunMins as an association-based limit, similar to the other limits that have both a Grp and a non-Grp implementation: MaxWall and GrpWall, MaxJobs and GrpJobs, MaxTRES and GrpTRES, ...

Best regards
Jürgen
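P.S. Purely to illustrate the proposal (this option does not exist today), usage could look like:

$ sacctmgr update account root set MaxTRESRunMins=cpu=9600,mem=76800g

i.e. set once at the root association and inherited by every user as an individual, non-aggregated default.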
Jürgen - Albert brought this to my attention. Most if not all of our enhancement work is paid development time, sponsored by the sites requesting the feature. Is this something you would be interested in sponsoring?
(In reply to Albert Gil from comment #3)
> > I guess what I'm looking for is a MaxTRESRunMins limit, which I could not
> > find in the documentation. That's why I used GrpTRESRunMins at the user
> > level, although this admittedly somewhat undermines the Grp semantics.
>
> You are right. What you are looking for is the current GrpTRESRunMins BUT
> without taking the children into account (what you named MaxTRESRunMins),
> so you could set it at the top level as a default rather than an
> aggregated limit. This feature is not available, and the workarounds are:
> - Set GrpTRESRunMins individually per user, as you were trying to avoid.
> - Use other, similar limits to obtain a similar behavior, as I proposed.

Hi Albert,

may I follow up with a related question at this point?

What I'd actually like to achieve with GrpTRESRunMins is to allow users to occupy a larger amount of resources for a short time, but not so much for a long time. I think this can be achieved by setting GrpTRESRunMins individually at the user level. That's fine so far. But I'd also like to add a limit on the maximum walltime of jobs, and also cap the total amount of resources that a user can occupy at any given time (aggregated over all running jobs). The latter is meant as a kind of emergency brake to prevent individual users from taking over large fractions of the cluster by flooding the system with hundreds or thousands of short-running jobs.

I see two options to set the maximum walltime limit. The first is to set it at the root level of the association tree, e.g.:

$ sacctmgr -i update account root set MaxWall=14-00:00:00

The second is to set it on the default QOS of the cluster, e.g.:

$ sacctmgr modify qos normal set MaxWall=14-00:00:00

Can you confirm this?

For the upper limit on the total amount of resources any individual user can occupy at a time, I do not see any way to define this at the top level of the association tree, since MaxTRES works at the per-job level. So I probably have to set GrpTRES individually per user again (just like the GrpTRESRunMins limit). Is that correct? Another option might be to use MaxTRESPerUser on the default QOS of the cluster, e.g.:

$ sacctmgr modify qos normal set MaxTRESPerUser=cpu=1000,mem=8000g

Can you please give me some advice on whether I should apply those base limits at the account/user level or better use the QOS approach? As far as I understood, the association tree is the best way to establish base limits on accounts and users, and QOSes are actually meant to override those base limits in exceptional cases. But I may be wrong.

Best regards
Jürgen
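P.S. For completeness, this is how I would inspect the resulting QOS limits afterwards (field names as I read them from the sacctmgr man page; please correct me if they are wrong):

$ sacctmgr show qos normal format=name,maxwall,maxtrespu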
Hi Jürgen,

> may I follow up with a related question at this point?

Probably it's better at this point to open a new bug, but let me try to answer what you already asked here:

> What I'd actually like to achieve with GrpTRESRunMins is to allow users
> to occupy a larger amount of resources for a short time, but not so much
> for a long time. I think this can be achieved by setting GrpTRESRunMins
> individually at the user level. That's fine so far.

Good.

> But I'd also like to add a limit on the maximum walltime of jobs [...]
> I see two options to set the maximum walltime limit. The first is to set
> it at the root level of the association tree, e.g.:
>
> $ sacctmgr -i update account root set MaxWall=14-00:00:00
>
> The second is to set it on the default QOS of the cluster, e.g.:
>
> $ sacctmgr modify qos normal set MaxWall=14-00:00:00
>
> Can you confirm this?

Yes, I can confirm this. And let me add some extra comments:

- You can also set this limit per partition in slurm.conf:

  MaxTime
      Maximum run time limit for jobs. Format is minutes,
      minutes:seconds, hours:minutes:seconds, days-hours,
      days-hours:minutes, days-hours:minutes:seconds or "UNLIMITED".
      Time resolution is one minute and second values are rounded up
      to the next minute.

- As the same limit can be set in several places, Slurm has a hierarchical approach to select the limit that the system will actually use. The order is:

  1) Partition QOS limit
  2) Job QOS limit
  3) User association
  4) Account association(s), ascending the hierarchy
  5) Root/Cluster association
  6) Partition limit
  7) None

  See https://slurm.schedmd.com/resource_limits.html for more details.

- Please make sure that the limits you set are actually enforced. See AccountingStorageEnforce:

  AccountingStorageEnforce
      This controls what level of association-based enforcement to
      impose on job submissions. Valid options are any combination of
      associations, limits, nojobs, nosteps, qos, safe, and wckeys, or
      all for all things (except nojobs and nosteps, which must be
      requested as well).

- Note that setting a limit on a default QOS does nothing if the user submits a job to another QOS. So, for such base/safety limits, a QOS is probably not a good idea.

- For completeness, let me mention GrpWall, although it's a "Grp" limit, i.e. shared in a hierarchy:

  GrpWall=<max wall>
      Maximum wall clock time running jobs are able to be allocated in
      aggregate for this association and all associations which are
      children of this association.

> and also cap the total amount of resources that a user can occupy at any
> given time (aggregated over all running jobs). The latter is meant as a
> kind of emergency brake to prevent individual users from taking over
> large fractions of the cluster by flooding the system with hundreds or
> thousands of short-running jobs.
>
> For the upper limit on the total amount of resources any individual user
> can occupy at a time, I do not see any way to define this at the top
> level of the association tree, since MaxTRES works at the per-job level.
> So I probably have to set GrpTRES individually per user again (just like
> the GrpTRESRunMins limit). Is that correct?

Yes, this is correct. It is the same scenario that we discussed for GrpTRESRunMins.

> Another option might be to use MaxTRESPerUser on the default QOS of the
> cluster, e.g.:
>
> $ sacctmgr modify qos normal set MaxTRESPerUser=cpu=1000,mem=8000g

Yes, this is correct too. But as I mentioned earlier, this will limit only the jobs submitted against that default QOS; jobs submitted to another QOS won't be considered/aggregated.

> Can you please give me some advice on whether I should apply those base
> limits at the account/user level or better use the QOS approach?
>
> As far as I understood, the association tree is the best way to establish
> base limits on accounts and users, and QOSes are actually meant to
> override those base limits in exceptional cases. But I may be wrong.

You are basically right. I wouldn't say that QOSes are meant to override the association limits (although they do); they are more an orthogonal way to set limits, or a way to allow the same user to have different limits. But I totally agree that the way to set such base/safety limits is on associations, and QOSes are meant for fine-grained limits based on whatever sharing policies you want to establish.

Hope that helps,
Albert
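P.S. In case a concrete sketch helps tie this together, here is a minimal slurm.conf fragment for the walltime cap and enforcement (node list and partition name are placeholders, not taken from your site):

# Enforce association- and QOS-based limits; "safe" additionally only
# starts jobs that can run to completion within their group (Grp*) limits:
AccountingStorageEnforce=associations,limits,qos,safe
# Partition-level walltime cap, used when no QOS/association limit applies:
PartitionName=normal Nodes=node[01-99] Default=YES MaxTime=14-00:00:00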