Summary: Questions regarding resource limits
Product: Slurm
Component: Accounting
Version: 15.08.8
Reporter: Jeff White <jeff.white>
Assignee: Tim Wickberg <tim>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=4245
Site: Washington State University
Attachments: It's called slurm.conf, what do you think it would be?
Description
Jeff White
2016-03-23 08:20:33 MDT
Created attachment 2906
It's called slurm.conf, what do you think it would be?
(In reply to Jeff White from comment #0)
> I am trying to determine how to implement certain resource limits on our
> cluster and I'm looking for guidance on how we should do so.
> I have accounts configured so that each partition has a 1-to-1 mapping to an
> account. Each account is then associated with the users who should be able
> to access the partition. The parent of all of those accounts is an account
> called "all" which has no users directly associated with it.

If I recall, you have a strict "condo" model where each group only has access to their own hardware? If so then your partitions look reasonable.

> I have not configured anything with qos, mostly because I'm not clear on
> what "qos" is and if I absolutely need it or not. From what I can tell
> there are different types of resource limitations, some for partitions, some
> for qos, some for "associations", etc.

We could document this a bit better. I assume you've looked at:

http://slurm.schedmd.com/qos.html
http://slurm.schedmd.com/resource_limits.html

An association is a mapping of limits to a (cluster, account) pair, possibly further restricted by partition and user. sacctmgr is the tool to view and modify all of these; I'll give some examples below. Think of a QOS as a set of limits (and a bucket of usage which is compared against those limits) that isn't tied to the accounting hierarchy.

> The first limit I'm being asked for is to prevent a single user (regardless
> of number of jobs they submit) from using all nodes in a partition. Can you
> recommend how we can do that in Slurm? Do I need to use qos for something
> like that or is there a setting that can be applied directly to a partition?

There are a few approaches to this depending on your preference.

One way would be to, per user, set a maximum number of nodes they have access to:

sacctmgr update user tim set maxtres=node=2

Another approach would be to set a max grprunmin:

sacctmgr update user tim set grpwall=1000

This would limit me to a maximum of 1000 cpu-minutes of jobs running total on the cluster at any time, which may be simpler to explain than why nodes may be sitting idle otherwise.

If you'd rather not set these limits individually, you could use a QOS per account to set a value for everyone in the account:

sacctmgr add qos acct_timlab maxtresperuser=node=2

(Note that "acct_timlab" is just my shorthand for the QOS being for the timlab account - the names can be any text string you prefer.)

sacctmgr update account timlab set qos=acct_timlab

An easy way to verify this is set correctly is:

sacctmgr show assoc tree format=account,user,qos

Alternatively, if you get to a point where multiple accounts have access to the same partition, you can build a QOS with the various limits and set it on the partition itself, and have that apply to everyone running on that partition.

Let me know where I can elaborate on this; I'm hoping this at least points you in the right direction.

- Tim
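As an aside, a minimal sketch of the condo-style account hierarchy described above, assuming a hypothetical group account name (cas_lab) and using the parent account "all" and user jeff.white from this report:

# Parent account that holds no users directly
sacctmgr add account all description="parent of all condo accounts"
# One account per condo group, mapping 1-to-1 to that group's partition
sacctmgr add account cas_lab parent=all description="condo group for the cas_lab partition"
# Associate a user with the group's account
sacctmgr add user jeff.white account=cas_lab
# View the resulting hierarchy
sacctmgr show assoc tree format=account,user,qos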
(In reply to Tim Wickberg from comment #2)
> If I recall, you have a strict "condo" model where each group only has
> access to their own hardware?
>
> If so then your partitions look reasonable.

That's correct. There's also a "free" partition anyone can access, and at some point in the future there will be a backfill partition that will also be accessible by any user.

> One way would be to, per user, set a maximum number of nodes they have
> access to:
>
> sacctmgr update user tim set maxtres=node=2
>
> Another approach would be to set a max grprunmin:
>
> sacctmgr update user tim set grpwall=1000
>
> This would limit me to a maximum of 1000 cpu-minutes of jobs running total
> on the cluster at any time, which may be simpler to explain than why nodes
> may be sitting idle otherwise.
>
> If you'd rather not set these limits individually, you could use a QOS per
> account to set a value for everyone in the account:
>
> sacctmgr add qos acct_timlab maxtresperuser=node=2
>
> (Note that "acct_timlab" is just my shorthand for the QOS being for the timlab
> account - the names can be any text string you prefer.)
>
> sacctmgr update account timlab set qos=acct_timlab
>
> An easy way to verify this is set correctly is:
> sacctmgr show assoc tree format=account,user,qos
>
> Alternatively, if you get to a point where multiple accounts have access to
> the same partition, you can build a QOS with the various limits and set it
> on the partition itself, and have that apply to everyone running on that
> partition.

We have the opposite at the moment, for example my user is in multiple accounts. Most users are only in one account though and that account is a 1-to-1 mapping to a partition. The exception is an account named "all" which is the parent of all other accounts. No idea if that's a good idea but it works so that's what I did.

What you describe above seems like that would be good for global limits but what I'm looking for I guess is "max nodes per user per partition regardless of number of jobs". In an ideal world it would simply be "PartitionName=blah MaxNodesPerUser=50%" so a single user can't take more than 50% of a partition (regardless of what they may be doing in other partitions). How could we implement that? Partitions have a "MaxNodes" parameter but that's per job so a user can simply submit multiple jobs to get around the limitation.

> We have the opposite at the moment, for example my user is in multiple
> accounts. Most users are only in one account though and that account is a
> 1-to-1 mapping to a partition. The exception is an account named "all"
> which is the parent of all other accounts. No idea if that's a good idea
> but it works so that's what I did.
>
> What you describe above seems like that would be good for global limits but
> what I'm looking for I guess is "max nodes per user per partition regardless
> of number of jobs". In an ideal world it would simply be
> "PartitionName=blah MaxNodesPerUser=50%" so a single user can't take more
> than 50% of a partition (regardless of what they may be doing in other
> partitions). How could we implement that? Partitions have a "MaxNodes"
> parameter but that's per job so a user can simply submit multiple jobs to
> get around the limitation.
The easiest way to do this will be with a "Partition QOS" defined with a strict node limit (there is no "50%" setting available; it works off absolute node counts only, so you'd need to adjust the numbers to suit).
The MaxTRESPerUser limit is designed to handle this exact situation. Briefly, you'd define a QOS to use on the partition as:
sacctmgr add qos part_example maxtresperuser=node=2
Once created, you can now apply this QOS to a partition by setting QOS=part_example in the Partition definition in slurm.conf.
For example, my line is now:
PartitionName=example Nodes=zoidberg[01-04] MaxTime=7-0 QOS=part_example
Running 'scontrol reconfigure' will apply that change to the partition definition.
'scontrol show part example' can be used to confirm the setting is applied. 'scontrol show assoc_mgr' can give you a detailed look into the internal status of the various QOS and association limits that are currently in use on the cluster.
You may also want to set SchedulerParameters=assoc_limit_continue to keep the highest-priority job in a given partition from blocking other jobs in that partition from launching.
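Pulling the steps above together, a sketch of the whole flow using the example names from this comment (partition "example", nodes zoidberg[01-04], QOS "part_example"):

# 1. Create the partition QOS with a per-user node cap (run via sacctmgr)
sacctmgr add qos part_example maxtresperuser=node=2

# 2. Reference it from the partition definition in slurm.conf
PartitionName=example Nodes=zoidberg[01-04] MaxTime=7-0 QOS=part_example

# 3. Optionally, also in slurm.conf, let lower-priority jobs start when the
#    top job in the partition is held at an association/QOS limit
SchedulerParameters=assoc_limit_continue

# 4. Apply the change and confirm it took effect
scontrol reconfigure
scontrol show part example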
I created a QOS as you described and it doesn't seem to be working as expected. I'm now doing this on a development system; I'll upload its config. To keep things simple I have a single partition and I applied a single QOS to it:

#
# PARTITIONS
#
PartitionName=DEFAULT MaxTime=10080 State=UP Default=NO DefMemPerCPU=100
PartitionName=whatever Nodes=dn[1-4] QOS=whatever

That QOS was created with `sacctmgr add qos whatever maxtresperuser=node=2`. I restarted slurmctld then submitted a few single CPU jobs. Two of them began running, taking up a single node which has 2 CPU cores. The next two jobs went into PENDING with QOSMaxNodePerUserLimit. Shouldn't the QOS have allowed these jobs to run as my user only had running jobs on a single node?

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                33  whatever run_burn jeff.whi PD       0:00      1 (QOSMaxNodePerUserLimit)
                34  whatever run_burn jeff.whi PD       0:00      1 (QOSMaxNodePerUserLimit)
                31  whatever run_burn jeff.whi  R       4:52      1 dn1
                32  whatever run_burn jeff.whi  R       4:52      1 dn1

# sacctmgr show qos whatever
      Name   Priority  GraceTime PreemptMode UsageFactor  MaxTRESPU
---------- ---------- ---------- ----------- ----------- ----------
  whatever          0   00:00:00     cluster    1.000000     node=2

# scontrol show part whatever
PartitionName=whatever
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=whatever
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=dn[1-4]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8 TotalNodes=4 SelectTypeParameters=N/A
   DefMemPerCPU=100 MaxMemPerNode=UNLIMITED

> That QOS was created with `sacctmgr add qos whatever maxtresperuser=node=2`.
> I restarted slurmctld then submitted a few single CPU jobs. Two of them
> began running, taking up a single node which has 2 CPU cores. The next two
> jobs went into PENDING with QOSMaxNodePerUserLimit. Shouldn't the QOS have
> allowed these jobs to run as my user only had running jobs on a single node?
Each of the "nodes" from the separate jobs counts independently against the limit, even though the jobs are packed onto a single node. So in this case one node isn't one node, but two.
I should have pointed out that behavior earlier. I'd suggest using CPU counts instead; that will be a bit more intuitive.
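For example, switching the "whatever" QOS from this test from a per-user node cap to a per-user CPU cap might look like the sketch below (the value of 4 CPUs is just an illustration, roughly two of the 2-core nodes in this development setup; setting a TRES value to -1 clears it):

# Clear the node limit and set a CPU limit instead
sacctmgr modify qos whatever set maxtresperuser=node=-1,cpu=4
# Confirm the new limit
sacctmgr show qos whatever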
Closing issue; we were able to get the limits in place as described.