| Summary: | Resource Limits in percentage instead of absolute numbers | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | timeu <uemit.seren> |
| Component: | Configuration | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 18.08.6 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | IMP | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |

Description
timeu
2019-04-17 13:58:12 MDT
Uemit, as far as I understand, you're looking for an option to configure global hard limits on allocated resources based on wall time. If that's the case, the most efficient way I see to achieve this would be to configure three partitions - short, medium, long - with different "MaxTime": assign all of your nodes to the short partition (the default), 50% of the nodes to medium, and only 20% to long. Instead of separate partitions for high-clock, high-mem, etc., I'd recommend using the "Feature" configuration option for nodes. If this is not the case, please elaborate a little on what you are trying to achieve. Do you want the 50% limit to be applied per user or per account instead of globally? Did you consider using the priority plugin to give shorter jobs a boost instead of "hard" limits? That approach may be beneficial, since it provides higher overall utilization of resources.

cheers, Marcin

@Marcin thanks for the reply. I don't think your suggested approach would work for us. We want distinct partitions for the different node types because we want to avoid overflow onto the expensive nodes (high-mem nodes): a user has to specifically submit to the corresponding partition if he/she wants to target the high-mem nodes, for example. Our current workaround is to create the MxN QOSes (short, medium and long for each partition) when we create the Slurm cluster, and then have a Lua script that re-writes the user-submitted QOS (short, medium, long) to the actual qos/partition pair (c_short, g_medium, etc.).
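The rewrite step described above could be sketched as a job_submit.lua plugin along these lines. This is a minimal sketch, not the site's actual script; it assumes partitions named c, m and g, and user-facing QOSes short, medium and long, as in this ticket.

```lua
-- Hypothetical sketch of the QOS-rewrite idea: map the user-facing
-- QOS (short/medium/long) plus the chosen partition (c, m, g) onto
-- the per-partition QOS (c_short, m_medium, ...).
function slurm_job_submit(job_desc, part_list, submit_uid)
    local tiers = { short = true, medium = true, long = true }
    if job_desc.qos and tiers[job_desc.qos] and job_desc.partition then
        job_desc.qos = job_desc.partition .. "_" .. job_desc.qos
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```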
| Service Class | Prio | TimeLimit | Resource/User | TotalQoSLimit |
|---|---|---|---|---|
| m_short | 1000 | | cpu=202,mem=4109G | cpu=404,mem=8218G |
| g_short | 1000 | | cpu=14,mem=173G | cpu=28,mem=346G |
| c_short | 1000 | | cpu=966,mem=4132G | cpu=1932,mem=8265G |
| m_medium | 500 | | cpu=80,mem=1643G | cpu=202,mem=4109G |
| g_medium | 500 | | cpu=5,mem=69G | cpu=14,mem=173G |
| c_medium | 500 | | cpu=386,mem=1653G | cpu=966,mem=4132G |
| m_long | 100 | | cpu=40,mem=821G | cpu=80,mem=1643G |
| g_long | 100 | | cpu=2,mem=34G | cpu=5,mem=69G |
| c_long | 100 | | cpu=193,mem=826G | cpu=386,mem=1653G |
| short | 0 | 08:00:00 | | |
| medium | 0 | 2-00:00:00 | | |
| long | 0 | 14-00:00:00 | | |

The user only ever uses the short, medium and long QOSes together with a partition (c, m, g), and we re-write that to the actual QOS. For example, if a user submits a job with the medium QOS to the high-mem partition (m), we re-write it to m_medium. This approach works; however, if we could define the resource limits as percentages, we could avoid the MxN combinations of qos/partition. There is another advantage of percentages over absolute values: our Slurm cluster might not be static, and we might dynamically add and remove nodes from it. With absolute values we always have to re-calculate/re-generate the QOSes; if we could specify the resource limits in percentages, we would not need to do that. I hope this clarifies our use case.

In this case, the option you can try is to configure GrpTRES limits based on "billing", with a command like:

> sacctmgr create qos medium GrpTRES=billing=50 MaxWall=2-0:0:0

Billing is calculated based on the TRESBillingWeights[1] option defined per partition. For instance, setting TRESBillingWeights="CPU=0.5" will result in a billing of 50 when 100 CPUs are in use. If you'd like to take other parameters, like memory, into account, you may find the PriorityFlags=MAX_TRES setting useful. By default, billing is calculated as a sum over all parameters; with MAX_TRES, billing for each resource is calculated separately and the highest value is treated as the final result.
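Putting the pieces of the billing-based suggestion together, a configuration fragment could look roughly like the following. This is a sketch, not a tested configuration: the partition name, node list, and the assumption of a 100-CPU partition (so that billing=50 corresponds to 50% of it) are hypothetical.

```
# slurm.conf (fragment): bill 1 unit per CPU and 1 unit per 4 GB of
# memory; with MAX_TRES, the highest per-TRES value (CPU vs. memory)
# becomes the job's billing instead of the sum.
PriorityFlags=MAX_TRES
PartitionName=c Nodes=... TRESBillingWeights="CPU=1.0,Mem=0.25G"

# With 100 CPUs in the partition, GrpTRES=billing=50 caps the QOS at
# half the partition's CPUs (or the memory equivalent) at any time:
sacctmgr create qos medium GrpTRES=billing=50 MaxWall=2-00:00:00
```

Note that the percentage is still expressed indirectly: the weights must be chosen so that full use of the partition bills a known total, and they would need adjusting if the partition grows or shrinks.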
I believe it's very close to the percentage configuration you've been looking for. Let me know if this works for you.

cheers, Marcin

[1] https://slurm.schedmd.com/slurm.conf.html

Since there were no further questions from you within a week, I'll close this ticket as "info given". Should you need any further information, please do not hesitate to reopen.

cheers, Marcin