Ticket 15349 - multicluster shares QOS global table with other clusters; request for separable but common named QOS and/or multi-QOS specification and submission
Summary: multicluster shares QOS global table with other clusters; request for separable but common named QOS and/or multi-QOS specification and submission
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Limits
Version: 22.05.5
Hardware: Linux
OS: Linux
Severity: 5 - Enhancement
Assignee: Unassigned Developer
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-11-03 15:19 MDT by S Senator
Modified: 2023-01-04 15:58 MST
CC List: 5 users

See Also:
Site: LANL
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name: chicoma, guaje, tycho, rocinante, razorback, xroads
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description S Senator 2022-11-03 15:19:20 MDT
We are in the process of rolling out multicluster capabilities among our various clusters.

All of the clusters have a consistent QOS naming scheme, with QOSes named 'high, debug, standard, standby' and a variety of limits associated with each.

As clusters are enabled with a common database and shared slurmdbd, we would like mechanisms providing any or all of the following features. We understand that these are likely to require sponsored enhancements to current Slurm features, and we request an SOW.

Feature requests, in ranked order of desirability and near-term priority:
1. The ability to define TRES limits in terms of cluster-relative or partition-relative quantities, such as nodes_percent=20 rather than nodes=50 as a fixed constant (see the sketch after this list).

2. Separation of the slurm_acct_db QOS table from a per-database table into a per-cluster table, so that a given cluster could have TRES limits defined appropriate to the size of the resources on each cluster. If a global QOS table were to remain, then a (partition?) flag to determine whether the job's QOS or the partition's QOS would take precedence.

3. A mechanism to submit a job to a QOS list, such that a job could be considered for each QOS, in the order defined by the QOS list. The first QOS which allowed this job to be enqueued would be selected.
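One way to approximate item 1 today is to convert the percentage into absolute node counts out of band and push those into the existing GrpTRES limits. A minimal sketch of that conversion, where the cluster sizes, percentages and the per-cluster QOS naming convention (discussed later in this ticket) are all illustrative assumptions rather than production tooling:

#!/usr/bin/env python3
# Illustrative only: turn a desired percentage-based node limit into the
# absolute GrpTRES node counts that sacctmgr accepts today.
# Cluster sizes and percentages below are made-up placeholders.

CLUSTER_NODES = {"chicoma": 1000, "guaje": 300, "tycho": 120}   # hypothetical sizes
QOS_NODE_PERCENT = {"high": 20, "debug": 5, "standby": 50}      # desired percent limits

def sacctmgr_commands():
    """Yield one sacctmgr command per (cluster, QOS) pair.

    Assumes per-cluster QOS names of the form '<cluster>-<qos>', since the
    QOS table is currently global to the slurmdbd database.
    """
    for cluster, nodes in CLUSTER_NODES.items():
        for qos, pct in QOS_NODE_PERCENT.items():
            limit = max(1, nodes * pct // 100)
            yield (f"sacctmgr -i modify qos {cluster}-{qos} "
                   f"set GrpTRES=node={limit}")

if __name__ == "__main__":
    for cmd in sacctmgr_commands():
        print(cmd)

The obvious drawback is that these numbers go stale whenever a cluster changes size, which is exactly why a native percentage syntax would help.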
Comment 2 S Senator 2022-11-07 16:05:02 MST
Related and desirable limit requests:

4) bf_threshold_percent or bf_threshold_percent_free
   which would only backfill up to a specific percentage of use (bf_threshold_percent=95) for this QOS, or leave a certain percentage free (bf_threshold_percent_free=5)

5) qos_limits_enforced_starttime=<timespec>, qos_limits_enforced_endtime=<timespec>, qos_limits_apply=[weekly, weekend, weekday, daily]
   That is, the limits specified by the QOS would be enforced between starttime and endtime, on the days selected by qos_limits_apply.
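Today, request 5 can only be approximated out of band by rewriting the QOS limits on a schedule. A minimal sketch of that idea, run from cron, where the QOS name, the node counts and the 08:00-18:00 weekday window are all illustrative assumptions:

#!/usr/bin/env python3
# Illustrative sketch: enforce different QOS limits inside and outside a
# weekday daytime window by rewriting them with sacctmgr from cron.
# The 'standard' QOS name, the limits and the window are placeholders.
import datetime
import subprocess

DAYTIME_LIMIT = "GrpTRES=node=200"   # assumed limit for 08:00-18:00 Mon-Fri
OFFHOURS_LIMIT = "GrpTRES=node=400"  # assumed limit otherwise

def desired_limit(now: datetime.datetime) -> str:
    weekday = now.weekday() < 5          # Monday=0 .. Friday=4
    daytime = 8 <= now.hour < 18
    return DAYTIME_LIMIT if (weekday and daytime) else OFFHOURS_LIMIT

if __name__ == "__main__":
    limit = desired_limit(datetime.datetime.now())
    subprocess.run(["sacctmgr", "-i", "modify", "qos", "standard",
                    "set", limit], check=True)

Having the enforcement window be a property of the QOS itself would avoid this kind of external state-flipping and the races that come with it.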
Comment 3 Jason Booth 2022-11-17 14:46:59 MST
Steve, I just wanted to give you an update here. We are still discussing this request internally, however, I do have some initial comments to share. I will get back to you on comment#2.

[Point 1]

This is something we don't think we would be interested in tackling. 


[Point 2]

I do remember us discussing this back when Tim and I were on-site; at least the initial idea of this was talked about.

The model we suggest would be to name the cluster-specific QOSes with the cluster name. Then, as part of the job_submit logic, you could readily prepend the cluster name and end up with the "correct" internal QOS name.
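For concreteness, the rewrite rule we have in mind is roughly the following, expressed in Python purely for illustration; in practice it would live in the site's job_submit plugin (e.g. job_submit.lua), and the generic QOS names and example cluster are assumptions:

# Illustration of the QOS-rewrite rule only; the real logic would run inside
# the job_submit plugin, not as a standalone script.

GENERIC_QOS = {"high", "debug", "standard", "standby"}   # common names across clusters

def rewrite_qos(requested_qos: str, cluster: str) -> str:
    """Map a generic QOS name onto the cluster-specific entry in the global table."""
    if requested_qos in GENERIC_QOS:
        return f"{cluster}-{requested_qos}"   # e.g. 'high' -> 'chicoma-high'
    return requested_qos                      # already cluster-specific, leave untouched

assert rewrite_qos("high", "chicoma") == "chicoma-high"
assert rewrite_qos("chicoma-high", "chicoma") == "chicoma-high"

Users then keep submitting with the common names they already know, and the accounting database only ever sees the per-cluster QOS entries.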


[Point 3]

This is a request that we are not too excited about. Multi-partition already causes considerable scheduling overhead as we need to independently evaluate the job against each option. Allowing for multi-QOS submissions would further complicate this. And if you combined multi-partition and multi-QOS you'd be even worse off.
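That said, the ordered-list semantics can be roughly approximated on the client side today. A minimal sketch, with the caveat that it only behaves as intended when the QOSes carry the DenyOnLimit flag so that over-limit submissions are rejected at submit time (the QOS preference order below is illustrative):

#!/usr/bin/env python3
# Client-side approximation of an ordered QOS list: try each QOS in turn and
# keep the first submission that sbatch accepts.  Only approximates the
# requested feature when the QOSes are flagged DenyOnLimit.
import subprocess
import sys

QOS_ORDER = ["high", "standard", "standby"]   # illustrative preference order

def submit_with_fallback(sbatch_args):
    for qos in QOS_ORDER:
        result = subprocess.run(["sbatch", f"--qos={qos}", *sbatch_args],
                                capture_output=True, text=True)
        if result.returncode == 0:
            print(result.stdout.strip())       # e.g. "Submitted batch job 12345"
            return 0
    sys.stderr.write("submission rejected under every QOS in the list\n")
    return 1

if __name__ == "__main__":
    sys.exit(submit_with_fallback(sys.argv[1:]))

It is only an approximation, though: once the job is queued under the first accepting QOS it will not migrate to a later one, which is part of why doing this natively in the scheduler is expensive.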
Comment 4 S Senator 2022-11-17 15:10:37 MST
If you'd prefer that this conversation took place directly, I am open to that.

The capability in
> 1) TRES limits as percentages

which, just to be clear, you refer to as [point 1] (correct?), would provide a very large fraction of the solution to this problem. Could you elaborate on the reasoning behind not addressing it as a paid enhancement?

I understand and am not particularly surprised at the difficulty or scheduling-time cost of implementing per-cluster QOS and multiple QOS lists. I appreciate that you included the reasoning behind why this is not an ideal feature to implement in Slurm.
Comment 5 S Senator 2022-11-17 15:31:52 MST
I appreciate the feedback.

Thank you,
-Steve Senator

Comment 6 Kilian Cavalotti 2022-11-29 09:40:48 MST
Hi!

Just wanted to add a quick note to express our interest in 1. as well:

"The ability to define TRES limits in terms of cluster-relative or partition-relative quantities, such as nodes_percent=20 rather than nodes=50 as a fixed constant."

We have a fair number of independent partitions, each with its own access list, and we tend to re-use the same partition QOS on each of those partitions, since their usage semantics are pretty much the same.

Their sizes can vary a lot, though (from one to several hundred nodes), so having a way to express QOS limits as a percentage of the partition size would be very helpful to us.
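For what it's worth, a rough out-of-band approximation of this is sketched below, assuming placeholder partition names, a placeholder 20% target and a one-QOS-per-partition naming convention (sinfo's %D field reports node counts):

#!/usr/bin/env python3
# Illustrative only: derive absolute per-partition node limits from a target
# percentage, reading each partition's size with sinfo, and print the
# sacctmgr command that would set the corresponding partition QOS limit.
import subprocess

PARTITIONS = ["normal", "gpu", "bigmem"]   # placeholder partition names
PERCENT = 20                                # placeholder target percentage

def partition_size(partition: str) -> int:
    out = subprocess.run(["sinfo", "-h", "-p", partition, "-o", "%D"],
                         capture_output=True, text=True, check=True)
    return sum(int(count) for count in out.stdout.split())

if __name__ == "__main__":
    for part in PARTITIONS:
        limit = max(1, partition_size(part) * PERCENT // 100)
        # assumes one QOS per partition named '<partition>-qos' (illustrative)
        print(f"sacctmgr -i modify qos {part}-qos set GrpTRES=node={limit}")

But that has to be re-run every time a partition grows or shrinks, which is the whole point of asking for a percentage-based limit.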

Cheers,
--
Kilian
Comment 8 Jason Booth 2023-01-02 10:25:03 MST
Steve, I am following up on this issue. On the last call between SchedMD and LANL we provided some further explanation regarding this issue, and based on that call LANL was not interested in pursuing development for this feature at this time, though it does look like other sites might be interested in such a feature. Therefore, I propose we convert this into a Sev-5 / unassigned with the understanding that we won't do anything on it unless some site wants to pursue sponsorship.
Comment 9 Kilian Cavalotti 2023-01-02 10:25:12 MST
Hi,

I am currently out of office, returning on Jan 9th. 

If you need to reach Stanford Research Computing, please email srcc-support@stanford.edu

Cheers,
Comment 10 S Senator 2023-01-02 13:49:04 MST
(In reply to Jason Booth from comment #8)
> Steve, I am following up on this issue. On the last call between SchedMD and
> LANL we provided some further explanation regarding this issue, and based on
> that call LANL was not interested in pursuing development for this feature
> at this time,

Jason, as this discussion is relevant to the community, I'm choosing not to restrict access to a narrower group, but let's communicate directly. I'm not sure that this is what we meant to convey. -Steve