We have a machine with 2 partitions: batch and gpu. The desire is to limit the batch partition to 4 running jobs at a time and the gpu partition to 1 running job at a time. However, we have a small number of users/accounts that should be exempt from this limit.

You can't specify MaxJobs on a partition directly, so I would have to create a partition QOS. That's the first item evaluated in the limit hierarchy, so you can't override it with a (Cluster,Account,User,Partition) association. You could use OverPartQOS and give the exempt users access to a QOS with a relaxed limit, but then you get into "QOS creep".

Alternatively, I could just create all the user associations as (Cluster,Account,User,Partition), requiring an association per user per partition. That would be completely unmanageable by hand, but it is potentially viable with automated processes manipulating the slurmdb. What's the best way to do this?
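For concreteness, the per-partition-association idea would look roughly like the following, generated per user by the automation (the account and user names are placeholders, and I'm assuming the 4/1 caps are meant per user):

# Sketch only: one association per user per partition, each carrying the
# running-job cap (MaxJobs on an association limits that user's running jobs).
sacctmgr -i add user name=alice account=proj1 partition=batch
sacctmgr -i add user name=alice account=proj1 partition=gpu
sacctmgr -i modify user where name=alice partition=batch set MaxJobs=4
sacctmgr -i modify user where name=alice partition=gpu set MaxJobs=1

# An exempt user just gets a higher cap, or -1 to clear the limit entirely:
sacctmgr -i modify user where name=bob partition=batch set MaxJobs=-1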
Hi Matt - I am looking into this. As you have already discovered, the mechanisms to do this are rather limited at this time, but I will see what I can find for you. Your idea with partition associations may just be the best solution given the current state of "partition limits" and overriding. Do all of these users have the same QOS, and would you be able to use different QOSes as part of a solution here without a partition QOS or OverPartQOS?
Today, we aren't using QOS (everyone just uses "normal"). The problem then becomes the QOS creep I mentioned, where you need a QOS for every possible combination of limits (jobs running, jobs accruing, max walltime, etc.). To support allowing 1 or 4 running jobs with a 1-day or 2-day max walltime, you need 4 QOSes: running1_wall24, running1_wall48, running4_wall24, running4_wall48. Each time a user comes with another request ("I need 10 jobs", etc.), you have to add more QOSes and more logic to the submit filter to 'steer' the job to the right QOS. Need to also be able to change the accrue limits? That doubles the number of QOSes needed.
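Just to make the creep concrete, those four QOSes alone would look something like this (the limit names assume the running-job cap is per user; names and values are only illustrative):

# Sketch only: every additional limit knob multiplies this list.
sacctmgr -i add qos running1_wall24
sacctmgr -i modify qos running1_wall24 set MaxJobsPerUser=1 MaxWall=24:00:00
sacctmgr -i add qos running1_wall48
sacctmgr -i modify qos running1_wall48 set MaxJobsPerUser=1 MaxWall=48:00:00
sacctmgr -i add qos running4_wall24
sacctmgr -i modify qos running4_wall24 set MaxJobsPerUser=4 MaxWall=24:00:00
sacctmgr -i add qos running4_wall48
sacctmgr -i modify qos running4_wall48 set MaxJobsPerUser=4 MaxWall=48:00:00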
Matt - What you are asking for is a blanket MaxJobs limit applied to all jobs submitted to a partition, enforced per partition and overridable by the user association. As you are already aware, MaxJobs is an association or QOS limit; no such per-partition limit exists in the core slurmctld code. Your options are limited since you want to avoid creating QOSes.

Option 1) (which seems like the best solution in your case) Configure (Cluster,Account,User,Partition) associations, as you mentioned in the description.

Option 2) Set up partition QOSes to limit batch (4 jobs) and gpu (1 job). Use a job_submit plugin to put the exception users on a QOS with OverPartQOS, and otherwise let the partition QOS apply.

No matter which option you choose you will get some type of "creep". You could open an NRE to have the association override the QOS, similar to what we do with OverPartQOS.
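To sketch Option 2 a bit further (the QOS names are placeholders, and MaxJobsPerUser assumes the 4/1 caps are meant per user; GrpJobs would be the knob for an aggregate cap instead):

# Partition QOSes carrying the running-job caps:
sacctmgr -i add qos part_batch
sacctmgr -i modify qos part_batch set MaxJobsPerUser=4
sacctmgr -i add qos part_gpu
sacctmgr -i modify qos part_gpu set MaxJobsPerUser=1

# Exception QOS allowed to override the partition QOS limits:
sacctmgr -i add qos exempt
sacctmgr -i modify qos exempt set Flags=OverPartQOS

# Let the exempt users request that QOS:
sacctmgr -i modify user where name=alice set QOS+=exempt

# Attach the partition QOSes in slurm.conf:
#   PartitionName=batch Nodes=... QOS=part_batch ...
#   PartitionName=gpu   Nodes=... QOS=part_gpu ...

The job_submit plugin (e.g. job_submit/lua) would then only need to set job_desc.qos to the exception QOS for the exempt users and leave everything else to the partition QOS.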
Matt - after talking this over a bit more internally, we want to put more emphasis on Option 2 as the preferred option. We see a partition QOS as the right answer to this problem.
That's non-ideal for several reasons:
- QOSes are global on a slurmdbd
- I'll need a QOS for every possible limit combination

So we are going to have a LOT of QOSes if we are serving multiple clusters from the same slurmdbd (as recommended). I guess that's the path I'll go down for now, but I'd be interested to chat to scope NRE paths that could simplify this.
> I guess that's the path I'll go down for now, but I'd be interested to chat to scope NRE paths that could simplify this.

To simplify the NRE process, would you open a new bug with that request?