Ticket 3669

Summary: Add configuration option to set a per-user/per-partition backfill limit
Product: Slurm Reporter: Moe Jette <jette>
Component: Scheduling Assignee: Moe Jette <jette>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: kathleen_keating, rc
Version: 17.02.1   
Hardware: Linux   
OS: Linux   
Site: Harvard Medical School Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---
Attachments: Add bf_max_job_user_part configuration parameter.
Updated patch that allows all bf_max_job parameters to work

Description Moe Jette 2017-04-05 14:42:17 MDT
Created attachment 4299 [details]
Add bf_max_job_user_part configuration parameter.

Add a SchedulerParameters option of bf_max_job_user_part which limits the number of jobs the backfill scheduler will test per user per partition. For example, a value of bf_max_job_user_part=30 permits the backfill scheduler to test 30 jobs for each user in each partition; if a user has a bunch of jobs in 3 partitions, up to 30 jobs will be tested in each partition, for a total of 90 jobs. You will want to use this instead of bf_max_job_user, and probably restore bf_max_job_part to a value of, say, 300.
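As a sketch, the resulting slurm.conf scheduler line might look like the following; the values are illustrative only (300 is the suggested restore value for bf_max_job_part mentioned above):

```
# slurm.conf -- illustrative values only
# Test up to 30 pending jobs per user in each partition,
# and up to 300 jobs per partition overall.
SchedulerParameters=bf_max_job_user_part=30,bf_max_job_part=300
```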

Note that the attached patch applies to version 17.02.2 and will need to be managed as a local patch until version 17.11 is released and you upgrade.

The patch has been added to version 17.11 in this commit:
https://github.com/SchedMD/slurm/commit/e6e28f449bce78b6711d3bf88c40a6eab870e0e9
Comment 1 Moe Jette 2017-04-07 07:13:20 MDT
Today I discovered that Slurm already has a mechanism that will enforce a per-user per-partition limit. Here is how that should be done:
1. Define a QOS for each Slurm partition
2. In each QOS set MaxJobsPerUser (the documentation for this option was just added today, but the option has existed for a while)
3. Add to each Slurm partition a QOS= parameter
4. Restart slurmctld
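The four steps above might be sketched as follows. The QOS name (medium_qos) and the limit value are illustrative, and the exact sacctmgr field names should be checked against the sacctmgr documentation:

```
# 1. Define a QOS for the partition (name is illustrative)
sacctmgr add qos medium_qos
# 2. Set the per-user running-job limit on that QOS
sacctmgr modify qos medium_qos set MaxJobsPerUser=30
# 3. Attach the QOS to the partition in slurm.conf:
#      PartitionName=medium ... QOS=medium_qos
# 4. Restart the controller
systemctl restart slurmctld
```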

*** This ticket has been marked as a duplicate of ticket 3668 ***
Comment 2 HMS Research Computing 2017-04-07 11:35:35 MDT
Hi Moe,
I don't think that the QOS per Partition you mentioned would work in our case.

If I understand it correctly:
setting MaxJobsPerUser would impose a "Maximum number of jobs each user is allowed to run at one time" for the given partition.

We don't necessarily want to limit the number of jobs each user can run in a given partition, but only to cap the number of pending jobs "per user per partition" that the backfill scheduler will check.
Ultimately, the goal is to make sure the backfill scheduler has enough "freedom" to evaluate at least a few jobs per user per partition and to avoid what is happening right now. For example, the medium partition has 400+ pending jobs and none of them has an expected start time (i.e. none is being evaluated by the backfill scheduler):
rp189@login03:~$ squeue -t PD -p medium --noheader | grep -v 'N/A' | wc -l
0
rp189@login03:~$ squeue -t PD -p medium --noheader | wc -l
487
while the short partition does have some jobs with an expected start time assigned:
rp189@login03:~$ squeue -t PD -p short --noheader | grep -v 'N/A' | wc -l
70
rp189@login03:~$ squeue -t PD -p short --noheader | wc -l
23053

So we would still need this bf_per_user_per_partition feature going forward.

We are actually already using the MaxJobsPerUser QOS feature on our interactive and priority partitions to limit the number of running jobs on those special partitions.

Raffaele
Comment 3 Moe Jette 2017-04-07 14:49:31 MDT
(In reply to Research Computing from comment #2)
> Hi Moe,
>
> So we would still need this bf_per_user_per_partition feature going forward.
> 
> We are actually already using MaxJobsPerUser QOS feature for our interactive
> and priority to limit the number of running jobs on those special partitions.

Hi Raffaele,

My error. Yes, you should use this patch in that case. Note that with this patch you should probably configure only the bf_max_job_user_part option and remove the bf_max_job_user and bf_max_job_part options. What I want to do is not count a job toward any limit unless it passes all three tests. That is not part of this patch, but I do plan to work on it soon.
Comment 4 Moe Jette 2017-04-10 08:36:20 MDT
Created attachment 4325 [details]
Updated patch that allows all bf_max_job parameters to work

This patch is an update of the previous one. It permits bf_max_job_user, bf_max_job_part, and bf_max_job_user_part to all be used independently (i.e. the counters do not get updated until after all three tests succeed).
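With the updated patch, a combined configuration might then look like the following sketch (all three values are illustrative); a pending job only counts toward the counters once it passes the per-user, per-partition, and per-user-per-partition tests:

```
# slurm.conf -- illustrative values only
SchedulerParameters=bf_max_job_user=50,bf_max_job_part=300,bf_max_job_user_part=30
```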