Created attachment 4299 [details]
Add bf_max_job_user_part configuration parameter.

Add a SchedulerParameters option of bf_max_job_user_part, which limits the number of backfill jobs tested per user per partition. For example, bf_max_job_user_part=30 permits the backfill scheduler to test 30 jobs for each user in each partition; a user with jobs in 3 partitions could have 30 jobs tested in each partition, for a total of 90 jobs. You will want to use this instead of bf_max_job_user, and probably restore bf_max_job_part to a value of, say, 300. Note that the attached patch applies to version 17.02.2 and will need to be maintained as a local patch until version 17.11 is released and you upgrade. The patch has been added to version 17.11 in this commit:
https://github.com/SchedMD/slurm/commit/e6e28f449bce78b6711d3bf88c40a6eab870e0e9
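As an illustration, a slurm.conf fragment using the new option might look like the following. The surrounding parameter values here are made-up examples, not recommendations; only bf_max_job_user_part is introduced by this patch:

```
# slurm.conf (illustrative values only)
SchedulerType=sched/backfill
SchedulerParameters=bf_max_job_user_part=30,bf_max_job_part=300
```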
Today I discovered that Slurm already has a solution that will enforce a per-user per-partition limit. Here is how that should be done:

1. Define a QOS for each Slurm partition
2. In each QOS set MaxJobsPerUser (the documentation for this option was just added today, but the option has existed for a while)
3. Add a QOS= parameter to each Slurm partition
4. Restart slurmctld

*** This ticket has been marked as a duplicate of ticket 3668 ***
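For reference, the steps above might be carried out roughly as follows; the QOS name, limit value, and partition name are placeholders for this sketch, not values taken from this ticket:

```
# Steps 1-2: create a QOS with a per-user running-job limit
sacctmgr add qos medium_qos
sacctmgr modify qos medium_qos set MaxJobsPerUser=30

# Step 3: in slurm.conf, attach the QOS to the partition, e.g.
#   PartitionName=medium Nodes=... QOS=medium_qos

# Step 4: restart slurmctld to pick up the partition change
systemctl restart slurmctld
```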
Hi Moe,

I don't think that the QOS per partition you mentioned would work in our case. If I understand it correctly, setting MaxJobsPerUser would imply setting a "Maximum number of jobs each user is allowed to run at one time" for the given partition. We don't necessarily want to limit the number of jobs each user can run in a given partition, but only set a maximum on the number of pending jobs "per user per partition" that the backfill scheduler can check. Ultimately the goal is to make sure that the backfill scheduler has enough "freedom" to evaluate at least a few jobs per user per partition and avoid what is happening right now. For example, the medium partition has 400+ jobs and none of them has an expected start time (i.e. is evaluated by the backfill scheduler):

rp189@login03:~$ squeue -t PD -p medium --noheader | grep -v 'N\/A' | wc -l
0
rp189@login03:~$ squeue -t PD -p medium --noheader | wc -l
487

while short has some jobs with an expected start time assigned:

rp189@login03:~$ squeue -t PD -p short --noheader | grep -v 'N\/A' | wc -l
70
rp189@login03:~$ squeue -t PD -p short --noheader | wc -l
23053

So we would still need this bf_per_user_per_partition feature going forward.

We are actually already using the MaxJobsPerUser QOS feature for our interactive and priority partitions to limit the number of running jobs on those special partitions.

Raffaele
(In reply to Research Computing from comment #2)
> Hi Moe,
>
> So we would still need this bf_per_user_per_partition feature going forward.
>
> We are actually already using the MaxJobsPerUser QOS feature for our
> interactive and priority partitions to limit the number of running jobs on
> those special partitions.

Hi Raffaele,

My error. Yes, you should use this patch in that case. Note that with this patch you should probably only configure the bf_max_job_user_part option and remove the bf_max_job_user and bf_max_job_part options. What I want to do is not count a job toward any limit if it does not pass all three tests. That is not part of this patch, but I do plan to work on it soon.
Created attachment 4325 [details]
Updated patch that allows all bf_max_job parameters to work

This patch is an update of the previous one. It permits bf_max_job_user, bf_max_job_part, and bf_max_job_user_part to all be used independently (i.e. the counters do not get updated until after all three tests succeed).
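The counting semantics of the updated patch can be sketched as follows. This is a simplified, hypothetical model written for illustration, not the actual slurmctld code: jobs are just (user, partition) pairs in priority order, a limit of 0 means unlimited, and the key point is that no counter is incremented until a job has passed all three tests.

```python
from collections import defaultdict

def backfill_candidates(jobs, bf_max_job_user=0, bf_max_job_part=0,
                        bf_max_job_user_part=0):
    """Return the jobs the backfill pass would test, given the three limits.

    jobs is a list of (user, partition) tuples in priority order.
    A limit of 0 means unlimited. Counters are only incremented after a
    job passes all three tests, so the limits act independently.
    """
    user_cnt = defaultdict(int)       # jobs tested per user
    part_cnt = defaultdict(int)       # jobs tested per partition
    user_part_cnt = defaultdict(int)  # jobs tested per (user, partition)
    tested = []
    for user, part in jobs:
        if bf_max_job_user and user_cnt[user] >= bf_max_job_user:
            continue
        if bf_max_job_part and part_cnt[part] >= bf_max_job_part:
            continue
        if (bf_max_job_user_part
                and user_part_cnt[(user, part)] >= bf_max_job_user_part):
            continue
        # All three tests passed: only now does the job count toward limits.
        user_cnt[user] += 1
        part_cnt[part] += 1
        user_part_cnt[(user, part)] += 1
        tested.append((user, part))
    return tested
```

With bf_max_job_user_part=3, a user with 5 pending jobs in each of two partitions would have 3 jobs tested in each partition, 6 in total, matching the per-user per-partition behavior described above.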