Ticket 9470

Summary:	Question on scheduling fairness
Product:	Slurm	Reporter:	Greg Wickham <greg.wickham>
Component:	Scheduling	Assignee:	Marcin Stolarek <cinek>
Status:	RESOLVED INFOGIVEN	QA Contact:
Severity:	4 - Minor Issue
Priority:	---	CC:	cinek
Version:	20.02.3
Hardware:	Linux
OS:	Linux
Site:	KAUST	Slinky Site:	---

Description Greg Wickham 2020-07-26 23:40:53 MDT

Dear Team,

One of our CS support team has raised a question about "scheduling fairness" that I cannot answer. I include it below. We would appreciate any comments or suggestions. -Greg

Question:

We are looking for a way to satisfy the following user-centric 'fairness' objective that prioritizes user-level work progress:

Within a given priority / QoS level, users can expect to make progress on at least one of their submitted jobs by the end of the max job length cycle. I.e., we are trying to maximize the number of simultaneous users.

Assuming: users with the same base priority / QoS level, max job length of 24 hours, and sufficient compute resources to satisfy at least one job from each user simultaneously.

Is there a way to adjust job priority based on the number of jobs the user currently has running*?

What approach would you recommend to achieve this 'user-progress' / 'maximum simultaneous users' objective?

Thank you in advance for your input and advice.

Comment 1 Marcin Stolarek 2020-07-27 01:42:27 MDT

Greg,

There are obviously a number of ways to achieve this (not very well defined) goal, but focusing on
>users can expect to make progress on at least one 

I think they may be interested in a multifactor priority configuration where time in queue[1] is a key factor within a given QoS, combined with a low per-user MaxJobsAccrue[2] limit. This guarantees that a job's priority increases while it is pending, but only for the specified number of jobs per user, so a new user's job(s) will easily get ahead of the jobs of a user who submitted a lot in the past (within or beyond the other factors configured by the PriorityWeight parameters).

Does that make sense?

cheers,
Marcin

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightAge
[2] https://slurm.schedmd.com/sacctmgr.html#SECTION_GENERAL-SPECIFICATIONS-FOR-ASSOCIATION-BASED-ENTITIES
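
Editor's note: the interaction between age-based priority and an accrue limit can be illustrated with a toy model. This is only a sketch under stated assumptions, not Slurm's implementation: the constants stand in for PriorityWeightAge, PriorityMaxAge and a per-user MaxJobsAccrue limit, their values are made up, and PendingJob, update_accrual and age_priority are hypothetical names. The age factor is assumed to grow linearly up to the maximum age.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical values standing in for PriorityWeightAge, PriorityMaxAge
# and a per-user MaxJobsAccrue limit; they are not site recommendations.
PRIORITY_WEIGHT_AGE = 1000
PRIORITY_MAX_AGE_H = 24.0
MAX_JOBS_ACCRUE = 1


@dataclass
class PendingJob:
    user: str
    name: str
    accrue_start_h: Optional[float] = None  # None: not accruing age yet


def update_accrual(queue: List[PendingJob], now_h: float) -> None:
    """Start age accrual for each user's oldest pending jobs, up to the limit."""
    accruing = {}
    for job in queue:
        if job.accrue_start_h is not None:
            accruing[job.user] = accruing.get(job.user, 0) + 1
    for job in queue:  # the queue is kept in submit order
        if job.accrue_start_h is None and accruing.get(job.user, 0) < MAX_JOBS_ACCRUE:
            job.accrue_start_h = now_h
            accruing[job.user] = accruing.get(job.user, 0) + 1


def age_priority(job: PendingJob, now_h: float) -> int:
    """Age contribution to priority; zero for jobs that are not accruing."""
    if job.accrue_start_h is None:
        return 0
    age_factor = min((now_h - job.accrue_start_h) / PRIORITY_MAX_AGE_H, 1.0)
    return round(PRIORITY_WEIGHT_AGE * age_factor)


# alice submits three jobs at hour 0; bob submits one job at hour 6.
queue = [PendingJob("alice", "a1"), PendingJob("alice", "a2"), PendingJob("alice", "a3")]
update_accrual(queue, 0.0)
queue.append(PendingJob("bob", "b1"))
update_accrual(queue, 6.0)
at_hour_12 = {job.name: age_priority(job, 12.0) for job in queue}

# a1 starts at hour 12 and leaves the pending queue, so a2 begins accruing.
queue = [job for job in queue if job.name != "a1"]
update_accrual(queue, 12.0)
at_hour_18 = {job.name: age_priority(job, 18.0) for job in queue}

print(at_hour_12)
print(at_hour_18)
```

In this toy run only one of alice's pending jobs gains age priority at a time, so bob's single job ends up ahead of alice's remaining jobs once her first one has started.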

Comment 2 Greg Wickham 2020-07-29 22:27:47 MDT

Dear Marcin,

It almost makes sense :-/

If I can rephrase what you stated.

Use "PriorityWeightAge" to give increased priority up to the "MaxJobsAccrue" number of jobs.

So, as a user's job finishes, "MaxJobsAccrue" will permit the priority to be lifted on a limited number of pending jobs.

Hypothetically, if users submit jobs:

 - if there is no congestion all jobs will start limited only by existing resource limits

 - if there is congestion, each user's jobs would start up to the per-user resource limits (normal rules); however, a limited number of pending jobs would have an increased priority based on how long the job has been in the queue. As jobs finish, the oldest job (within the accrue limit) would have the highest priority.

Is this correct?

   -greg

Comment 3 Marcin Stolarek 2020-07-30 00:56:13 MDT

Greg,

> - if there is no congestion all jobs will start limited only by existing resource limits
Yes, MaxJobsAccrue will not prevent jobs from starting; it will only limit the number of jobs gaining priority while pending in the queue.

> - if there is congestion, each user's jobs would start up to the per-user resource limits (normal rules); however, a limited number of pending jobs would have an increased priority based on how long the job has been in the queue. As jobs finish, the oldest job (within the accrue limit) would have the highest priority.
MaxJobsAccrue doesn't take running jobs into account, so as one of a user's jobs moves into the running state, one of that user's pending jobs will start accruing age-based priority. If you want to give an additional boost to jobs from users with low resource usage (based on running and completed jobs), you can use the fair share[1] factor.

It's not uncommon to use a (sometimes sophisticated) mixture of different Priority* parameters to tune the job priority calculation to fit a site's needs.

Let me know if you have further questions/doubts.

cheers,
Marcin
[1] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightFairshare
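
Editor's note: a rough illustration of how an age term and a fair share term can be mixed. This sketch assumes the priority is a weighted sum of factors that each lie between 0.0 and 1.0, and it includes only the two terms discussed in this ticket. The function name job_priority, the weights and the factor values are all hypothetical; how Slurm derives the fair share factor from usage is not modelled here.

```python
def job_priority(age_factor: float, fairshare_factor: float,
                 weight_age: int = 1000, weight_fairshare: int = 4000) -> int:
    """Weighted sum of two priority factors, each expected in [0.0, 1.0].

    weight_age and weight_fairshare stand in for PriorityWeightAge and
    PriorityWeightFairshare; the default values are illustrative only.
    """
    for factor in (age_factor, fairshare_factor):
        if not 0.0 <= factor <= 1.0:
            raise ValueError("factors must lie between 0.0 and 1.0")
    return round(weight_age * age_factor + weight_fairshare * fairshare_factor)


# A heavy user's job that has already reached the maximum age,
# with a low (assumed) fair share factor.
heavy_user_job = job_priority(age_factor=1.0, fairshare_factor=0.25)

# A light user's freshly submitted job, with a high (assumed) fair share factor.
light_user_job = job_priority(age_factor=0.0, fairshare_factor=0.75)

print(heavy_user_job, light_user_job)
```

With these made-up weights the fair share term dominates, so the light user's new job ranks above the heavy user's aged job; swapping the weights would reverse that ordering, which is the kind of tuning the comment above refers to.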

Comment 4 Greg Wickham 2020-07-31 05:11:20 MDT

Hi Marcin,

Thanks. I feel you've given us enough to think about.

Please close the ticket.

   -greg