| Summary: | Question on scheduling fairness | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Greg Wickham <greg.wickham> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 20.02.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | KAUST | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Greg Wickham
2020-07-26 23:40:53 MDT
Greg,

Obviously there are a number of ways you can achieve this (not very well defined) goal, but focusing on

> users can expect to make progress on at least one

I think you may be interested in a multifactor configuration where time in queue[1] is a key factor within a given QoS that has a low per-user MaxJobsAccrue[2] limit. This guarantees a priority increase while a job is pending, but only for the specified number of jobs per user, so a new user's job(s) will easily get ahead of the jobs of a user who submitted a lot earlier (subject to the other factors configured by the PriorityWeight* parameters).

Does that make sense?

cheers,
Marcin

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightAge
[2] https://slurm.schedmd.com/sacctmgr.html#SECTION_GENERAL-SPECIFICATIONS-FOR-ASSOCIATION-BASED-ENTITIES

Dear Marcin,

It almost makes sense :-/

If I can rephrase what you stated: use "PriorityWeightAge" to give increased priority to at most "MaxJobsAccrue" pending jobs per user. So, as a user's job finishes, "MaxJobsAccrue" will permit the priority to be lifted on a limited number of pending jobs.

Hypothetically, if users submit jobs:

- if there is no congestion, all jobs will start, limited only by existing resource limits
- if there is congestion, each user's jobs would start up to the per-user resource limits (normal rules); however, a limited number of pending jobs would have an increased priority based on how long the job has been in the queue. As jobs finish, the oldest job (within the accrue limit) would have the highest priority.

Is this correct?

-greg

Greg,

> - if there is no congestion, all jobs will start, limited only by existing resource limits

Yes. MaxJobsAccrue will not prevent jobs from starting; it only limits the number of jobs gaining priority while pending in the queue.
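A toy model may help picture how an accrue cap interacts with age-based priority. The sketch below is a hypothetical Python simulation, not Slurm code. It assumes that each user has at most `max_jobs_accrue` pending jobs accruing age at once, that free slots go to that user's earliest-submitted pending jobs, that a job's age counts from when it started accruing rather than from its submit time, and that the age factor is capped by a PriorityMaxAge-style limit and scaled by a PriorityWeightAge-style weight. The names `Job`, `update_accrual`, `age_priority` and all numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: int
    user: str
    submit: int                         # submit time in minutes (hypothetical units)
    accrue_start: Optional[int] = None  # time this job began accruing age, if ever


def update_accrual(pending: List[Job], now: int, max_jobs_accrue: int) -> None:
    """Hypothetical model: give each user at most max_jobs_accrue accruing
    pending jobs, filling free slots with the earliest submissions first."""
    by_user: Dict[str, List[Job]] = {}
    for job in sorted(pending, key=lambda j: j.submit):
        by_user.setdefault(job.user, []).append(job)
    for jobs in by_user.values():
        free_slots = max_jobs_accrue - sum(j.accrue_start is not None for j in jobs)
        for job in jobs:
            if free_slots <= 0:
                break
            if job.accrue_start is None:
                job.accrue_start = now  # age counts from here, not from submit
                free_slots -= 1


def age_priority(job: Job, now: int, max_age: int, weight_age: int) -> int:
    """Age factor in [0, 1], capped at max_age and scaled by the age weight."""
    if job.accrue_start is None:
        return 0
    factor = min(now - job.accrue_start, max_age) / max_age
    return int(weight_age * factor)


if __name__ == "__main__":
    MAX_AGE, WEIGHT_AGE, MAX_JOBS_ACCRUE = 1440, 1000, 2
    # A "heavy" user submits ten jobs at t=0.
    pending = [Job(i, "heavy", submit=0) for i in range(1, 11)]
    update_accrual(pending, now=0, max_jobs_accrue=MAX_JOBS_ACCRUE)
    # A "light" user submits one job at t=30.
    pending.append(Job(100, "light", submit=30))
    update_accrual(pending, now=30, max_jobs_accrue=MAX_JOBS_ACCRUE)
    # At t=60 heavy's job 1 starts running, freeing one of heavy's accrue slots.
    pending = [j for j in pending if j.job_id != 1]
    update_accrual(pending, now=60, max_jobs_accrue=MAX_JOBS_ACCRUE)
    ranked = sorted(pending, key=lambda j: -age_priority(j, 90, MAX_AGE, WEIGHT_AGE))
    for job in ranked:
        print(job.user, job.job_id, age_priority(job, 90, MAX_AGE, WEIGHT_AGE))
```

In this model the light user's single job ranks above most of the heavy user's backlog, because only two of the heavy user's jobs are ever accruing at a time and a job that only just took over a freed slot starts again from zero age.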
> - if there is congestion, each user's jobs would start up to the per-user resource limits (normal rules); however, a limited number of pending jobs would have an increased priority based on how long the job has been in the queue. As jobs finish, the oldest job (within the accrue limit) would have the highest priority.

MaxJobsAccrue doesn't take running jobs into account, so as one of a user's pending jobs enters the running state, another of that user's pending jobs starts accruing age-based priority.

If you want to give an additional boost to jobs from users with low resource usage (based on running and completed jobs), you can use the fair-share[1] factor. It's not uncommon to use a (sometimes sophisticated) mixture of different Priority* parameters to tune the job priority calculation to fit site needs.

Let me know if you have further questions or doubts.

cheers,
Marcin

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightFairshare

Hi Marcin,

Thanks. I feel you've given us enough to think about. Please close the ticket.

-greg
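To illustrate how a fair-share term can sit alongside the age term, here is a second hypothetical Python sketch. It assumes the classic multifactor fair-share formula F = 2^(-U/S), with U the user's normalized (effective) usage, S the normalized shares, and the dampening factor left at 1; sites using the Fair Tree algorithm (the default in recent releases) compute the factor differently. Only the age and fair-share terms are modelled; the other PriorityWeight* terms are ignored, and the weights and usage numbers are invented for illustration.

```python
import math


def fairshare_factor(norm_usage: float, norm_shares: float) -> float:
    """Classic fair-share factor 2**(-U/S); assumes a dampening factor of 1."""
    if norm_shares <= 0:
        return 0.0
    return math.pow(2.0, -norm_usage / norm_shares)


def job_priority(age_factor: float, fs_factor: float,
                 weight_age: int, weight_fairshare: int) -> int:
    """Weighted sum of two factors, each in [0, 1] (other factors omitted)."""
    return int(weight_age * age_factor + weight_fairshare * fs_factor)


if __name__ == "__main__":
    WEIGHT_AGE, WEIGHT_FAIRSHARE = 1000, 2000  # hypothetical weights
    # Two users with equal shares; "heavy" accounts for most recent usage.
    users = {"heavy": (0.45, 0.5), "light": (0.05, 0.5)}
    fs = {u: fairshare_factor(*usage_shares) for u, usage_shares in users.items()}
    # heavy's job has waited longer (age factor 0.5) than light's (0.1).
    print("heavy", job_priority(0.5, fs["heavy"], WEIGHT_AGE, WEIGHT_FAIRSHARE))
    print("light", job_priority(0.1, fs["light"], WEIGHT_AGE, WEIGHT_FAIRSHARE))
```

With these made-up numbers the light user's job outranks the heavy user's job even though it has waited less; how strongly each factor wins depends entirely on the relative weights a site chooses.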