Created attachment 30317 [details] SSHARE We are seeing an issue where jobs are not being scheduled sooner even though, based on their fairshare value, we believe they should move ahead of others. When we run sprio we do not see any fairshare values, but we do see them when we run the sshare command. It appears that something is wrong in the config, even though we do have PriorityType=priority/multifactor in slurm.conf. I've attached the relevant output of those commands. We are struggling to confirm that fairshare is working and prioritizing jobs appropriately.
Created attachment 30319 [details] slurm.conf
Created attachment 30320 [details] squeue
Created attachment 30321 [details] priority
It looks like you are missing the weights.

https://slurm.schedmd.com/priority_multifactor.html#configexample

For example:

PriorityWeightAge=1000
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightPartition=1000
PriorityWeightQOS=2

Before enabling these, please review the parameters and adjust them to your site's needs. We have found that weights set in increments of 1000 tend to work better than weights in the 100s.
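To illustrate why missing weights flatten the priorities, here is a minimal Python sketch of the weighted sum described on the priority_multifactor page. It is a simplification, not Slurm's code: it covers only the five weights listed above, the job names and factor values are hypothetical, and job_priority is a helper invented for this example. Each factor is a value between 0.0 and 1.0, and each weight multiplies its factor.

```python
# Simplified sketch of the multifactor weighted sum; not Slurm source code.
# Factors are floats in [0.0, 1.0]; weights are the PriorityWeight* integers.

def job_priority(weights, factors):
    """Return the rounded sum of weight * factor over all factors."""
    return round(sum(weights.get(name, 0) * value for name, value in factors.items()))

# Hypothetical factor values for two pending jobs.
job_a = {"age": 0.10, "fairshare": 0.80, "jobsize": 0.05, "partition": 1.0, "qos": 0.0}
job_b = {"age": 0.50, "fairshare": 0.20, "jobsize": 0.05, "partition": 1.0, "qos": 0.0}

# No PriorityWeight* lines in slurm.conf: every weight is treated as 0 here.
unset = {}
# The example weights from the comment above.
example = {"age": 1000, "fairshare": 10000, "jobsize": 1000, "partition": 1000, "qos": 2}

print("weights unset:  ", job_priority(unset, job_a), job_priority(unset, job_b))
print("example weights:", job_priority(example, job_a), job_priority(example, job_b))
```

With the weights unset, both jobs get the same value in this sketch, so nothing separates them. With the example weights, the job with the higher fairshare factor comes out ahead even though the other job has waited longer.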
(In reply to comment #4 from Jason Booth, emailed Tuesday, May 16, 2023 5:19 PM)

Thanks. Is there no default value, or does it default to 0? Is this why the priorities are all equal? Does this mean it is effectively working like FIFO now?

Is there a typical best-practice number for these?

I noticed that QOS was set to 2 in the example. Was this on purpose? Does this mean it is valued less because the weight is a multiplier, or more because the weight is a divider?

Aren't Age and JobSize already calculated as part of fair share, or is fair share only the resources consumed over the past x days?

Thanks for the additional information. Just trying to understand it all.

Jeff Fahnoe
Chief Information Officer
The Wistar Institute
> Is there not a default value or does it default to 0? Is this why priority is
> all equal? Does this make it really working like FIFO now?

The defaults are indeed 0, and this is why you do not see values.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightQOS
[2] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityWeightPartition

> Is there a typical best practice number for these?

There are starting points; however, each site usually ends up adjusting these to its workload and its users and use cases.

> PriorityWeightAge=1000
> PriorityWeightFairshare=10000
> PriorityWeightJobSize=1000
> PriorityWeightPartition=1000

This mostly depends on what your site wants to prioritize and how much priority you want to give groups or users.

> I noticed the example that QOS was set to 2. Was this on purpose? Does this
> mean it is valued less because it is a multiplier or more because the weight is
> a divider?

This was just an example. I wanted to draw your attention to the parameters that are frequently used. I would highly suggest you look over what is available and use that to determine what is important to your site, such as QOS or single users.

> Doesn't Age and Jobsize already get calculated in fair share or is fair share
> only Consumed resources over the past x days?

By default, these are 0 and are not factored in unless set. Both can be used to influence job priority. Regarding time and age: these are also influenced by PriorityCalcPeriod and PriorityDecayHalfLife.

[3] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityCalcPeriod
[4] https://slurm.schedmd.com/slurm.conf.html#OPT_PriorityDecayHalfLife

An older SLUG presentation covers some example use cases.

[5] https://slurm.schedmd.com/SLUG19/Priority_and_Fair_Trees.pdf

Most of what is covered there is drawn from the following two web documents.

[6] https://slurm.schedmd.com/priority_multifactor.html
[7] https://slurm.schedmd.com/fair_tree.html
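As a rough illustration of what a half-life setting such as PriorityDecayHalfLife implies for past usage, the Python sketch below applies plain exponential half-life decay. This is an assumption-laden illustration of the general technique, not Slurm's implementation: decayed_usage is a hypothetical helper, and the 7-day half-life and usage numbers are made up for the example.

```python
# Generic exponential half-life decay; an illustration, not Slurm source code.

def decayed_usage(raw_usage, age_days, half_life_days):
    """Return how much of raw_usage still counts after age_days have passed."""
    return raw_usage * 0.5 ** (age_days / half_life_days)

# Hypothetical numbers: 1000 CPU-hours of usage and a 7-day half-life.
for age in (0, 7, 14):
    print(age, "days old:", decayed_usage(1000.0, age, 7))
```

Under this model, usage that is one half-life old counts half as much as fresh usage, and usage two half-lives old counts a quarter as much, so older consumption gradually stops weighing against a user's fairshare.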
Jeff, I am just following up on this and resolving the ticket. Please feel free to re-open it if you need further information regarding this issue.