The behavior we are looking for is that each user gets an equal share, and that usage decays with a half-life such that a user recovers all (or nearly all) of their shares after 5 days of inactivity. What I see now is that we have 1000 raw shares allocated, but each user has just 2 raw shares allocated (should that be 10?). The current effective shares don't make sense to me, nor does the effective usage. For example:

Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------
root 0.000000 12060 1.000000
root root 1 0.000999 23 0.001909 0.001909 0.014085 0.523278
hpcusers 1000 0.999001 12037 0.998091 0.998091 1.000912
hpcusers user1 1 0.014286 0 0.000058 0.000058 0.098592 244.882215
hpcusers user2 1 0.014286 52 0.004369 0.004377 0.042254 3.263630

In particular, I don't see how or where the decay function is applied to restore a user's usage. The intent is that for any user who hasn't used the system for 5+ days, the decay restores their fairshare over time.
Hi Team, awaiting your response.
Hi Sanjiv,

Looking at the sshare output, with 1 share each, user1 and user2 are set up for equal usage within the account. If other users in the account have more shares, they can have higher usage. These shares are relative to other users in the account and independent of the shares set on the account itself.

https://slurm.schedmd.com/fair_tree.html#algorithm

Configuring PriorityDecayHalfLife=1-00 (one day) will cause an idle user's historical usage contribution to decay by 96.9% over 5 days, so that's a good place to start. If you want to get closer to 0 after 5 days, you can use shorter half-life values:

PriorityDecayHalfLife=20:00:00 (98.4% decay)
PriorityDecayHalfLife=17:00:00 (99.2% decay)
PriorityDecayHalfLife=15:00:00 (99.6% decay)

Due to the nature of the half-life calculation, usage will never decrease to absolute zero in a fixed time, but you can get arbitrarily close.

Regards,
Michael
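The decay percentages quoted above follow from the usual half-life relation: after an idle period, a fraction 0.5^(elapsed / half-life) of the historical usage remains. The following is a minimal Python sketch of that arithmetic only. The function names are illustrative and not part of Slurm, and the sketch assumes the user accrues no new usage during the 5 days.

```python
def remaining_fraction(half_life_hours: float, elapsed_hours: float) -> float:
    """Fraction of historical usage still counted after elapsed_hours of idleness."""
    return 0.5 ** (elapsed_hours / half_life_hours)


def decayed_percent(half_life_hours: float, elapsed_hours: float) -> float:
    """Percentage of historical usage that has decayed away."""
    return 100.0 * (1.0 - remaining_fraction(half_life_hours, elapsed_hours))


if __name__ == "__main__":
    five_days = 5 * 24  # idle period in hours
    # Half-life values (in hours) taken from the comment above.
    for half_life in (24, 20, 17, 15):
        percent = decayed_percent(half_life, five_days)
        print(f"half-life {half_life:>2} h: {percent:.1f}% decayed after 5 days")
```

Because the remaining fraction is a power of 0.5, it approaches zero but never reaches it, which matches the note that usage never decreases to absolute zero in a fixed time.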
Hi Michael,

We have more than 80 users in the HPC cluster and want to give them equal resources. Please let me know the best configuration.

sshare -a
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 354150 1.000000
root root 1 0.000999 0 0.000001 1.000000
hpcusers 1000 0.999001 354149 0.999999
hpcusers aatowell 1 0.014286 0 0.000000 0.985915
hpcusers adm_anair5 1 0.014286 0 0.000000 0.985915
hpcusers adm_bchal+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_bmaha+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_cchan+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_cvija+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_dpuru+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_hlokh+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_ksath+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_mdeva+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_mkali+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_mmill+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_odavid 1 0.014286 0 0.000000 0.985915
hpcusers adm_pjacob 1 0.014286 0 0.000000 0.985915
hpcusers adm_psamr+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_rbhuy+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_rjoth+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_rkish+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_rtrip+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_srath+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_ssank+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_tcruz2 1 0.014286 0 0.000000 0.985915
hpcusers adm_tmoug+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_tsubr+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_vpare+ 1 0.014286 0 0.000000 0.985915
hpcusers adm_vthan+ 1 0.014286 0 0.000000 0.985915
hpcusers asenthilk+ 1 0.014286 1778 0.005021 0.056338
hpcusers cjamieson 1 0.014286 0 0.000000 0.985915
hpcusers cpham6 1 0.014286 0 0.000000 0.985915
hpcusers dkang 1 0.014286 0 0.000000 0.985915
hpcusers dkang1 1 0.014286 0 0.000000 0.985915
hpcusers dkeri 1 0.014286 0 0.000000 0.985915
hpcusers frobles 1 0.014286 0 0.000000 0.985915
hpcusers gcuellarp+ 1 0.014286 0 0.000000 0.985915
hpcusers gdegaga 1 0.014286 0 0.000000 0.985915
hpcusers gpvelevat+ 1 0.014286 0 0.000000 0.985915
hpcusers gsctccs_s+ 1 0.014286 0 0.000000 0.985915
hpcusers hxu1 1 0.014286 0 0.000000 0.985915
hpcusers ihernande+ 1 0.014286 0 0.000000 0.985915
hpcusers isavchenko 1 0.014286 0 0.000000 0.985915
hpcusers jbigness 1 0.014286 0 0.000000 0.112676
hpcusers jfisher2 1 0.014286 0 0.000000 0.985915
hpcusers jperry01 1 0.014286 0 0.000000 0.985915
hpcusers kanapindi 1 0.014286 0 0.000000 0.985915
hpcusers kliu6 1 0.014286 0 0.000000 0.985915
hpcusers knishikawa 1 0.014286 0 0.000000 0.985915
hpcusers lxu5 1 0.014286 0 0.000000 0.985915
hpcusers mjacksonw+ 1 0.014286 0 0.000000 0.985915
hpcusers mkhanam 1 0.014286 0 0.000000 0.985915
hpcusers mmiller10 1 0.014286 134 0.000379 0.070423
hpcusers mwalker5 1 0.014286 0 0.000000 0.985915
hpcusers nupalanchi 1 0.014286 0 0.000000 0.985915
hpcusers odavid 1 0.014286 0 0.000000 0.098592
hpcusers pgupta11 1 0.014286 0 0.000000 0.985915
hpcusers pseshadri 1 0.014286 0 0.000000 0.985915
hpcusers rferrao 1 0.014286 0 0.000000 0.985915
hpcusers rgautam1 1 0.014286 0 0.000000 0.985915
hpcusers rli3 1 0.014286 0 0.000000 0.985915
hpcusers scrch_sm_+ 1 0.014286 0 0.000000 0.985915
hpcusers sjayarama+ 1 0.014286 0 0.000000 0.985915
hpcusers spack 1 0.014286 224926 0.635116 0.014085
hpcusers stiwari7 1 0.014286 0 0.000000 0.985915
hpcusers tcruz2 1 0.014286 0 0.000001 0.084507
hpcusers tmoughamer 1 0.014286 121550 0.343218 0.028169
hpcusers tsubraman+ 1 0.014286 0 0.000000 0.985915
hpcusers vmrapnm 1 0.014286 0 0.000000 0.985915
hpcusers vparekh1 1 0.014286 0 0.000000 0.985915
hpcusers vraut1 1 0.014286 0 0.000000 0.985915
hpcusers vthangaraj 1 0.014286 5760 0.016266 0.042254
hpcusers yhuang7 1 0.014286 0 0.000000 0.985915
Sanjiv, Your current configuration is correct for giving each user in hpcusers the same fairshare priority. All users will have the same fairshare value initially. As users accumulate usage, their fairshare values will decrease, but over time the impact of that historical usage will decay and bring their fairshare values back up. Michael
Created attachment 40917 [details] share
Hi Michael,

Thanks for the confirmation. We have made the change PriorityDecayHalfLife=1-00 (one day) as per your suggestion, but we are still not able to understand the decay calculations. Could you please share the calculation formula for it?

sshare -a
Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 293090 1.000000
root root 1 0.000999 49291 0.168177 0.014085
hpcusers 1000 0.999001 243799 0.831823
hpcusers aatowell 1 0.014286 0 0.000000 1.000000
hpcusers adm_anair5 1 0.014286 0 0.000000 1.000000
hpcusers adm_bchal+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_bmaha+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_cchan+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_cvija+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_dpuru+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_hlokh+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_ksath+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_mdeva+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_mkali+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_mmill+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_odavid 1 0.014286 0 0.000000 1.000000
hpcusers adm_pjacob 1 0.014286 0 0.000000 1.000000
hpcusers adm_psamr+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_rbhuy+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_rjoth+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_rkish+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_rtrip+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_srath+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_ssank+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_tcruz2 1 0.014286 0 0.000000 1.000000
hpcusers adm_tmoug+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_tsubr+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_vpare+ 1 0.014286 0 0.000000 1.000000
hpcusers adm_vthan+ 1 0.014286 0 0.000000 1.000000
hpcusers asenthilk+ 1 0.014286 155 0.000639 0.084507
hpcusers cjamieson 1 0.014286 0 0.000000 1.000000
hpcusers cpham6 1 0.014286 0 0.000000 1.000000
hpcusers dkang 1 0.014286 0 0.000000 1.000000
hpcusers dkang1 1 0.014286 0 0.000000 1.000000
hpcusers dkeri 1 0.014286 0 0.000000 1.000000
hpcusers frobles 1 0.014286 0 0.000000 1.000000
hpcusers gcuellarp+ 1 0.014286 0 0.000000 1.000000
hpcusers gdegaga 1 0.014286 0 0.000000 1.000000
hpcusers gpvelevat+ 1 0.014286 0 0.000000 1.000000
hpcusers gsctccs_s+ 1 0.014286 0 0.000000 1.000000
hpcusers hxu1 1 0.014286 0 0.000000 1.000000
hpcusers ihernande+ 1 0.014286 0 0.000000 1.000000
hpcusers isavchenko 1 0.014286 0 0.000000 1.000000
hpcusers jbigness 1 0.014286 0 0.000000 0.126761
hpcusers jfisher2 1 0.014286 0 0.000000 1.000000
hpcusers jperry01 1 0.014286 0 0.000000 1.000000
hpcusers kanapindi 1 0.014286 0 0.000000 1.000000
hpcusers kliu6 1 0.014286 0 0.000000 1.000000
hpcusers knishikawa 1 0.014286 0 0.000000 1.000000
hpcusers lxu5 1 0.014286 0 0.000000 1.000000
hpcusers mjacksonw+ 1 0.014286 0 0.000000 1.000000
hpcusers mkhanam 1 0.014286 0 0.000000 1.000000
hpcusers mmiller10 1 0.014286 106143 0.435370 0.028169
hpcusers mwalker5 1 0.014286 0 0.000000 1.000000
hpcusers nupalanchi 1 0.014286 0 0.000000 1.000000
hpcusers odavid 1 0.014286 0 0.000000 0.112676
hpcusers pgupta11 1 0.014286 0 0.000000 1.000000
hpcusers pseshadri 1 0.014286 0 0.000000 1.000000
hpcusers rferrao 1 0.014286 0 0.000000 1.000000
hpcusers rgautam1 1 0.014286 0 0.000000 1.000000
hpcusers rli3 1 0.014286 0 0.000000 1.000000
hpcusers scrch_sm_+ 1 0.014286 0 0.000000 1.000000
hpcusers sjayarama+ 1 0.014286 0 0.000000 1.000000
hpcusers spack 1 0.014286 61947 0.254092 0.056338
hpcusers stiwari7 1 0.014286 0 0.000000 1.000000
hpcusers tcruz2 1 0.014286 0 0.000000 0.098592
hpcusers tmoughamer 1 0.014286 75038 0.307789 0.042254
hpcusers tsubraman+ 1 0.014286 0 0.000000 1.000000
hpcusers vmrapnm 1 0.014286 0 0.000000 1.000000
hpcusers vparekh1 1 0.014286 0 0.000000 1.000000
hpcusers vraut1 1 0.014286 0 0.000000 1.000000
hpcusers vthangaraj 1 0.014286 514 0.002110 0.070423
hpcusers yhuang7 1 0.014286 0 0.000000 1.000000
Hi Michael,

I am posting our fair share configuration details. Please look into this and let me know if anything needs to change here.

## TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
MessageTimeout=60
## SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
PriorityType=priority/multifactor
PriorityDecayHalfLife=1-00
PriorityUsageResetPeriod=Monthly
# Fairshare Factor
PriorityWeightFairshare=100000
PriorityWeightAge=50000
PriorityMaxAge=14-0
PriorityFavorSmall=NO
PriorityWeightJobSize=20000
PriorityWeightPartition=10000
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
DefMemPerCPU=2000
GresTypes=gpu
#
Sanjiv,

Your fairshare configuration settings look good for the goal you have described: users in the hpcusers account will have the same fairshare priority by default, and most of the impact of historical usage will decay (dropping by about 97%) over 5 days.

The decay calculation simply reduces the impact of historical usage by half for each `PriorityDecayHalfLife` period of time that passes. You can see this reflected in the changing `RawUsage` values reported by sshare over time.

In your sshare output, the users with effective usage of 0.000000 and lower fairshare values probably have effective usage of more than zero, but values small enough to be hidden by the display being limited to 6 decimal places. Also note that each user's fairshare value isn't directly based on the effective usage, but on their usage relative to other users in the same account, so a small effective usage can still have an impact on the final fairshare value.

The fairshare calculation is described from an individual user's perspective here: https://slurm.schedmd.com/fair_tree.html#enduser

Michael
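To illustrate the point about hidden usage: a value below the display precision prints as 0.000000, yet the ratio of normalized shares to effective usage (the LevelFS column in `sshare -l`) stays finite, so such a user can still rank below a sibling with truly zero usage. The sketch below uses made-up usage values, and treating zero usage as an infinite ratio is an assumption made for illustration, not a statement about Slurm internals.

```python
import math

NORM_SHARES = 0.014286  # per-user NormShares, as shown in the sshare output


def shares_to_usage_ratio(norm_shares: float, effective_usage: float) -> float:
    """Ratio used to compare siblings; zero usage is treated as infinity (assumption)."""
    if effective_usage == 0:
        return math.inf
    return norm_shares / effective_usage


# Hypothetical effective usage values, both too small to show at 6 decimals.
for label, usage in (("truly idle", 0.0), ("tiny usage", 3e-7)):
    shown = f"{usage:.6f}"
    ratio = shares_to_usage_ratio(NORM_SHARES, usage)
    print(f"{label}: displayed as {shown}, shares/usage ratio {ratio:.1f}")
```

Both users display an effective usage of 0.000000, but only the first has an unbounded ratio, which is why users such as jbigness or odavid can show zero usage and still sit below the idle group.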
Hi Sanjiv, Just checking whether the above info has addressed your questions. Michael
Hi Michael,

I have gone through the document "https://slurm.schedmd.com/fair_tree.html#enduser" and found it quite complicated. I would really appreciate your help in calculating the fair share value for at least one user. Once we have a clear understanding of this, we can follow the same practice for all users. Thank you in advance for your assistance.

Account User RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.000000 26928 1.000000
root root 1 0.000999 0 0.000000 1.000000
hpcusers 1000 0.999001 26928 1.000000
hpcusers aatowell 10 0.070423 0 0.000000 0.985915
hpcusers adm_anair5 1 0.007042 0 0.000000 0.985915
hpcusers adm_bchal+ 10 0.070423 0 0.000000 0.985915
hpcusers adm_bmaha+ 1 0.007042 0 0.000000 0.985915
hpcusers adm_cchan+ 10 0.070423 0 0.000000 0.985915
hpcusers mmiller10 1 0.007042 10391 0.385879 0.028169
hpcusers spack 1 0.007042 16292 0.605043 0.014085

Sanjiv
Sanjiv,

A user's FairShare value is just the user's rank among its siblings divided by the number of siblings. For example, in the sshare output in comment 5, there are 62 users tied for top ranking (70/71 = 0.985915). In rank order (and thus FairShare order) below that group of users are:

jbigness - 8/71 = 0.112676
odavid - 7/71 = 0.098592
tcruz2 - 6/71 = 0.084507
mmiller10 - 5/71 = 0.070423
asenthilk+ - 4/71 = 0.056338
vthangaraj - 3/71 = 0.042254
tmoughamer - 2/71 = 0.028169
spack - 1/71 = 0.014085

The user ranking is determined by the LevelFS value, which is visible in the output of `sshare -l`. The LevelFS value is normalized shares divided by effective (normalized) usage. In the sshare output in the description of this ticket:

user1 has LevelFS = 0.014286 / 0.000058 = ~244.882215
user2 has LevelFS = 0.014286 / 0.004377 = ~3.263630

The calculations using the displayed values don't work out exactly due to truncation. The LevelFS values are meaningful only among siblings, but they do answer the question of why a particular user has a lower or higher ranking (and FairShare value) than a sibling. Here it looks like user1 has ranking 7/71 = 0.098592 and user2 has ranking 3/71 = 0.042254.

Michael
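The ranking rule described above can be sketched in a few lines of Python. This is a simplified illustration for a single group of siblings, not Slurm's actual implementation: the function name is made up, the LevelFS values for the eight active users are placeholders (the thread does not show `sshare -l` output for them), idle users are assumed to have infinite LevelFS, and the starting rank of 70 assumes the root user holds rank 71 of 71, as its FairShare of 1.000000 in comment 5 suggests.

```python
import math


def fairshare_by_rank(levelfs_by_user, start_rank, total_users):
    """Rank users by descending LevelFS; tied users share the rank of the first tied user."""
    ordered = sorted(levelfs_by_user.items(), key=lambda item: item[1], reverse=True)
    fairshare = {}
    rank = start_rank
    prev_levelfs = prev_value = None
    for user, levelfs in ordered:
        if levelfs == prev_levelfs:
            value = prev_value  # tie: reuse the previous user's FairShare
        else:
            value = rank / total_users
        fairshare[user] = value
        prev_levelfs, prev_value = levelfs, value
        rank -= 1  # every user consumes one rank slot, tied or not
    return fairshare


# 62 idle users (assumed infinite LevelFS) plus 8 users with historical usage.
siblings = {f"idle{i:02d}": math.inf for i in range(62)}
active = ["jbigness", "odavid", "tcruz2", "mmiller10",
          "asenthilk+", "vthangaraj", "tmoughamer", "spack"]
for position, user in enumerate(active):
    siblings[user] = float(len(active) - position)  # placeholder LevelFS: 8.0 down to 1.0

result = fairshare_by_rank(siblings, start_rank=70, total_users=71)
for user in ["idle00"] + active:
    print(f"{user:<12} {result[user]:.6f}")
```

The tied group consumes 62 rank slots even though all of its members share the value 70/71, which is why the next user lands at 8/71 rather than 69/71.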
Hi Michael, Thank you for your detailed reply. We have another query: How can the decay function restore shares, and how do we adjust the weights to ensure they are restored within 5 days? Best Regards, Sanjiv
Sanjiv,

The RawShares and NormShares values per association won't be affected by decay. It's just the RawUsage numbers that you'll see decrease over time. In the absence of new utilization, RawUsage decreases by half for each `PriorityDecayHalfLife` period of time. With PriorityDecayHalfLife=1-00, RawUsage will decrease by about 97% over 5 days. If you want to get closer to 100% in 5 days, you can use shorter half-life values as noted in comment 4.

Michael
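Going the other way, the same half-life relation can be inverted to estimate which half-life leaves a chosen fraction of RawUsage after 5 idle days. The sketch below is only that arithmetic: the function name and the target fractions are illustrative, and the result is a plain number of hours that would still need to be written in whatever time format `PriorityDecayHalfLife` accepts.

```python
import math


def half_life_for_target(elapsed_hours: float, remaining_fraction: float) -> float:
    """Half-life (hours) that leaves remaining_fraction of usage after elapsed_hours."""
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining_fraction must be strictly between 0 and 1")
    # remaining = 0.5 ** (elapsed / half_life)  =>  half_life = elapsed / log2(1 / remaining)
    return elapsed_hours / math.log2(1.0 / remaining_fraction)


five_days = 5 * 24  # idle period in hours
for remaining in (0.03125, 0.01, 0.001):
    hours = half_life_for_target(five_days, remaining)
    print(f"{remaining:.3%} remaining after 5 days -> half-life of about {hours:.1f} h")
```

A target of exactly zero remaining usage is rejected, since no finite half-life reaches it.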