Ticket 22138 - User wise Fairshare configuration
Summary: User wise Fairshare configuration
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 23.11.4
Hardware: Linux Linux
Severity: 2 - High Impact
Assignee: Michael Steed
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2025-02-19 05:13 MST by Sanjiv
Modified: 2025-03-13 11:45 MDT

See Also:
Site: Gilead Science
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
share (78.51 KB, image/png)
2025-02-25 05:09 MST, Sanjiv

Description Sanjiv 2025-02-19 05:13:06 MST
The behavior we are looking for is that each user gets an equal share, and that the half-life decay on their usage lets them recover (or nearly recover) all of their shares after 5 days of inactivity.
 
What I see now is that we have 1000 raw shares allocated, but each user has just 2 raw shares allocated (should that be 10?). The current effective shares don't make sense to me, nor does the effective usage. For example:


Account                    User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ----------
root                                          0.000000       12060                  1.000000
 root                      root          1    0.000999          23    0.001909      0.001909   0.014085   0.523278
 hpcusers                             1000    0.999001       12037    0.998091      0.998091              1.000912
  hpcusers                user1          1    0.014286           0    0.000058      0.000058   0.098592 244.882215
  hpcusers                user2          1    0.014286          52    0.004369      0.004377   0.042254   3.263630
In particular, I don't see how or where the decay function is applied to restore a user's usage.
The intent is that for any user who hasn't used the system for 5+ days, the decay restores their fairshare over time.
Comment 2 Sanjiv 2025-02-20 04:38:29 MST
Hi Team

Awaiting your response
Comment 4 Michael Steed 2025-02-20 12:00:42 MST
Hi Sanjiv,

Looking at the sshare output, with 1 share each, user1 and user2 are set up for equal usage within the account. If you have other users in the account with more shares, they can have higher usage. These shares are relative to other users in the account and independent of the shares set on the account.

https://slurm.schedmd.com/fair_tree.html#algorithm

Configuring PriorityDecayHalfLife=1-00 (one day) will cause an idle user's historical usage contribution to decay by 96.9% over 5 days, so that's a good place to start. If you want to get closer to 0 after 5 days, you can use shorter half-life values:

PriorityDecayHalfLife=20:00:00 (98.4% decay)
PriorityDecayHalfLife=17:00:00 (99.2% decay)
PriorityDecayHalfLife=15:00:00 (99.6% decay)

Due to the nature of the half-life calculation, usage will never decrease to absolute zero in a fixed time, but you can get arbitrarily close.
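The half-life arithmetic above can be sketched in a few lines of Python (not part of Slurm; a plain exponential-decay model assuming an idle user and no usage reset in the window):

```python
# Fraction of historical usage removed after `elapsed_hours` of
# inactivity, given a PriorityDecayHalfLife of `half_life_hours`.
def decay_percent(half_life_hours: float, elapsed_hours: float) -> float:
    remaining = 0.5 ** (elapsed_hours / half_life_hours)
    return (1.0 - remaining) * 100.0

five_days = 5 * 24  # hours

for half_life in (24, 20, 17, 15):
    pct = decay_percent(half_life, five_days)
    print(f"half-life {half_life}h -> {pct:.1f}% decay after 5 days")
```

This reproduces the percentages listed above (96.9%, 98.4%, 99.2%, 99.6%).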

Regards,
Michael
Comment 5 Sanjiv 2025-02-21 05:52:55 MST
Hi Michael, 

We have more than 80 users in the HPC cluster and want to give them equal resources. Please let me know the best configuration.

sshare -a
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          0.000000      354150      1.000000
 root                      root          1    0.000999           0      0.000001   1.000000
 hpcusers                             1000    0.999001      354149      0.999999
  hpcusers             aatowell          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_anair5          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_bchal+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_bmaha+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_cchan+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_cvija+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_dpuru+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_hlokh+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_ksath+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_mdeva+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_mkali+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_mmill+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_odavid          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_pjacob          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_psamr+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_rbhuy+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_rjoth+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_rkish+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_rtrip+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_srath+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_ssank+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_tcruz2          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_tmoug+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_tsubr+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_vpare+          1    0.014286           0      0.000000   0.985915
  hpcusers           adm_vthan+          1    0.014286           0      0.000000   0.985915
  hpcusers           asenthilk+          1    0.014286        1778      0.005021   0.056338
  hpcusers            cjamieson          1    0.014286           0      0.000000   0.985915
  hpcusers               cpham6          1    0.014286           0      0.000000   0.985915
  hpcusers                dkang          1    0.014286           0      0.000000   0.985915
  hpcusers               dkang1          1    0.014286           0      0.000000   0.985915
  hpcusers                dkeri          1    0.014286           0      0.000000   0.985915
  hpcusers              frobles          1    0.014286           0      0.000000   0.985915
  hpcusers           gcuellarp+          1    0.014286           0      0.000000   0.985915
  hpcusers              gdegaga          1    0.014286           0      0.000000   0.985915
  hpcusers           gpvelevat+          1    0.014286           0      0.000000   0.985915
  hpcusers           gsctccs_s+          1    0.014286           0      0.000000   0.985915
  hpcusers                 hxu1          1    0.014286           0      0.000000   0.985915
  hpcusers           ihernande+          1    0.014286           0      0.000000   0.985915
  hpcusers           isavchenko          1    0.014286           0      0.000000   0.985915
  hpcusers             jbigness          1    0.014286           0      0.000000   0.112676
  hpcusers             jfisher2          1    0.014286           0      0.000000   0.985915
  hpcusers             jperry01          1    0.014286           0      0.000000   0.985915
  hpcusers            kanapindi          1    0.014286           0      0.000000   0.985915
  hpcusers                kliu6          1    0.014286           0      0.000000   0.985915
  hpcusers           knishikawa          1    0.014286           0      0.000000   0.985915
  hpcusers                 lxu5          1    0.014286           0      0.000000   0.985915
  hpcusers           mjacksonw+          1    0.014286           0      0.000000   0.985915
  hpcusers              mkhanam          1    0.014286           0      0.000000   0.985915
  hpcusers            mmiller10          1    0.014286         134      0.000379   0.070423
  hpcusers             mwalker5          1    0.014286           0      0.000000   0.985915
  hpcusers           nupalanchi          1    0.014286           0      0.000000   0.985915
  hpcusers               odavid          1    0.014286           0      0.000000   0.098592
  hpcusers             pgupta11          1    0.014286           0      0.000000   0.985915
  hpcusers            pseshadri          1    0.014286           0      0.000000   0.985915
  hpcusers              rferrao          1    0.014286           0      0.000000   0.985915
  hpcusers             rgautam1          1    0.014286           0      0.000000   0.985915
  hpcusers                 rli3          1    0.014286           0      0.000000   0.985915
  hpcusers           scrch_sm_+          1    0.014286           0      0.000000   0.985915
  hpcusers           sjayarama+          1    0.014286           0      0.000000   0.985915
  hpcusers                spack          1    0.014286      224926      0.635116   0.014085
  hpcusers             stiwari7          1    0.014286           0      0.000000   0.985915
  hpcusers               tcruz2          1    0.014286           0      0.000001   0.084507
  hpcusers           tmoughamer          1    0.014286      121550      0.343218   0.028169
  hpcusers           tsubraman+          1    0.014286           0      0.000000   0.985915
  hpcusers              vmrapnm          1    0.014286           0      0.000000   0.985915
  hpcusers             vparekh1          1    0.014286           0      0.000000   0.985915
  hpcusers               vraut1          1    0.014286           0      0.000000   0.985915
  hpcusers           vthangaraj          1    0.014286        5760      0.016266   0.042254
  hpcusers              yhuang7          1    0.014286           0      0.000000   0.985915
Comment 6 Michael Steed 2025-02-21 09:41:09 MST
Sanjiv,

Your current configuration is correct for giving each user in hpcusers the same fairshare priority. All users will have the same fairshare value initially. As users accumulate usage, their fairshare values will decrease, but over time the impact of that historical usage will decay and bring their fairshare values back up.

Michael
Comment 7 Sanjiv 2025-02-25 05:09:53 MST
Created attachment 40917 [details]
share
Comment 8 Sanjiv 2025-02-25 05:15:03 MST
Hi Michael,

Thanks for the confirmation.

We have made the change PriorityDecayHalfLife=1-00 (one day) as per your suggestion, but we are still not able to understand the decay calculations.

Could you please share the calculation formula for it?



sshare -a
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          0.000000      293090      1.000000
 root                      root          1    0.000999       49291      0.168177   0.014085
 hpcusers                             1000    0.999001      243799      0.831823
  hpcusers             aatowell          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_anair5          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_bchal+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_bmaha+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_cchan+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_cvija+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_dpuru+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_hlokh+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_ksath+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_mdeva+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_mkali+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_mmill+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_odavid          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_pjacob          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_psamr+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_rbhuy+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_rjoth+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_rkish+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_rtrip+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_srath+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_ssank+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_tcruz2          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_tmoug+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_tsubr+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_vpare+          1    0.014286           0      0.000000   1.000000
  hpcusers           adm_vthan+          1    0.014286           0      0.000000   1.000000
  hpcusers           asenthilk+          1    0.014286         155      0.000639   0.084507
  hpcusers            cjamieson          1    0.014286           0      0.000000   1.000000
  hpcusers               cpham6          1    0.014286           0      0.000000   1.000000
  hpcusers                dkang          1    0.014286           0      0.000000   1.000000
  hpcusers               dkang1          1    0.014286           0      0.000000   1.000000
  hpcusers                dkeri          1    0.014286           0      0.000000   1.000000
  hpcusers              frobles          1    0.014286           0      0.000000   1.000000
  hpcusers           gcuellarp+          1    0.014286           0      0.000000   1.000000
  hpcusers              gdegaga          1    0.014286           0      0.000000   1.000000
  hpcusers           gpvelevat+          1    0.014286           0      0.000000   1.000000
  hpcusers           gsctccs_s+          1    0.014286           0      0.000000   1.000000
  hpcusers                 hxu1          1    0.014286           0      0.000000   1.000000
  hpcusers           ihernande+          1    0.014286           0      0.000000   1.000000
  hpcusers           isavchenko          1    0.014286           0      0.000000   1.000000
  hpcusers             jbigness          1    0.014286           0      0.000000   0.126761
  hpcusers             jfisher2          1    0.014286           0      0.000000   1.000000
  hpcusers             jperry01          1    0.014286           0      0.000000   1.000000
  hpcusers            kanapindi          1    0.014286           0      0.000000   1.000000
  hpcusers                kliu6          1    0.014286           0      0.000000   1.000000
  hpcusers           knishikawa          1    0.014286           0      0.000000   1.000000
  hpcusers                 lxu5          1    0.014286           0      0.000000   1.000000
  hpcusers           mjacksonw+          1    0.014286           0      0.000000   1.000000
  hpcusers              mkhanam          1    0.014286           0      0.000000   1.000000
  hpcusers            mmiller10          1    0.014286      106143      0.435370   0.028169
  hpcusers             mwalker5          1    0.014286           0      0.000000   1.000000
  hpcusers           nupalanchi          1    0.014286           0      0.000000   1.000000
  hpcusers               odavid          1    0.014286           0      0.000000   0.112676
  hpcusers             pgupta11          1    0.014286           0      0.000000   1.000000
  hpcusers            pseshadri          1    0.014286           0      0.000000   1.000000
  hpcusers              rferrao          1    0.014286           0      0.000000   1.000000
  hpcusers             rgautam1          1    0.014286           0      0.000000   1.000000
  hpcusers                 rli3          1    0.014286           0      0.000000   1.000000
  hpcusers           scrch_sm_+          1    0.014286           0      0.000000   1.000000
  hpcusers           sjayarama+          1    0.014286           0      0.000000   1.000000
  hpcusers                spack          1    0.014286       61947      0.254092   0.056338
  hpcusers             stiwari7          1    0.014286           0      0.000000   1.000000
  hpcusers               tcruz2          1    0.014286           0      0.000000   0.098592
  hpcusers           tmoughamer          1    0.014286       75038      0.307789   0.042254
  hpcusers           tsubraman+          1    0.014286           0      0.000000   1.000000
  hpcusers              vmrapnm          1    0.014286           0      0.000000   1.000000
  hpcusers             vparekh1          1    0.014286           0      0.000000   1.000000
  hpcusers               vraut1          1    0.014286           0      0.000000   1.000000
  hpcusers           vthangaraj          1    0.014286         514      0.002110   0.070423
  hpcusers              yhuang7          1    0.014286           0      0.000000   1.000000
Comment 9 Sanjiv 2025-02-25 09:14:45 MST
Hi Michael,

I am posting our fairshare configuration details below. Please take a look and let me know if anything needs to change here.

## TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
MessageTimeout=60
## SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
PriorityType=priority/multifactor
PriorityDecayHalfLife=1-00
PriorityUsageResetPeriod=Monthly
# Fairshare Factor
PriorityWeightFairshare=100000
PriorityWeightAge=50000
PriorityMaxAge=14-0
PriorityFavorSmall=NO
PriorityWeightJobSize=20000
PriorityWeightPartition=10000
PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
DefMemPerCPU=2000
GresTypes=gpu
#
Comment 14 Michael Steed 2025-02-25 17:53:53 MST
Sanjiv,

Your fairshare configuration settings look good for the goal you have described: users in the hpcusers account will have the same fairshare priority by default, and most of the impact of historical usage will decay (dropping by about 97%) over 5 days.

The decay calculation simply reduces the impact of historical usage by half for each `PriorityDecayHalfLife` period of time that passes. You can see this reflected in the changing `RawUsage` values reported by sshare over time.

In your sshare output, the users with effective usage of 0.000000 and lower fairshare values probably have effective usage of more than zero, but values small enough to be hidden by the display being limited to 6 decimal places. Also note that each user's fairshare value isn't directly based on the effective usage, but on their usage relative to other users in the same account, so a small effective usage can still have an impact on the final fairshare value.

The fairshare calculation is described from an individual user's perspective here:

https://slurm.schedmd.com/fair_tree.html#enduser

Michael
Comment 15 Michael Steed 2025-03-03 15:58:25 MST
Hi Sanjiv,

Just checking whether the above info has addressed your questions.

Michael
Comment 16 Sanjiv 2025-03-06 08:04:50 MST
Hi Michael,


I have gone through the document "https://slurm.schedmd.com/fair_tree.html#enduser" and found it quite complicated. I would really appreciate your help in calculating the fairshare value for at least one user. Once we have a clear understanding of this, we can follow the same practice for all users.
 
Thank you in advance for your assistance.


Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
root                                          0.000000       26928      1.000000            
 root                      root          1    0.000999           0      0.000000   1.000000 
 hpcusers                             1000    0.999001       26928      1.000000            
  hpcusers             aatowell         10    0.070423           0      0.000000   0.985915 
  hpcusers           adm_anair5          1    0.007042           0      0.000000   0.985915 
  hpcusers           adm_bchal+         10    0.070423           0      0.000000   0.985915 
  hpcusers           adm_bmaha+          1    0.007042           0      0.000000   0.985915 
  hpcusers           adm_cchan+         10    0.070423           0      0.000000   0.985915
  hpcusers            mmiller10          1    0.007042       10391      0.385879   0.028169 
  hpcusers                spack          1    0.007042       16292      0.605043   0.014085 


Sanjiv
Comment 17 Michael Steed 2025-03-06 11:06:15 MST
Sanjiv,

A user's FairShare value is just the user's rank among its siblings divided by the number of siblings. For example, in the sshare output in comment 5, there are 62 users tied for top ranking (70/71 = 0.985915). In rank order (and thus FairShare order) below that group of users are:

jbigness   - 8/71 = 0.112676
odavid     - 7/71 = 0.098592
tcruz2     - 6/71 = 0.084507
mmiller10  - 5/71 = 0.070423
asenthilk+ - 4/71 = 0.056338
vthangaraj - 3/71 = 0.042254
tmoughamer - 2/71 = 0.028169
spack      - 1/71 = 0.014085

The user ranking is determined by the LevelFS value which is visible in the output of `sshare -l`. 

The LevelFS value is normalized shares divided by effective (normalized) usage. In the sshare output in the description of this ticket:

user1 has LevelFS = 0.014286 / 0.000058 = ~244.882215
user2 has LevelFS = 0.014286 / 0.004369 = ~3.263630

The calculations using the displayed values don't work out exactly due to truncation.

The LevelFS values are meaningful only among siblings, but they do answer the question why a particular user has a lower or higher ranking (and FairShare value) than a sibling.

Here it looks like user1 has ranking 7/71 = 0.098592 and user2 has ranking 3/71 = 0.042254.
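The arithmetic above can be checked with a quick sketch (function names are illustrative; the ranks and totals are the ones from comment 5, and the LevelFS inputs are the truncated values sshare prints):

```python
# Fair Tree end-user view: FairShare = user's rank among all user
# associations divided by the total number of user associations.
def fairshare_from_rank(rank: int, total: int) -> float:
    return rank / total

TOTAL = 71  # 70 users in hpcusers plus the root user

print(round(fairshare_from_rank(70, TOTAL), 6))  # tied top group: 0.985915
print(round(fairshare_from_rank(8, TOTAL), 6))   # jbigness:       0.112676
print(round(fairshare_from_rank(1, TOTAL), 6))   # spack:          0.014085

# LevelFS = NormShares / EffectvUsage. Dividing the truncated values
# printed by sshare gives only an approximation of the reported number:
print(0.014286 / 0.000058)  # sshare reports ~244.882215
print(0.014286 / 0.004369)  # sshare reports ~3.263630
```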

Michael
Comment 18 Sanjiv 2025-03-13 09:36:20 MDT
Hi Michael,

Thank you for your detailed reply.

We have another query: How can the decay function restore shares, and how do we adjust the weights to ensure they are restored within 5 days?

Best Regards,
Sanjiv
Comment 19 Michael Steed 2025-03-13 11:45:13 MDT
Sanjiv,

The RawShares and NormShares values per association won't be affected by decay; it's just the RawUsage numbers that you'll see decrease over time. In the absence of new utilization, RawUsage decreases by half for each `PriorityDecayHalfLife` period of time. With PriorityDecayHalfLife=1-00, RawUsage will decrease by about 97% over 5 days. If you want to get closer to 100% in 5 days, you can use shorter half-life values as noted in comment 4.
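As a rough sketch of that decay (assuming an idle user, PriorityDecayHalfLife=1-00, and no PriorityUsageResetPeriod reset falling in the window), RawUsage simply halves each day:

```python
# Remaining RawUsage after `idle_days` with a half-life of
# `half_life_days` and no new jobs submitted.
def raw_usage_after(raw_usage: float, half_life_days: float, idle_days: float) -> float:
    return raw_usage * 0.5 ** (idle_days / half_life_days)

usage = 224926  # spack's RawUsage from comment 5
for day in range(6):
    print(day, round(raw_usage_after(usage, 1.0, day)))
```

After 5 idle days only 1/32 (about 3.1%) of the original RawUsage remains, i.e. roughly 97% of it has decayed away.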

Michael