| Summary: | What is "current_period" in fairshare calculation? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Schoenfelder <michael.schoenfelder> |
| Component: | Configuration | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | marshall.adrian, vito.burggraf |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SiFive | Slinky Site: | --- |
Description
Michael Schoenfelder
2021-07-12 18:56:44 MDT
Hi Michael,

The current period it is referring to is the period of the PriorityDecayHalfLife. It's saying to break out the usage into blocks of time that have a duration of the value defined for PriorityDecayHalfLife. That formula is showing that the usage in the current period isn't decayed, the previous period does have the usage decayed, the period prior to that has the decay applied twice, etc.

The PriorityCalcPeriod defines how often the calculation of the half-life decay takes place. The PriorityUsageResetPeriod defines a period after which the usage of associations will be reset.

To bring all three of these together in an example, assume you have the following defined in your slurm.conf:

    PriorityDecayHalfLife=1:00:00
    PriorityCalcPeriod=10:00
    PriorityUsageResetPeriod=daily

This means that the usage for a certain user will have less impact on that user's overall FairShare value for each hour that passes. The FairShare value is recalculated every 10 minutes to account for usage that may not have been over an hour old during the last calculation. And every day at midnight the usage is reset to 0 so that the user starts with no prior usage the next day.

If you have PriorityDecayHalfLife set to 0, this means that prior usage isn't decayed at all, and you would rely on PriorityUsageResetPeriod to reset the usage after a period of time of your choosing.

Does that help clarify how these options are related to each other? If your current settings don't line up with how you want this to behave, feel free to let me know and I can help you get it working the way you want.

Thanks, Ben

The formula says "D" is a decay factor between zero and one that delivers the half-life decay based off the PriorityDecayHalfLife. You are saying that the "period" in the formula is also the PriorityDecayHalfLife?

We had our PriorityDecayHalfLife set to 24 hours, but felt that the fairshare had too much memory.
We understand that is one measure of "fair", but users seem to prefer "fair" being more instantaneous than historical. We figured setting PriorityDecayHalfLife=0 would remove all the terms but the first one, Ucurrent_period.

You are saying that "period" was also 24 hours when PriorityDecayHalfLife=24. If we set PriorityDecayHalfLife=0, then Slurm uses PriorityUsageResetPeriod as the "Ucurrent_period" term. I think we had "never" for PriorityUsageResetPeriod, but when we set PriorityDecayHalfLife=0, Slurm asked us to set a PriorityUsageResetPeriod. It seems as if the minimum is Daily, so that is what we used. It sounds like if we want even less history, then we should use PriorityDecayHalfLife=1:00:00, which would make the reporting period 1 hour?

I have seen fairshare schemes in other batch systems where, if multiple people all have the same shares, all have many pending jobs, and all jobs take about the same time to run, the farm will be running about the same number of jobs for each of the users. It acts as if there is no (or very little) history at all. We are looking for a smaller amount of history than 24 hours and figured zero would be the smallest possible amount.

(I should probably go read the code; I haven't looked at Slurm code yet. Can you tell me in what file this is implemented?)

Hi Michael,

The decay factor is derived from the decay half-life, so they are related. Here you can see the formula used to calculate the decay_factor with the decay_hl as part of the equation:

    decay_factor = 1 - (0.693 / decay_hl);

This comes from the *_decay_thread function in the priority_multifactor code, which you can review here:

https://github.com/SchedMD/slurm/blob/7fa5a332363056ba58780a48344e99d4419eb3c4/src/plugins/priority/multifactor/priority_multifactor.c#L1189

If you do want the amount of usage tracked to be smaller, then you would want to set the PriorityDecayHalfLife to a small, non-zero value.
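
To make the quoted expression concrete, here is a minimal Python sketch of it. It assumes that decay_hl is the half-life expressed in seconds and that the resulting factor is applied once per elapsed second; both are assumptions about how the plugin uses the value, and the function names are hypothetical, so check them against priority_multifactor.c before relying on the numbers.

```python
def decay_factor(decay_hl_seconds: float) -> float:
    """Mirror of the quoted line: decay_factor = 1 - (0.693 / decay_hl)."""
    if decay_hl_seconds <= 0:
        # A half-life of 0 is described in this ticket as "no decay",
        # so the sketch returns a factor of 1 (usage is kept as-is).
        return 1.0
    return 1.0 - (0.693 / decay_hl_seconds)


def remaining_fraction(decay_hl_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of old usage left, ASSUMING the factor is applied per second."""
    return decay_factor(decay_hl_seconds) ** elapsed_seconds


one_hour = 3600
half = remaining_fraction(one_hour, one_hour)         # close to 0.5 after one half-life
quarter = remaining_fraction(one_hour, 2 * one_hour)  # close to 0.25 after two
```

Under these assumptions, 0.693 is an approximation of ln 2, which is why one half-life of elapsed time leaves roughly half of the recorded usage.
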
Setting it to zero means the usage won't decay and will continue to be tracked until it is reset, daily in your case. Setting the decay half-life to 1 hour seems like a good value to use. Let me know if you have any additional questions about this.

Thanks, Ben

Hi Michael,

Did the information I sent answer your question about the priority decay factor? Let me know if you have additional questions or if this ticket is ok to close.

Thanks, Ben

Ben, you answered the question and can close the ticket. We're still playing around with the settings, but currently in a satisfied state.

I'm glad to hear it. Let us know if anything else comes up.

Thanks, Ben
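
For reference, the usage formula discussed in this ticket (current period undecayed, each older period multiplied by one more factor of D) can be sketched as follows. This is a hedged illustration, not Slurm's implementation: the function name and the sample numbers are hypothetical, and D = 0.5 assumes each block of time is exactly one PriorityDecayHalfLife long, as described in the first reply.

```python
def effective_usage(usage_by_period, d):
    """Sum usage with older periods weighted by increasing powers of d.

    usage_by_period[0] is the current period, [1] the previous one, and so on.
    """
    return sum(u * d**age for age, u in enumerate(usage_by_period))


# Hypothetical user who consumed 100 units in each of the last three periods.
history = [100.0, 100.0, 100.0]

with_decay = effective_usage(history, 0.5)  # 100 + 50 + 25
no_decay = effective_usage(history, 1.0)    # models PriorityDecayHalfLife=0: nothing decays
```

With D = 1 the history never fades until PriorityUsageResetPeriod clears it, which matches the behavior described above for a half-life of 0. A shorter half-life shrinks the contribution of older periods faster, which is the "less history" effect the reporter was after.
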