Ticket 5212

Summary: Hints on configuring our fair-share algorithm
Product: Slurm
Reporter: hpc-cs-hd
Component: Scheduling
Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact    
Priority: ---    
Version: 17.11.5   
Hardware: Linux   
OS: Linux   
Site: Cineca
Attachments: slurm.conf
First file - old configuration
Second file - new configuration

Description hpc-cs-hd 2018-05-25 08:59:46 MDT
Dear SchedMD,
here at CINECA we are struggling to figure out how to set up the fair-share behaviour so that it satisfies the policy we want to implement in our HPC environment.

In our model, we have groups of project "families" (for example, projects belonging to the same institute), and each family has one account per project. To each account we allocate a budget of CPU hours, and we want the fair-share algorithm to reflect the fact that accounts with more hours to spend have more shares to use. So we wrote a procedure that assigns to each project a number of RawShares proportional to the number of CPU hours it has to spend. While each project family acts as a "father" to its accounts, we want the fair-share to take into consideration neither the relationship between father and son nor the relationship between siblings, because every account should have its own personal budget, represented by its own personal number of shares.

However, we have not been able to reproduce this situation so far. In our first attempt, we set PriorityFlags=FAIR_TREE as you usually recommend. However, this resulted in unwanted behaviour: when an account consumed some CPU hours, we saw a drop in the fair-share value even for accounts of the same family that hadn't consumed anything yet.
We solved this by resorting to PriorityFlags=DEPTH_OBLIVIOUS. Your manual states: "depth of the associations in the tree do not adversely effect their priority". We thought this was exactly what we wanted, and it actually solved the problem, but then we noticed another one.

Basically, we assign to all the fathers, i.e. the account "families", a symbolic fair-share of 1, because we don't really care about them, and then we give the actual accounts the correct number of shares, expecting the NormShares values to be calculated accordingly. Instead, this is an example of what we get:

 Account                 RawShares  NormShares
 family_A                        1    0.011236
    A_001                      613    0.000389
    A_002                      815    0.000517
    ...
    A_014                     1713    0.001087

 family_B                        1    0.011236
    B_001                     1584    0.000013
    B_002                     4000    0.000032
    ...
    B_582                     1229    0.000010

Basically, each family gets an equal slice of the cake (0.011236 is 1/89, as 89 is the number of families defined on the cluster), and that slice is divided between the accounts under the same parent depending (we suppose) on the number of RawShares assigned to them. As a result, accounts of a small family have more NormShares than accounts of bigger families, regardless of the RawShares declared. Instead, we want the RawShares number to represent the individual shares that an account has in relation to the whole system, and the cake to be divided between all the single accounts regardless of the family they belong to.
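The division described above can be reproduced with a minimal sketch. It assumes a two-level tree (families under root, accounts under families) and that NormShares is the product of each level's share fraction, which is what the figures above suggest. The function name, the tree layout and the sample numbers are illustrative only; they are not taken from the cluster.

```python
def norm_shares(tree):
    """Return {account: NormShares} for a two-level hierarchy.

    tree maps family -> (family_raw_shares, {account: raw_shares}).
    Assumption: an account's NormShares is its family's fraction of
    the root multiplied by the account's fraction within its family.
    """
    family_total = sum(raw for raw, _ in tree.values())
    result = {}
    for family, (family_raw, accounts) in tree.items():
        family_norm = family_raw / family_total
        account_total = sum(accounts.values())
        for account, raw in accounts.items():
            result[account] = family_norm * raw / account_total
    return result


# Hypothetical data: both families have the symbolic share of 1.
tree = {
    "small_family": (1, {"small_001": 100, "small_002": 300}),
    "large_family": (1, {f"large_{i:03d}": 1000 for i in range(1, 101)}),
}
shares = norm_shares(tree)
print(shares["small_002"])  # 300 raw shares, two siblings
print(shares["large_001"])  # 1000 raw shares, one hundred siblings
```

With these made-up numbers the account with 300 raw shares ends up with far more NormShares than the account with 1000, which is the same effect seen between family_A and family_B.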

First of all, we would like to ask whether this is the behaviour we should expect even with DEPTH_OBLIVIOUS enabled (we expected it to eliminate all parental relationships between accounts from the calculation of NormShares). Secondly, we would like to ask about a strategy to obtain the situation I have tried to describe. I can see two ways to proceed:

1) Removing all relationships and treating all accounts as siblings under a single root father. We would rather avoid this, as the division of accounts into families is useful to us for things other than fair-share (for example, for statistics about the usage of the cluster), and in any case it is the most logical way for us to think about the account hierarchy. However, we will still consider it as a last resort;
2) Giving every father a number of RawShares equal (or proportional) to the sum of the shares of its sons, instead of the same symbolic "1" for all. This should assign the correct number of NormShares to every account, which should now be proportional to the individual RawShares assigned to each of them. The share of a son in a large family will depend not on the number of its siblings but on the total number of shares assigned to its family, so the global proportion should be restored. We are currently working towards this solution, while trying to figure out how many shares can be given to a single account or family (is there an upper limit?).
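The reasoning behind option 2 can be written out as a short derivation. This is a sketch under the assumption that, in a two-level tree, the NormShares of an account is the product of its family's fraction of the root and the account's fraction within the family. The symbols are introduced here for illustration: $s_a$ is the RawShares of account $a$, $S_f$ the RawShares of its family $f$, and $g$ ranges over all families.

```latex
\[
  \mathrm{NormShares}(a)
    = \frac{S_f}{\sum_{g} S_g} \cdot \frac{s_a}{\sum_{b \in f} s_b}
\]
% Option 2 sets every family's shares to the sum of its sons' shares:
\[
  S_g = \sum_{b \in g} s_b \ \text{ for all } g
  \quad\Longrightarrow\quad
  \mathrm{NormShares}(a)
    = \frac{s_a}{\sum_{g} \sum_{b \in g} s_b}
\]
```

Under that assumption the family term cancels, and each account's NormShares becomes its RawShares divided by the total RawShares of all accounts on the system, independent of how many siblings it has.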

So, we would like your two cents on our situation. Have you ever had another user who wanted to set up something similar to ours, and which direction would you take to represent it as well as possible?

Thank you for reading through all of this and for any answer you can give.
Regards,
Alessandro Marani - HPC User Support - CINECA
Comment 2 Isaac Hartung 2018-05-29 10:14:08 MDT
Hi Alessandro,

I'm looking over your stuff and am preparing an answer for you.

--Isaac
Comment 3 hpc-cs-hd 2018-05-30 04:00:13 MDT
(In reply to Isaac Hartung from comment #2)
> Hi Alessandro,
> 
> I'm looking over your stuff and am preparing an answer for you.
> 
> --Isaac

Thank you Isaac,
please don't hesitate to ask for any additional information that you think may prove useful.

Alessandro
Comment 4 Isaac Hartung 2018-05-31 11:33:54 MDT
Could you attach your slurm.conf?
Comment 5 hpc-cs-hd 2018-06-01 04:13:12 MDT
Created attachment 6975 [details]
slurm.conf
Comment 6 hpc-cs-hd 2018-06-01 04:13:50 MDT
(In reply to Isaac Hartung from comment #4)
> Could you attach your slurm.conf?

Sure, attached in the previous comment.
Comment 7 Isaac Hartung 2018-06-05 14:47:57 MDT
Seeing as your trees are all of the same depth, I can't imagine the DEPTH_OBLIVIOUS option would help, as it is intended to "improve fairness between accounts in deep and/or irregular hierarchies".

Setting the fairshare field on your child accounts to "parent" might interest you:

https://slurm.schedmd.com/resource_limits.html
Comment 8 Isaac Hartung 2018-06-26 11:06:03 MDT
Hi Alessandro,

Have you been able to try those changes?  Do you have any other concerns regarding fair-share?

Regards
Comment 9 hpc-cs-hd 2018-06-27 09:10:05 MDT
Dear Isaac,
sorry for not replying sooner.

Unfortunately, setting the child account fairshare to parent won't help us. In my first message I forgot to mention that we already use parent: this is how the regular user fairshare is defined. In our user environment, several users can belong to a single project (account). Such a project has its own share, and all its users consume it with the same (local) priority, which goes down for all the users of the project if one of them submits jobs accounted to it. On the other hand, a single project is never related to any of the others, even if they have the same father. So the effect of the siblings on the fairshare calculation should be null.
So, we want the child account share to be the first one considered, and therefore the users' share is set to parent. This cannot be replicated for the "father account-son account" relationship, because if we do that only the father's shares count, and this is not what we want (a project with many CPU hours to spend may see its jobs waiting for too long because a smaller project has consumed some of the father's shares).

Our idea is that the SLURM fair-share system simply wasn't designed for the account priority model that we are trying to implement, so we have to force our hand a bit to make it behave as we want. At the moment we are trying to implement the second strategy I showed you in my first message, that is, to assign to the fathers the sum of the shares of their sons instead of a symbolic "1", and to keep DEPTH_OBLIVIOUS so that those shares are redistributed to all accounts in proportion to the actual size of each project. We are having difficulties writing the script that implements this, and our system administrators are also assigned to many other tasks, so they don't have much time for it. This activity has therefore slowed down, which is why I hadn't replied to you yet.

Finally, I thought of another way to phrase our question. In your manual, you explain the simplified fair-share formula, which is quite individual, i.e. it takes into account only the fair-share factor, the normalized shares and usage, and the dampening factor related to the single account. Then you explain the actual formula, which adds to the mix all the parameters related to father-son and sibling relationships, and I assume this is the formula actually used when computing the final fair-share value. So our question can probably become: is there a way to substitute the complete fair-share formula with the simplified fair-share formula, so that parent relationships are not taken into consideration?
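For reference, the simplified formula mentioned above can be sketched as follows. This assumes it has the form F = 2^(-U/S/D) given in the Slurm classic fair-share documentation, with U the normalized usage, S the normalized shares and D the dampening factor. The function and parameter names are illustrative, and this is not how the scheduler itself is implemented.

```python
def simplified_fairshare(norm_usage, norm_shares, dampening=1.0):
    """Simplified fair-share factor, assumed form F = 2 ** (-U / S / D).

    Only the account's own normalized usage (U), normalized shares (S)
    and the dampening factor (D) are involved; no parent or sibling
    terms appear, which is the property asked about above.
    """
    if norm_shares <= 0:
        return 0.0  # assumption: an account with no shares gets the lowest factor
    return 2.0 ** (-(norm_usage / norm_shares) / dampening)


print(simplified_fairshare(0.0, 0.01))                  # no usage yet
print(simplified_fairshare(0.01, 0.01))                 # usage equal to shares
print(simplified_fairshare(0.01, 0.01, dampening=5.0))  # same usage, dampened
```

In this form a larger dampening factor makes the same usage cost less fair-share, and another account's usage never enters the calculation.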

I will update you when we are able to implement our changes, and I will report on their effects.
Thank you for reading and best regards,
Alessandro
Comment 10 Isaac Hartung 2018-07-12 13:59:03 MDT
> So our question can probably become: is there a way to substitute the
> complete fair-share formula with the simplified fair-share formula, so that
> parent relationships are not taken into consideration?

No, this is not possible.  Fair Tree, as outlined in the docs, is the only fair-share algorithm available.

On another note, if you are using DEPTH_OBLIVIOUS, you are already not engaging the FAIR_TREE calculations:

DEPTH_OBLIVIOUS
If set, priority will be calculated based similar to the normal multifactor calculation, but depth of the associations in the tree do not adversely effect their priority. [[[---This option precludes the use of FAIR_TREE.---]]] (emphasis added).
Comment 11 Isaac Hartung 2018-07-17 10:20:56 MDT
Hi Alessandro,

Seeing as you want to use DEPTH_OBLIVIOUS and it precludes FAIR_TREE, it seems we have addressed the issue raised by this ticket.  I am going to close this ticket, but should you have any further, related questions, please post them here and/or reopen this ticket.

Regards,

Isaac
Comment 12 hpc-cs-hd 2018-07-26 01:43:31 MDT

Dear Isaac,

Sorry for getting back to you after so long. I agree that this ticket can be closed, but before that I would like to show you the results of our last update, in the hope that they may be interesting for you as well.

There are also some things I would like to clarify about our original question.  First of all, we know that DEPTH_OBLIVIOUS excludes the FAIR_TREE effects. You may recall this excerpt from my first message:
"In our first attempt, we set PriorityFlags=FAIR_TREE as you usually recommend. However, this resulted in unwanted behaviour: when an account consumed some CPU hours, we saw a drop in the fair-share value even for accounts of the same family that hadn't consumed anything yet. We solved this by resorting to PriorityFlags=DEPTH_OBLIVIOUS."

After that, for me the idea of further discussing FAIR_TREE effects was completely out of the question, and in fact I never mentioned it again. The problem for us is that, after having eliminated some of the effects of the account relationships by moving to DEPTH_OBLIVIOUS, there were still some unwanted relationship effects left. I am referring to the problem of fathers with the same amount of resources distributing them among an uneven number of sons, so that fathers with many children spread their Norm_shares quota more thinly than fathers with few children. This was happening when DEPTH_OBLIVIOUS was already activated and FAIR_TREE was off.

So our question has never been "Is it a good idea to activate DEPTH_OBLIVIOUS instead of FAIR_TREE?", but rather "Given that we want to use DEPTH_OBLIVIOUS, how can we overcome this persisting unwanted behaviour?". I admit that the question about the simplified formula was a bit of a long shot, but let's say that if a third flag "NO_RELATIONS" existed, we would take it.

So I still hold the view that I expressed in my last mail:
"Our idea is that the SLURM fair-share system simply wasn't designed for the account priority model that we are trying to implement, so we have to force our hand a bit to make it behave as we want."

That's what we did. If you remember, I told you that I would come back to you once we were able to apply some changes to the calculation of our norm_shares. I attach two text files, which are the output of the command "sshare -l --format=account,rawshares,normshares". The file oldshares.txt was produced before the change: every father account has a share of "1", so they all have the same amount of norm_shares because they all refer directly to root. That amount is split among the sons, whose number varies from father to father: so you can see that fathers with few sons assign them much bigger norm_shares values than fathers with many sons. Each son also has its pre-assigned amount of raw_shares, and the proportions between siblings are at least respected in the assignment of norm_shares, i.e. accounts with many raw shares get more norm shares than accounts of the same father with fewer raw shares. But that is not enough: take as an example the "pra16" family, whose members are the biggest projects on our cluster and have more raw shares than anyone else. The biggest son, pra16_4235, has norm_shares=0.001191, while a small account with few siblings like sigu_tarta17_0 has norm_shares=0.008891, far more than the supposedly bigger one. You also have to imagine a third layer of hierarchy, not displayed here: the users assigned to each account, all set to fairshare=parent.

The second file, newshares.txt, was produced after the change. What we did was implement a script that sums the raw shares of the accounts belonging to the same family and assigns that number to the father, instead of the default "1". In this way there is a proportion between fathers as well as between sons, so the global proportion is respected. For example, the father pra16 now has far more shares than the father sigu, and their sons behave accordingly: now pra16_4235 has norm_shares=0.008037 and sigu_tarta17_0 has norm_shares=0.000027. So mission (supposedly) accomplished.
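The script itself is not attached to this ticket. A minimal sketch of the idea, assuming the (family, account, raw shares) triples have already been collected, could look like the following. It only builds the sacctmgr command lines and does not run them; the function name is hypothetical, and the sample rows are the few values quoted earlier in this ticket, not complete families.

```python
import shlex


def family_share_commands(accounts):
    """Build one sacctmgr command per family, setting its fairshare
    to the sum of its sons' raw shares.

    accounts: iterable of (family, account, raw_shares) triples.
    """
    totals = {}
    for family, _account, raw in accounts:
        totals[family] = totals.get(family, 0) + raw
    return [
        "sacctmgr -i modify account where "
        f"name={shlex.quote(family)} set fairshare={total}"
        for family, total in sorted(totals.items())
    ]


# Illustrative subset only; the real families have many more accounts.
example = [
    ("family_A", "A_001", 613),
    ("family_A", "A_002", 815),
    ("family_B", "B_001", 1584),
    ("family_B", "B_002", 4000),
]
commands = family_share_commands(example)
for command in commands:
    print(command)
```

Printing the commands rather than executing them leaves room to review the totals before anything is changed in the accounting database.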

There is still some work for us to do on tuning the fair-share (we need to find a Dampening Factor that reduces the consumption of shares for each job and makes our model more closely resemble a linearized monthly budget), but at least we have solved this issue. I wanted to share this with you because it may be interesting for you to hear from users who have this kind of need when dealing with fair-share, and you may consider implementing, in upcoming versions of SLURM, a solution for this kind of situation that is handled by the scheduler itself.

Thank you for all the attention and the support.
Regards,
Alessandro
Comment 13 hpc-cs-hd 2018-07-26 01:44:01 MDT
Created attachment 7417 [details]
First file - old configuration
Comment 14 hpc-cs-hd 2018-07-26 01:44:26 MDT
Created attachment 7418 [details]
Second file - new configuration