Ticket 6738

Summary: Fairshare and partitions
Product: Slurm    Reporter: Davide Vanzo <davide.vanzo>
Component: Accounting    Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Version: 18.08.4   
Hardware: Linux   
OS: Linux   
Site: Vanderbilt

Description Davide Vanzo 2019-03-21 14:51:06 MDT
Hello all,

I need a clarification regarding the fairshare distribution across multiple users in an account when associations to multiple partitions are present. In the sshare output below you can see, for example, the fairshare of one of our accounts with its users. Does each association count as a separate user even when the actual user is the same?
What I am trying to figure out is whether this has any effect on the final calculated fairshare.


>              Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
> -------------------- ---------- ------------ ---------- ----------- ----------- ------------- ---------- 
> neuert_lab_account                                   40    0.003642    33592782      0.017115   0.038483 
>  p_neuert_lab                                         1    0.003642    33592782      0.017115   0.038483 
>   p_neuert_lab         hughesjj   production          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab         hughesjj        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          jashnsh        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          jashnsh   production          1    0.000182    33357230      0.017001   0.000000 
>   p_neuert_lab         keslerbk        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab         keslerbk   production          1    0.000182        3687      0.000858   0.038222 
>   p_neuert_lab          meyera1   production          1    0.000182      213521      0.000959   0.025966 
>   p_neuert_lab          meyera1        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          neuertg   production          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          neuertg        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          singha4        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          singha4   production          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab         spielmcl        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab         spielmcl   production          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          thiemia        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab          thiemia   production          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab         walkesd1   production          1    0.000182       18344      0.000865   0.037204 
>   p_neuert_lab         walkesd1        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab           wangel        debug          1    0.000182           0      0.000856   0.038483 
>   p_neuert_lab           wangel   production          1    0.000182           0      0.000856   0.038483
Comment 1 Ben Roberts 2019-03-22 13:02:14 MDT
Hi Davide,

The multiple entries you are seeing for the users are due to them having multiple associations for the different partitions.  I would like to see how the associations are defined for the users in this account.  Could you send the output of the following command:

sacctmgr show assoc tree account=neuert_lab_account,p_neuert_lab format=cluster,account,user,partition,share,qos

I would expect that jobs for the different partitions would have different priorities based on the fact that you see different "FairShare" values.  You can gain some insight into how the priority is calculated for pending jobs with the sprio command.  I'd like to make sure I'm testing the same scenario before I give a definitive answer though.  

Thanks,
Ben
Comment 2 Davide Vanzo 2019-03-22 14:01:59 MDT
Ben,

Here is the output you requested.

>    Cluster              Account       User  Partition     Share                  QOS 
> ---------- -------------------- ---------- ---------- --------- -------------------- 
>    vampire neuert_lab_account                                40               normal 
>    vampire  p_neuert_lab                                      1               normal 
>    vampire   p_neuert_lab         hughesjj production         1               normal 
>    vampire   p_neuert_lab         hughesjj      debug         1               normal 
>    vampire   p_neuert_lab          jashnsh      debug         1               normal 
>    vampire   p_neuert_lab          jashnsh production         1               normal 
>    vampire   p_neuert_lab         keslerbk      debug         1               normal 
>    vampire   p_neuert_lab         keslerbk production         1               normal 
>    vampire   p_neuert_lab          meyera1 production         1               normal 
>    vampire   p_neuert_lab          meyera1      debug         1               normal 
>    vampire   p_neuert_lab          neuertg production         1               normal 
>    vampire   p_neuert_lab          neuertg      debug         1               normal 
>    vampire   p_neuert_lab          singha4      debug         1               normal 
>    vampire   p_neuert_lab          singha4 production         1               normal 
>    vampire   p_neuert_lab         spielmcl      debug         1               normal 
>    vampire   p_neuert_lab         spielmcl production         1               normal 
>    vampire   p_neuert_lab          thiemia      debug         1               normal 
>    vampire   p_neuert_lab          thiemia production         1               normal 
>    vampire   p_neuert_lab         walkesd1 production         1               normal 
>    vampire   p_neuert_lab         walkesd1      debug         1               normal 
>    vampire   p_neuert_lab           wangel      debug         1               normal 
>    vampire   p_neuert_lab           wangel production         1               normal
Comment 3 Ben Roberts 2019-03-22 16:27:21 MDT
Hi Davide,

I set up a scenario similar to yours and can confirm that jobs in the different partitions will have different priorities.  Here are the details of my test.  I created an account with a child account and added a user to it, creating one association for each of two partitions.

$ sacctmgr show assoc tree account=testacct,test_lab format=cluster,account,user,partition,share
   Cluster              Account       User  Partition     Share 
---------- -------------------- ---------- ---------- --------- 
   winston testacct                                          40 
   winston  test_lab                                          1 
   winston   test_lab                user1      debug         1 
   winston   test_lab                user1        gpu         1 



I then ran a job that occupied all the nodes on my cluster for a while to generate some usage.  I could see that the RawUsage reflected this and the FairShare value was adjusted accordingly.

$ sshare -Atest_lab,testacct -uuser1 --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ---------- 
testacct                                             40    0.909091      167935      0.130899            
 test_lab                                             1    1.000000      167935      1.000000            
  test_lab                user1        debug          1    0.250000      167935      1.000000   0.625000 
  test_lab                user1          gpu          1    0.250000           0      0.000000   0.718750 




Then I submitted two jobs that requested the same resources, differing only in partition, and you can see the higher priority for the job in the gpu partition.

$ sbatch -Atest_lab -pdebug -N5 --exclusive -t 10:00 --wrap="srun sleep 600"
Submitted batch job 8193

$ sbatch -Atest_lab -pgpu -N5 --exclusive -t 10:00 --wrap="srun sleep 600"
Submitted batch job 8194

$ sprio
          JOBID PARTITION   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION                 TRES
           8193 debug           7562          0       6250        271       1000               cpu=42
           8194 gpu             8499          0       7188        271       1000               cpu=42




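For reference, sprio's FAIRSHARE column is just sshare's 0-1 fairshare factor scaled by the cluster's PriorityWeightFairshare. A minimal sketch of that relationship, assuming PriorityWeightFairshare=10000 (an assumption inferred from the numbers above, not stated in this ticket):

```python
# Sketch of how sprio's FAIRSHARE column relates to sshare's 0-1
# fairshare factor. PriorityWeightFairshare=10000 is an assumption
# inferred from the numbers above; it is not stated in this ticket.
PRIORITY_WEIGHT_FAIRSHARE = 10000

def fairshare_points(fairshare_factor):
    """Scale the 0-1 fairshare factor into priority points."""
    return round(fairshare_factor * PRIORITY_WEIGHT_FAIRSHARE)

print(fairshare_points(0.625000))  # 6250, the debug job's FAIRSHARE
print(fairshare_points(0.718750))  # 7188, the gpu job's FAIRSHARE
```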
I hope this helps.  Let me know if you have questions or if this is ok to close.

Thanks,
Ben
Comment 4 Davide Vanzo 2019-03-25 07:41:40 MDT
Ben,

Thank you for confirming this.
My initial question was about a different point, though. I may not have been fully clear, and I apologize for that.

Let's take as an example the case of the p_neuert_lab account I sent you before. Right now all users have a maximum fairshare of 0.038483 for each association. If each user were associated only with the production partition and not with the debug partition, would the priority be twice that value?
Granted, since every user has both associations, the maximum fairshare relative to the other users will remain the same. However, the fairshare contribution to the priority relative to age and job size would be decreased, so we would need to adjust the weight by a factor of two. In addition, with usage the fairshare will approach zero much sooner than with a starting fairshare that is twice that value.
Comment 5 Ben Roberts 2019-03-25 10:12:13 MDT
Hi Davide,

> Thank you for confirming this.
> My initial question was on a different point though. I may not be fully clear 
> and I apologize for that.

No problem, I'm sorry I didn't quite get what you were asking the first time.

> Let's take as an example the case of the p_neuert_lab account I sent you 
> before. Right now all users have a maximum fairshare of 0.038483 for each 
> association. If each user would be associated only with the production 
> partition and not with the debug partition, would the priority be twice that 
> value?

If a user only had an association with one partition it wouldn't have twice the priority compared to users with an association for each partition.  Slurm will just split up the shares evenly among the existing associations.  
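The even split can be sketched with a few lines of arithmetic. This is a toy illustration, not Slurm's actual code; the association names mirror the test cluster in this ticket:

```python
# Minimal sketch of how Slurm normalizes raw shares across sibling
# associations under one account. Illustrative only, not Slurm's code.
def norm_shares(raw):
    """Each association's NormShares is its RawShares over the siblings' total."""
    total = sum(raw.values())
    return {assoc: share / total for assoc, share in raw.items()}

# Five sibling associations under test_lab, each with RawShares=1;
# user2 has a single, partition-less association.
raw = {
    ("ben", "gpu"): 1,
    ("ben", "debug"): 1,
    ("user1", "debug"): 1,
    ("user1", "gpu"): 1,
    ("user2", None): 1,
}
print(norm_shares(raw)[("user2", None)])  # 0.2 -- same slice as everyone else
```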

> Granted, since every user has both associations, the maximum fairshare 
> relative to the other users will remain the same. However the fairshare 
> contribution to the priority with respect to age and size would be decreased 
> and we should adjust the weight by a factor of two. And the other thing is 
> that with usage the fairshare will approach zero much sooner than with a 
> starting fairshare that is twice that value.

You're correct that the way to make things fair again would be to set the number of shares to 2 for the user who does not have an association for each partition.  If that user has just a single share, their fairshare priority will drop twice as fast as that of users who can spread their usage over two partitions.  
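The "drops twice as fast" effect can be illustrated with the classic multifactor fairshare formula F = 2^(-EffectvUsage/NormShares). A Fair Tree cluster ranks associations differently, but the usage-per-share intuition is the same; the numbers below are made up for illustration:

```python
# Sketch of the "drops twice as fast" effect using the classic
# multifactor fairshare formula F = 2**(-EffectvUsage / NormShares).
# (A Fair Tree cluster ranks associations instead, but the
# usage-per-share intuition carries over.)
def fairshare_factor(effectv_usage, norm_shares):
    return 2 ** (-effectv_usage / norm_shares)

share = 0.2        # NormShares per association (illustrative)
total_usage = 0.4  # identical total EffectvUsage for both users

# User A spreads the usage evenly over two associations:
f_a = fairshare_factor(total_usage / 2, share)
# User B concentrates it all on a single association:
f_b = fairshare_factor(total_usage, share)
# Doubling user B's raw shares restores parity:
f_b_fixed = fairshare_factor(total_usage, 2 * share)

print(f_a, f_b, f_b_fixed)  # 0.5 0.25 0.5
```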

Here's a quick example from my system.  I added user2 without specifying a partition for the association:

$ sacctmgr show assoc tree account=testacct,test_lab format=cluster,account,user,partition,share
   Cluster              Account       User  Partition     Share 
---------- -------------------- ---------- ---------- --------- 
   winston testacct                                          40 
   winston  test_lab                                          1 
   winston   test_lab                  ben        gpu         1 
   winston   test_lab                  ben      debug         1 
   winston   test_lab                user1      debug         1 
   winston   test_lab                user1        gpu         1 
   winston   test_lab                user2                    1 



You can see from the NormShares column that all 5 associations have 0.2 (or 1/5) of the total share for the test_lab account.  I ran some quick test jobs with user2 in both the 'debug' and 'gpu' partitions; both jobs incremented the usage for the same association, whereas running a job in one partition for the other users only affects the usage for that one association.

$ sshare -Atest_lab,testacct -a --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ---------- 
testacct                                             40    0.909091     4344602      1.000000            
 test_lab                                             1    1.000000     4344602      1.000000            
  test_lab                  ben          gpu          1    0.200000           0      0.000000   0.151515 
  test_lab                  ben        debug          1    0.200000       64179      0.014772   0.090909 
  test_lab                user1        debug          1    0.200000     2120517      0.488081   0.060606 
  test_lab                user1          gpu          1    0.200000     2156344      0.496327   0.030303 
  test_lab                user2                       1    0.200000        3560      0.000820   0.121212 



You can see that if I set the fairshare value to 2 for user2, they get a 'NormShares' value equal to the combined value of the two associations for the other users:

$ sshare -Atest_lab,testacct -a --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ---------- 
testacct                                             40    0.909091     4302985      1.000000            
 test_lab                                             1    1.000000     4302985      1.000000            
  test_lab                  ben          gpu          1    0.166667           0      0.000000   0.151515 
  test_lab                  ben        debug          1    0.166667       63565      0.014772   0.090909 
  test_lab                user1        debug          1    0.166667     2100204      0.488081   0.060606 
  test_lab                user1          gpu          1    0.166667     2135689      0.496327   0.030303 
  test_lab                user2                       2    0.333333        3526      0.000820   0.121212 



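The normalization behind that table can be sketched the same way. Again a toy reimplementation, not Slurm's code:

```python
# Toy reimplementation of the normalization shown above: with
# RawShares=2 on user2's single association, the siblings' total is 6,
# giving user2 a 2/6 slice and every other association 1/6.
def norm_shares(raw):
    total = sum(raw.values())
    return {assoc: share / total for assoc, share in raw.items()}

raw = {
    ("ben", "gpu"): 1,
    ("ben", "debug"): 1,
    ("user1", "debug"): 1,
    ("user1", "gpu"): 1,
    ("user2", None): 2,
}
ns = norm_shares(raw)
print(round(ns[("user2", None)], 6))     # 0.333333 -- matches user2's NormShares
print(round(ns[("user1", "debug")], 6))  # 0.166667 -- matches the others
```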
Hopefully this helps clarify things a little more.  Let me know if you still have questions about this.

Thanks,
Ben
Comment 6 Davide Vanzo 2019-03-26 08:16:45 MDT
Ben,

Thank you for the clarification.
Please go ahead and close this ticket.
Comment 7 Ben Roberts 2019-03-26 08:28:51 MDT
Thanks, closing as InfoGiven.