| Summary: | Fairshare and partitions | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Davide Vanzo <davide.vanzo> |
| Component: | Accounting | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 18.08.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Vanderbilt | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hi Davide,

The multiple entries you are seeing for the users are due to them having multiple associations for the different partitions. I would like to see how the associations are defined for the users in this account. Can I have you send the output of the following command:

sacctmgr show assoc tree account=neuert_lab_account,p_neuert_lab format=cluster,account,user,partition,share,qos

I would expect that jobs in the different partitions would have different priorities, given that you see different "FairShare" values. You can gain some insight into how the priority is calculated for pending jobs with the sprio command. I'd like to make sure I'm testing the same scenario before I give a definitive answer, though.

Thanks,
Ben

Ben,
Here is the output you requested.
> Cluster Account User Partition Share QOS
> ---------- -------------------- ---------- ---------- --------- --------------------
> vampire neuert_lab_account 40 normal
> vampire p_neuert_lab 1 normal
> vampire p_neuert_lab hughesjj production 1 normal
> vampire p_neuert_lab hughesjj debug 1 normal
> vampire p_neuert_lab jashnsh debug 1 normal
> vampire p_neuert_lab jashnsh production 1 normal
> vampire p_neuert_lab keslerbk debug 1 normal
> vampire p_neuert_lab keslerbk production 1 normal
> vampire p_neuert_lab meyera1 production 1 normal
> vampire p_neuert_lab meyera1 debug 1 normal
> vampire p_neuert_lab neuertg production 1 normal
> vampire p_neuert_lab neuertg debug 1 normal
> vampire p_neuert_lab singha4 debug 1 normal
> vampire p_neuert_lab singha4 production 1 normal
> vampire p_neuert_lab spielmcl debug 1 normal
> vampire p_neuert_lab spielmcl production 1 normal
> vampire p_neuert_lab thiemia debug 1 normal
> vampire p_neuert_lab thiemia production 1 normal
> vampire p_neuert_lab walkesd1 production 1 normal
> vampire p_neuert_lab walkesd1 debug 1 normal
> vampire p_neuert_lab wangel debug 1 normal
> vampire p_neuert_lab wangel production 1 normal
Hi Davide,
I set up a scenario similar to yours and can confirm that jobs in the different partitions will have different priorities. Here are the details of my test. I created an account with a child account and added a user to it, specifying one of the two partitions for each of the user's associations.
$ sacctmgr show assoc tree account=testacct,test_lab format=cluster,account,user,partition,share
Cluster Account User Partition Share
---------- -------------------- ---------- ---------- ---------
winston testacct 40
winston test_lab 1
winston test_lab user1 debug 1
winston test_lab user1 gpu 1
I then ran a job that occupied all the nodes on my cluster for a while to generate some usage. I could see that the RawUsage reflected this and the FairShare value was adjusted accordingly.
$ sshare -Atest_lab,testacct -uuser1 --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
Account User Partition RawShares NormShares RawUsage EffectvUsage FairShare
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
testacct 40 0.909091 167935 0.130899
test_lab 1 1.000000 167935 1.000000
test_lab user1 debug 1 0.250000 167935 1.000000 0.625000
test_lab user1 gpu 1 0.250000 0 0.000000 0.718750
Then I submitted two jobs that requested the same resources except for the partition, and you can see the higher priority for the job in the gpu partition.
$ sbatch -Atest_lab -pdebug -N5 --exclusive -t 10:00 --wrap="srun sleep 600"
Submitted batch job 8193
$ sbatch -Atest_lab -pgpu -N5 --exclusive -t 10:00 --wrap="srun sleep 600"
Submitted batch job 8194
$ sprio
JOBID PARTITION PRIORITY AGE FAIRSHARE JOBSIZE PARTITION TRES
8193 debug 7562 0 6250 271 1000 cpu=42
8194 gpu 8499 0 7188 271 1000 cpu=42
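As a back-of-the-envelope check (a sketch, not Slurm's internal code): the sprio columns are the already-weighted factor contributions, and they sum to approximately the PRIORITY column; small off-by-one differences come from each factor being rounded independently.

```python
# Sketch: a job's priority under Slurm's multifactor plugin is roughly the
# sum of the weighted factor columns that sprio reports. The values below
# are copied from the sprio output above; the results land within a couple
# of points of the reported 7562 and 8499 because of per-factor rounding.

def total_priority(age, fairshare, jobsize, partition, tres):
    """Sum the per-factor contributions as sprio displays them."""
    return age + fairshare + jobsize + partition + tres

# Job 8193 in the debug partition (TRES column showed cpu=42):
p_debug = total_priority(age=0, fairshare=6250, jobsize=271, partition=1000, tres=42)
# Job 8194 in the gpu partition:
p_gpu = total_priority(age=0, fairshare=7188, jobsize=271, partition=1000, tres=42)

print(p_debug, p_gpu)  # 7563 8501, within rounding of the reported values
```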
I hope this helps. Let me know if you have questions or if this is ok to close.
Thanks,
Ben
Ben,

Thank you for confirming this. My initial question was on a different point, though. I may not have been fully clear, and I apologize for that.

Let's take as an example the case of the p_neuert_lab account I sent you before. Right now all users have a maximum fairshare of 0.038483 for each association. If each user were associated only with the production partition and not with the debug partition, would the priority be twice that value?

Granted, since every user has both associations, the maximum fairshare relative to the other users will remain the same. However, the fairshare contribution to the priority with respect to age and size would be decreased, and we should adjust the weight by a factor of two. And the other thing is that with usage the fairshare will approach zero much sooner than with a starting fairshare that is twice that value.

Hi Davide,

> Thank you for confirming this. My initial question was on a different
> point, though. I may not have been fully clear, and I apologize for that.

No problem, I'm sorry I didn't quite get what you were asking the first time.

> Let's take as an example the case of the p_neuert_lab account I sent you
> before. Right now all users have a maximum fairshare of 0.038483 for each
> association. If each user were associated only with the production
> partition and not with the debug partition, would the priority be twice
> that value?

If a user only had an association with one partition, it wouldn't have twice the priority compared to users with an association for each partition. Slurm will just split up the shares evenly among the existing associations.

> Granted, since every user has both associations, the maximum fairshare
> relative to the other users will remain the same. However, the fairshare
> contribution to the priority with respect to age and size would be
> decreased, and we should adjust the weight by a factor of two.
> And the other thing is that with usage the fairshare will approach zero
> much sooner than with a starting fairshare that is twice that value.

You're correct that the way to make things fair again for the users would be to set the number of shares to 2 for the user without an association for each partition. If that user has just a single share, their fairshare priority will drop twice as fast as that of users who can spread their usage out over two partitions.

Here's a quick example from my system. I added user2 without specifying a partition for the association:

$ sacctmgr show assoc tree account=testacct,test_lab format=cluster,account,user,partition,share
   Cluster              Account       User  Partition     Share
---------- -------------------- ---------- ---------- ---------
   winston             testacct                              40
   winston             test_lab                               1
   winston             test_lab        ben        gpu         1
   winston             test_lab        ben      debug         1
   winston             test_lab      user1      debug         1
   winston             test_lab      user1        gpu         1
   winston             test_lab      user2                    1

You can see in the sshare output below that the NormShares column shows all 5 associations with 0.2 (or 1/5) of the total share for the test_lab account. I ran some quick test jobs with user2 in both the 'debug' and 'gpu' partitions, and both jobs incremented the usage for that single association, whereas running a job in one partition for the other users only affects the usage for one of their two associations.
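The "twice as fast" point can be sketched with the classic (non-Fair-Tree) fairshare formula from Slurm's multifactor priority documentation, F = 2^(-EffectvUsage/NormShares), ignoring the optional dampening factor; the NormShares values below are illustrative, not taken from a real cluster.

```python
import math

# Sketch of the classic (non-Fair-Tree) fairshare factor with the default
# dampening factor: F = 2 ** (-EffectvUsage / NormShares).
# The point: halving a user's NormShares doubles how fast F decays with usage.

def fairshare_factor(effectv_usage, norm_shares):
    return 2.0 ** (-effectv_usage / norm_shares)

single = 0.2   # illustrative NormShares for a user with a single share
double = 0.4   # the same user holding twice the shares

usage = 0.1    # some accrued normalized usage
# The single-share user at usage u sits where the double-share user would
# sit at usage 2u, i.e. their factor drops twice as fast:
assert math.isclose(fairshare_factor(usage, single),
                    fairshare_factor(2 * usage, double))
print(fairshare_factor(usage, single))  # 2**(-0.5), about 0.707
```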
$ sshare -Atest_lab,testacct -a --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
testacct                                             40    0.909091     4344602      1.000000
 test_lab                                             1    1.000000     4344602      1.000000
  test_lab                  ben          gpu          1    0.200000           0      0.000000   0.151515
  test_lab                  ben        debug          1    0.200000       64179      0.014772   0.090909
  test_lab                user1        debug          1    0.200000     2120517      0.488081   0.060606
  test_lab                user1          gpu          1    0.200000     2156344      0.496327   0.030303
  test_lab                user2                       1    0.200000        3560      0.000820   0.121212

You can see that if I set the fairshare value to 2 for user2, they get a NormShares value equal to the combined NormShares of the two associations each of the other users has:

$ sshare -Atest_lab,testacct -a --format=account,user,partition,rawshares,normshares,rawusage,effectvusage,fairshare
             Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
testacct                                             40    0.909091     4302985      1.000000
 test_lab                                             1    1.000000     4302985      1.000000
  test_lab                  ben          gpu          1    0.166667           0      0.000000   0.151515
  test_lab                  ben        debug          1    0.166667       63565      0.014772   0.090909
  test_lab                user1        debug          1    0.166667     2100204      0.488081   0.060606
  test_lab                user1          gpu          1    0.166667     2135689      0.496327   0.030303
  test_lab                user2                       2    0.333333        3526      0.000820   0.121212

Hopefully this helps clarify things a little more. Let me know if you still have questions about this.

Thanks,
Ben

Ben,

Thank you for the clarification. Please go ahead and close this ticket.

Thanks, closing as InfoGiven.
Hello all,

I need a clarification regarding the fairshare distribution across multiple users in an account when associations to multiple partitions are present. In the sshare output below you can see, for example, the fairshare of one of our accounts with its users. Does each association count as two separate users even if the actual user is the same? What I am trying to figure out is whether this has any effect on the final calculated fairshare.

>              Account       User    Partition  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
> -------------------- ---------- ------------ ---------- ----------- ----------- ------------- ----------
>   neuert_lab_account                                 40    0.003642    33592782      0.017115   0.038483
>         p_neuert_lab                                  1    0.003642    33592782      0.017115   0.038483
>         p_neuert_lab   hughesjj   production          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab   hughesjj        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    jashnsh        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    jashnsh   production          1    0.000182    33357230      0.017001   0.000000
>         p_neuert_lab   keslerbk        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab   keslerbk   production          1    0.000182        3687      0.000858   0.038222
>         p_neuert_lab    meyera1   production          1    0.000182      213521      0.000959   0.025966
>         p_neuert_lab    meyera1        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    neuertg   production          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    neuertg        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    singha4        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    singha4   production          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab   spielmcl        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab   spielmcl   production          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    thiemia        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab    thiemia   production          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab   walkesd1   production          1    0.000182       18344      0.000865   0.037204
>         p_neuert_lab   walkesd1        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab     wangel        debug          1    0.000182           0      0.000856   0.038483
>         p_neuert_lab     wangel   production          1    0.000182           0      0.000856   0.038483