Ticket 7824

Summary: Different sets of fairshare parameters on the same slurmdbd/mysql
Product: Slurm Reporter: Damien <damien.leong>
Component: SchedulingAssignee: Albert Gil <albert.gil>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 20.02.7   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=12320
Site: Monash University
Linux Distro: CentOS

Description Damien 2019-09-27 00:36:00 MDT
Good Afternoon Slurm Support

How are you doing ?

We have this scenario, which needs some advice and suggestions.

Our present production cluster has some additional specialised servers (for VIP partners). These VIP partners or accounts have very high usage levels in the existing cluster, so their current fairshare values are fairly low.

These additional machines are for VIP users only, but because of their high usage (and hence low fairshare) in the existing cluster, their jobs on these specialised machines are being delayed by fairshare scheduling.

Current settings:

PriorityDecayHalfLife   = 14-00:00:00
PriorityWeightAge       = 30000
PriorityWeightFairShare = 50000
PriorityWeightJobSize   = 30000
PriorityWeightPartition = 30000
PriorityWeightQOS       = 40000
PriorityWeightTRES      = (null)
SchedulerParameters     = bf_continue,bf_max_time=200,bf_interval=30,bf_resolution=600,bf_window=20160,bf_yield_interval=1000000,sched_min_interval=2000000,bf_max_job_test=5000,bf_max_job_user=50
PriorityType            = priority/multifactor
FairShareDampeningFactor = 1
PriorityFlags           = FAIR_TREE
TreeWidth               = 50
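For reference, these weights enter the multifactor priority roughly as a weighted sum of normalized factors. A simplified sketch with made-up factor values (the real plugin also includes nice, site and TRES terms, and slurmctld computes the normalized factors itself):

```python
# Simplified sketch of priority/multifactor using the weights above.
# The factor values here are hypothetical; slurmctld normalizes each
# factor to [0, 1] before applying the configured weight.
weights = {"age": 30000, "fairshare": 50000, "jobsize": 30000,
           "partition": 30000, "qos": 40000}
factors = {"age": 0.5, "fairshare": 0.1, "jobsize": 0.2,
           "partition": 1.0, "qos": 0.25}
priority = sum(weights[k] * factors[k] for k in weights)
print(int(priority))  # 66000
```

With PriorityWeightFairShare = 50000 dominating the other weights, a low fairshare factor pulls the total down sharply, which is the delay described above.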



The current account/sacctmgr structure:

Parents accounts --> Project accounts --> User accounts

pa001: 
-c001
--user01,user02,user03
-c002
--user04,user05,user06
-c003
--user07,user08

pa002: 
-c004
--user09,user10,user11
-c005
--user12,user13,user14
-c006
--user15,user16

pa003: 
-c007
--user17,user18
-c008
--user19,user20,user21
-c009
--user21


In this case, the parent account 'pa003' is our VIPs, with heavy usage. Let's say I have configured these specialised nodes in a specialised partition with restrictions, like:

New  PartitionName=spec Default=no Nodes=sp0[1-8] State=Up MaxTime=1-0 account=pa003 AllowQOS=pa003


We want to integrate these specialised servers into our existing cluster, so they can share the same UIDs/GIDs, software-stacks, parallel-filesystem, and slurmdbd/mysql.

Can we have different sets of fairshare parameters within the same slurmdbd/mysql? Or how can we achieve a setup where these VIP users can submit jobs to this particular partition without being penalised by the fairshare from their usage in the existing cluster/partitions? We don't want to increase these VIPs' fairshare parameters, as this would affect other accounts in the existing partitions.


We thought of these options:

1) We don't wish to use a reservation, as it affects the actual usage figures reported for these accounts via sreport
2) Separating the slurmctld and slurm.conf from the existing cluster would increase the complexity of maintenance and future updates
3) Creating a separate parent account and project accounts for this; these project accounts would contain tens of the same user accounts from our existing slurmdbd/mysql



Any advice for this particular setup will be greatly appreciated.


Thanks

Damien
Comment 1 Albert Gil 2019-09-30 10:25:42 MDT
Hi Damien,

I'm not sure if I understood your issue correctly.

> These additional machines are for VIP users only, but because of their high usage and fairshare in the existing cluster, their jobs to these specialised machines are being delayed because of fairshare scheduling. 

If only VIP users are allowed to use these specialized machines, the fairshare relation with non-VIP users shouldn't be a problem.
If the specialized machines are *only* in the "spec" partition and these users submit to that partition, the scheduler should run the jobs as soon as resources are available in that partition, regardless of fairshare.
Is that not what you are seeing?


The only problem that I can see is the inverse one: due to the high usage of the specialized machines, the VIP users have less priority when they want to use non-specialized machines.
Is this your problem?
To solve this I would recommend any of these:
- Increase the Priority of the account to compensate it.
- Have a separated account to use only the specialized machines.


> New  PartitionName=spec Default=no Nodes=sp0[1-8] State=Up MaxTime=1-0 account=pa003 AllowQOS=pa003

Are you sure about using AllowQOS?
Why not AllowAccount?


Am I answering your question?
Albert
Comment 2 Damien 2019-10-01 08:57:05 MDT
Hi Albert

Thanks for your email.

Yes, it should be:
--
PartitionName=spec Default=no Nodes=sp0[1-8] State=Up MaxTime=1-0 AllowAccount=pa003 AllowQOS=pa003
--

From "If only VIP users are allowed to use these specialized machines, the fairshare relation with non-VIP shouldn't be a problem.
If the specialized machines are *only* in the "spec" partition, if these users submit to that partition, the scheduler should run the jobs as soon as there are resources in that partition available, regardless of the fairshare. "

Your observation is correct, but our VIP users/accounts would still want to continue using the existing cluster as per normal (using their fairshare) and be semi-exclusive on these specialised servers. Basically, they don't want their fairshare on the existing cluster to be affected, and also vice versa when using the specialised servers. So I am hoping to split the fairshare values into two separate metrics to manage these.

Especially, when we are still facing scheduling issues on our cluster: https://bugs.schedmd.com/show_bug.cgi?id=7686 


We don't wish to increase the Priority of these VIP accounts to compensate, because in that case they would have an unfair advantage when submitting to the existing cluster, which could be misused or mismanaged.

Having separate accounts under a different parent account for the VIPs to use the specialized machines might be the right approach, but it would mean creating a whole duplicate set of accounts/users under another parent account.


Your thoughts ?


Cheers

Damien
Comment 3 Albert Gil 2019-10-01 10:06:44 MDT
Hi Damien,

> Yes, it should be:
> --
> PartitionName=spec Default=no Nodes=sp0[1-8] State=Up MaxTime=1-0
> AllowAccount=pa003 AllowQOS=pa003
> --

I'm not sure whether you need both Allow options, but as you also mentioned "semi-exclusive", maybe you do.

> Your observation is correct, but our VIP users/accounts would still want to
> continue use the existing cluster as per normal (using their fairshare) and
> semi-exclusive on these specialised servers. Basically, they don't want
> their fairshare parameters to be affected on the exisiting cluster, and also
> vice versa when using the specialised servers.

Ok, now I understand better the problem.

> So I am hoping to split the
> fairshare values into two separate metrics to manage these.

Yes, divide and conquer.

> We don't wish to increase the Priority of these VIP accounts to compensate
> it, because if that's the case, they will have an unfair advantage for
> submitting in the existing cluster, which could be mis-used or mis-managed.

I don't like it either.

> Having separated accounts with different parent accounts for the VIPs to use
> the specialized machines, might be the right approach, but it would mean
> creating whole same sets of account/users with another parent accounts.
> Your thoughts ?

Yes, it means creating new accounts/users, but it's the right way to do what you want.
Actually, although we think in terms of Users and Accounts, note that internally Slurm thinks more in terms of Associations and Hierarchies.
Internally, Users and Accounts are almost only a pair of a name and an id (with some more info, but nothing really important).
But when we create a user or an account, Slurm also creates an association in a hierarchical way, and that association contains the really important values for the account or user: limits, QOS, usage, parent...
All the accounting, resource limits and fairshare are handled fully "per association", and one of the main reasons is to allow exactly the split that you want: independent associations for the same user.
That is the way Slurm works.

Perhaps some scripting could help to create/replicate the "new" accounts and users (the new associations)?
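As a starting point, something like this minimal sketch could generate the sacctmgr commands for a parallel hierarchy. The account names (pa003_spec, c00N_spec) are hypothetical, and it only prints the commands so they can be reviewed before running:

```shell
#!/bin/sh
# Sketch: generate (but do not run) sacctmgr commands that replicate an
# existing project-account hierarchy under a new VIP-only parent account.
parent="pa003_spec"
cmds="sacctmgr -i add account $parent Description='VIP spec partition'"
for acct in c007 c008 c009; do
  cmds="$cmds
sacctmgr -i add account ${acct}_spec parent=$parent"
done
printf '%s\n' "$cmds"
```

The same pattern extends to `sacctmgr add user ... account=...` for the users under each project account.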

Regards,
Albert
Comment 4 Albert Gil 2019-10-07 07:50:45 MDT
Hi Damien,

I'm closing the ticket as infogiven for now, but please feel free to reopen it if you have further questions.

Regards,
Albert
Comment 5 Damien 2021-08-19 08:33:07 MDT
Good Afternoon Slurm Support

How are you doing ?

I would like to re-open this ticket to review this in another view.

We are currently using Slurm V 20.02.7


Cheers

Damien
Comment 6 Albert Gil 2021-08-19 08:50:16 MDT
Hi Damien,

> How are you doing ?

Good!
I hope you too!
:-)

> I would like to re-open this ticket to review this in another view.
> We are currently using Slurm V 20.02.7

We are happy to help you review this topic again.
But as this is a very old ticket (almost 2 years, and based on 18.08), I think it's better to create a new one, mentioning this old one as a reference, and review the topic there in a fresh ticket.

Would you mind creating a new bug with your new thoughts about this topic?

Thanks!
Albert
Comment 7 Damien 2021-08-19 08:57:28 MDT
This is the situation, 

Our production cluster has some additional specialised servers (for paid partners). These partners or accounts have very high usage levels in the existing cluster, so their current fairshare metrics are fairly low.

The additional machines are for paid users only, but because of their high usage (and hence low fairshare) in the existing cluster, their jobs on these specialised machines are being delayed by fairshare scheduling.

Because of the high usage of the specialized nodes, these paid users also have less priority when they want to use the non-specialized nodes (the common nodes inside the cluster).

Our current TRES billing rate is 1 CPU = 1 billing unit. For example:
CfgTRES=cpu=24,mem=257669M,billing=24,


***Question

We are thinking of putting all these specialised nodes in a 'special' partition for paid users only (using AllowAccount=pa003 AllowQOS=pa003). Can we configure this partition to have TRESBillingWeights=0, or <1, so that when users use this special partition, it will NOT affect or contribute to their respective accounts' fairshare metrics?

Is this a viable solution ?

Or can you propose an optimised option ? 
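To make the idea concrete, something like this untested sketch is what I have in mind (using the AllowAccounts/AllowQos spellings from slurm.conf; whether zero weights fully remove this partition's usage from fairshare accrual is exactly what I'm asking):

```
# Untested sketch: zero billing weights on the special partition
PartitionName=spec Default=no Nodes=sp0[1-8] State=UP MaxTime=1-0 AllowAccounts=pa003 AllowQos=pa003 TRESBillingWeights="CPU=0.0,Mem=0.0G"
```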



Kindly advise

Thanks

Damien
Comment 8 Damien 2021-08-19 08:59:07 MDT
Yes, if opening a new ticket suits your workflow, please open a new ticket for this query.

Thanks
Comment 9 Jason Booth 2021-08-19 12:47:36 MDT
Continuing on in bug#12320.