Ticket 11386

Summary: Federate more than two clusters
Product: Slurm
Reporter: NASA JSC Aerolab <JSC-DL-AEROLAB-ADMIN>
Component: Configuration
Assignee: Skyler Malinowski <skyler>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
CC: skyler
Version: 20.11.4
Hardware: Linux
OS: Linux
Site: Johnson Space Center

Description NASA JSC Aerolab 2021-04-15 10:32:13 MDT
Hello there. We have three clusters, two of which are currently federated. We are considering the following scenarios and need your input on how to handle them:

- clusterA and clusterB federated
- clusterA and clusterC federated
- clusterB and clusterC federated
- three clusters federated

Are these options mutually exclusive, or can we have all of them in a single configuration? Please advise.

Thanks,
-Hugo
Comment 1 Skyler Malinowski 2021-04-15 12:12:25 MDT
Hello Hugo,

A cluster can only be a member of one federation at a time, so the federation configurations you listed are mutually exclusive.

Please refer to https://slurm.schedmd.com/federation.html for more information.
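
To make the single-federation rule concrete, here is a minimal dry-run sketch that only prints the sacctmgr commands for placing all three clusters in one federation. The federation name `fedABC` is hypothetical, and the `add federation` and `clusters+=` forms should be confirmed against the federation guide for your Slurm version before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints sacctmgr commands, does not execute them.
# "fedABC" is a hypothetical federation name; cluster names come from this ticket.

# Print the command that creates a federation from the given clusters.
# Usage: fed_create_cmd <federation> <cluster>...
fed_create_cmd() {
    fed="$1"; shift
    # Join the remaining arguments with commas.
    list=$(IFS=,; printf '%s' "$*")
    printf 'sacctmgr add federation %s clusters=%s\n' "$fed" "$list"
}

# Print the command that adds one more cluster to an existing federation.
# Usage: fed_add_cluster_cmd <federation> <cluster>
fed_add_cluster_cmd() {
    printf 'sacctmgr modify federation %s set clusters+=%s\n' "$1" "$2"
}

fed_create_cmd fedABC clusterA clusterB clusterC
fed_add_cluster_cmd fedABC clusterC
```

Since two of the clusters are already federated, the second form (adding the third cluster to the existing federation) is probably the closer fit.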

Any follow-up questions?

Thanks,
Skyler
Comment 2 NASA JSC Aerolab 2021-04-15 12:19:57 MDT
Skyler, if this is the case, then the option for the three clusters federated is a valid one, right?  If this is true, then can I do something like this:

- user can submit a job to land on any of the three clusters as they are federated
- user can submit a job to land only in two of the three clusters in the federation excluding the third cluster

Is this the case?

Thanks,
-Hugo
Comment 3 Skyler Malinowski 2021-04-15 15:44:11 MDT
Hugo,

> user can submit a job to land on any of the three clusters as
> they are federated
Submit normally.

> user can submit a job to land only in two of the three clusters
> in the federation excluding the third cluster
Submit with the `-M` flag, or its long form `--clusters=<string>`, listing the clusters the job may go to.

For example, with clusters fed[0-2] in one federation, the command below submits a job that will be sent to fed0 and fed2 but not fed1.
> sbatch --clusters=fed0,fed2 --wrap "sleep 1m"
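
If it is easier to think in terms of "every cluster except one", the `--clusters` value can be built from the member list. The helper below is a hypothetical sketch (the name `clusters_except` is not a Slurm command), and it only prints the resulting sbatch line rather than submitting anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: build a comma-separated cluster
# list from the federation members, leaving one cluster out.
# Usage: clusters_except <excluded> <cluster>...
clusters_except() {
    exclude="$1"; shift
    out=""
    for c in "$@"; do
        [ "$c" = "$exclude" ] && continue
        # Append with a comma separator once the list is non-empty.
        out="${out:+$out,}$c"
    done
    printf '%s\n' "$out"
}

# Dry run: print the sbatch command instead of executing it.
printf 'sbatch --clusters=%s --wrap "sleep 1m"\n' "$(clusters_except fed1 fed0 fed1 fed2)"
```

The member list is passed in by hand here; how you obtain it on your site is up to you.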

Regards,
Skyler
Comment 4 NASA JSC Aerolab 2021-04-15 15:46:09 MDT
Got it. Thank you!
Comment 5 NASA JSC Aerolab 2021-04-20 11:59:28 MDT
Skyler, one more question. Let's say we have the three-cluster federation. How should the configuration look if we want X users from account Y to have high-priority access to clusterA and low-priority access to clusterB and clusterC? Is this doable?
Thanks,
-Hugo
Comment 6 Skyler Malinowski 2021-04-20 13:30:49 MDT
Hi Hugo,

(In reply to NASA JSC Aerolab from comment #5)
> Skyler, one more question.  Let's say we have the three-clusters federation.
> How it should look their configuration if we want X users from account Y to
> have high priority access to clusterA and low priority access to clusterB &
> clusterC, is this doable?
Yes, this is doable. Just adjust the account priority on each cluster as needed.

For your example case, run the following commands depending on use case:

[case a] Intention: I want to set the priority for the entire account.
> sacctmgr modify account set Priority=100 where account=acctY cluster=clusterA
> sacctmgr modify account set Priority=1   where account=acctY cluster=clusterB,clusterC

[case b] Intention: I want to set the priority for only a subset of users on an account. This assumes that acctY contains users "user[A,B,C]" and that only "user[A,B]" should be affected.
> sacctmgr modify user set Priority=100 where user=userA,userB account=acctY cluster=clusterA
> sacctmgr modify user set Priority=1   where user=userA,userB account=acctY cluster=clusterB,clusterC

Note: `Priority=100` and `Priority=1` are arbitrary values. Please use values that make sense for your site(s). You can also use both methods together.
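
If you end up with more clusters or different values per cluster, the case a commands can be generated from a small cluster-to-priority list. This is a minimal dry-run sketch: the function name `priority_plan` is hypothetical, the priority values are the same placeholders as above, and the script only prints the sacctmgr lines so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch: prints one "sacctmgr modify account" line per cluster,
# in the same form as case a above. Nothing is executed.
# Usage: priority_plan <account> <cluster>=<priority>...
priority_plan() {
    acct="$1"; shift
    for pair in "$@"; do
        cluster=${pair%%=*}   # text before the first "="
        prio=${pair#*=}       # text after the first "="
        printf 'sacctmgr modify account set Priority=%s where account=%s cluster=%s\n' \
            "$prio" "$acct" "$cluster"
    done
}

# Placeholder values; pick ones that make sense for your site(s).
priority_plan acctY clusterA=100 clusterB=1 clusterC=1
```

Trying the printed commands on the emulated clusters first is a reasonable way to check the resulting behavior before touching production.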

Does this method work for you? Is that the priority access you mean?

Regards,
Skyler
Comment 7 NASA JSC Aerolab 2021-04-20 13:56:21 MDT
Great!  This is what we are looking for.  I will try this on emulated clusters.
Comment 8 Skyler Malinowski 2021-05-11 09:58:43 MDT
It seems like the information was sufficient. Closing ticket.

If you have more questions, please re-open the ticket and I will be more than happy to answer them.

Regards,
Skyler