| Summary: | Throttle cores down but allow single core interactive shell jobs | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Jeff Haferman <jlhaferm> |
| Component: | Scheduling | Assignee: | Oriol Vilarrubi <jvilarru> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 20.02.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | NPS HPC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Jeff Haferman
2021-05-12 10:35:27 MDT
**Oriol Vilarrubi:**

Hi Jeff,

You can create an additional account for this purpose. For example, if we name the new account "interactive":

- `sudo sacctmgr add account interactive`
- `sudo sacctmgr add user john account=interactive GrpTRES=cpu=2` (change this last number to whichever number of CPUs you want that user to be able to go "over limit")

Then the user needs to specify this other account when launching interactive jobs with `--account=interactive`. I've tested this on my machine with the user foo:

```
root@torre:~# sacctmgr modify user foo set GrpTRES=cpu=2
root@torre:~# sacctmgr add account interactive
root@torre:~# sacctmgr add user john account=interactive GrpTRES=cpu=2
root@torre:~# su - foo
foo@torre:~$ sbatch -n 2 --wrap="srun sleep 120"
Submitted batch job 10
foo@torre:~$ srun hostname
srun: job 11 queued and waiting for resources
^Csrun: Job allocation 11 has been revoked
srun: Force Terminated job 11
foo@torre:~$ srun --account=interactive hostname
torre
```

As you can see, when I ran the first interactive job without specifying the account, the job was stuck (that is why you see the `^C` in the output, as I cancelled it); the second interactive job went through because I specified the account. Please let me know if this is a suitable solution for your needs.

---

**Jeff Haferman:**

Thank you!

This looks like a nice solution. What I wonder, though, is what about the case where I have several such users that I want to throttle down? That is indeed what is happening on our cluster now: I have about half a dozen such users, and I would like to limit the total number of cores each one can access, but still allow them to run a maximum of 2 interactive jobs.

In your example, if user foo runs 2 interactive jobs using the account "interactive", will he have used up the allocation for this account? So user bar would then be unable to request a core in the same way?

---

**Oriol Vilarrubi:**

(In reply to Jeff Haferman from comment #3)

> Thank you!
>
> This looks like a nice solution.
> What I wonder though is what about the case where I have several such users that I want to throttle down. That is indeed what is happening on our cluster now - I have about half a dozen such users, and I would like to limit the total number of cores each one can access, but still allow them to run a maximum of 2 interactive jobs.

That should be no problem: you can set a different limit per association. An association is a combination of these 3 elements (Cluster, Account, User), and you can also optionally specify a partition. So when you run this command:

```
sacctmgr modify user john set GrpTRES=cpu=128
```

you are applying it to:

- Cluster = the cluster from which you are launching this command
- User = john
- Account = the default account of john (you can check what this is with `sacctmgr list user defaultaccount <username>`)

You can see all these values while sacctmgr asks for confirmation.

> In your example, if user foo runs 2 interactive jobs using the account "interactive", he will have used up the allocation for this account? So user bar would then be unable to request a core in the same way?

No, because the GrpTRES is applied to the association (Cluster, Account, User), not to the account. I've also tested that to be 100% sure:

```
foo@torre:~$ sbatch -A interactive -n 2 --wrap="srun sleep 120"
Submitted batch job 21
foo@torre:~$ logout
jvilarru@torre:~/slurm-devel/2011/etc$ sudo su - bar
bar@torre:~$ . /opt/slurm-src/env_slurm.sh 2011
bar@torre:~$ srun -A interactive hostname
torre
```

Note that I've set the limit to be 2 cores, not 2 jobs, assuming that you do not mind whether the users launch one 2-core job or 2 single-core jobs. If you really want to enforce a limit of 2 single-core jobs, then you need to set the limits using `GrpJobs=2` and `MaxTRESPerJob=cpu=1`.

---

**Jeff Haferman:**

Awesome! You said I can optionally apply this to a partition. That is something I am interested in; could you provide the syntax for that? Then I think I will be in great shape. Thank you so much.
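Since this setup has to be repeated for about half a dozen users, it can be scripted. A minimal sketch, assuming made-up user names (`alice`, `bob`, `carol`) and the `GrpJobs=2` / `MaxTRESPerJob=cpu=1` limits suggested above; the script only prints the `sacctmgr` commands so they can be reviewed before being piped to `sh` (the `-i` flag makes sacctmgr skip its confirmation prompt):

```shell
#!/bin/sh
# Sketch: enroll several throttled users in a shared "interactive" account,
# each limited to 2 single-core jobs via their own association.
# User names are placeholders; the hypothetical emit_commands helper only
# PRINTS the commands -- review the output, then pipe it to sh.

ACCOUNT=interactive
USERS="alice bob carol"   # replace with your ~half-dozen throttled users

emit_commands() {
    echo "sacctmgr -i add account $ACCOUNT"
    for u in $USERS; do
        # Each (cluster, account, user) association carries its own limit,
        # so one user's interactive jobs never consume another's allowance.
        echo "sacctmgr -i add user $u account=$ACCOUNT GrpJobs=2 MaxTRESPerJob=cpu=1"
    done
}

emit_commands
```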
---

**Oriol Vilarrubi:**

The syntax is this one:

```
sacctmgr add user account=account_name partition=partition_name user_name whatever_limits_you_want
```

Example with user foo, account "users", in partition "other", with a limit of 1 core:

```
sacctmgr add user account=users partition=other foo GrpTRES=cpu=1
```

In this example I've set a limit of 1 core, but you can change it to anything. It works like a normal CAU (Cluster, Account, User) association. This is the account information I have:

```
sacctmgr list assoc format=Cluster,Account,User,Partition,GrpTRES user=foo
   Cluster    Account       User  Partition       GrpTRES
---------- ---------- ---------- ---------- -------------
   cluster      users        foo      other         cpu=1
   cluster      users        foo                    cpu=2
```

As you can see, the limit cpu=2 will be applied to all partitions except "other", and cpu=1 only to "other". Please let me know if that solves your issues.

---

**Jeff Haferman:**

This is perfect, Oriol. Thank you so much. You can close this issue.

---

**Oriol Vilarrubi:**

You're welcome, Jeff; closing the bug. If you need any more help, do not hesitate to contact us.
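The per-partition syntax above can be combined with the earlier multi-user setup. A sketch with invented names (account `users`, partition `other`, placeholder users, and a hypothetical `emit_partition_limits` helper); as before, it only prints the commands for review rather than running them:

```shell
#!/bin/sh
# Sketch: print per-partition associations for a group of users, following
# the sacctmgr syntax discussed in this thread. All names are examples.

ACCOUNT=users
PARTITION=other
USERS="alice bob carol"

emit_partition_limits() {
    for u in $USERS; do
        # Association without a partition: cpu=2 applies everywhere else...
        echo "sacctmgr -i add user account=$ACCOUNT $u GrpTRES=cpu=2"
        # ...while the partition-specific association limits jobs submitted
        # to $PARTITION to cpu=1, exactly as in the listing above.
        echo "sacctmgr -i add user account=$ACCOUNT partition=$PARTITION $u GrpTRES=cpu=1"
    done
}

emit_partition_limits
```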