Ticket 17585 - How to limit job capacity on specific nodes
Summary: How to limit job capacity on specific nodes
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.4
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Ben Roberts
Reported: 2023-08-31 08:27 MDT by Thu-Ha Tran
Modified: 2023-09-27 08:20 MDT
Site: Shell


Description Thu-Ha Tran 2023-08-31 08:27:26 MDT
Hi support,

We have nodes on the cluster that are working very hard.  Is there a way to limit the capacity for running jobs on those nodes to prevent them from burning out?

Is there an easy way to configure this in Slurm (e.g., a limit expressed as a percentage)?

Thanks,
Thu-Ha
Comment 1 Ben Roberts 2023-08-31 13:31:19 MDT
Hi Thu-Ha,

There are a couple of parameters that I think could help you out.  There is a SelectTypeParameters option, CR_LLN, that tells the scheduler to place jobs on the least loaded node first, rather than packing them on nodes that are already busy.  This doesn't stop nodes from being loaded to capacity, but it may help if your cluster isn't fully occupied.  You can read more about it here:
https://slurm.schedmd.com/slurm.conf.html#OPT_CR_LLN
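
To illustrate the difference between least-loaded placement and packing, here is a toy Python sketch. It is not Slurm's scheduler; the function `pick_node` and the node names are hypothetical, and it only models the idea of choosing by idle CPU count.

```python
def pick_node(idle_cpus, cpus_needed, least_loaded=True):
    """Pick a node for a job in a toy model (not Slurm's real algorithm).

    idle_cpus:   dict mapping a hypothetical node name to its idle CPU count
    cpus_needed: CPUs the job requests
    Returns the chosen node name, or None if no node has enough idle CPUs.
    """
    # Only nodes with enough idle CPUs can host the job.
    candidates = {n: idle for n, idle in idle_cpus.items() if idle >= cpus_needed}
    if not candidates:
        return None
    if least_loaded:
        # Least-loaded first: prefer the node with the most idle CPUs.
        return min(candidates, key=lambda n: (-candidates[n], n))
    # Packing: prefer the busiest node that still fits the job.
    return min(candidates, key=lambda n: (candidates[n], n))


idle = {"node01": 4, "node02": 28, "node03": 16}
print(pick_node(idle, 4, least_loaded=True))   # spreads load
print(pick_node(idle, 4, least_loaded=False))  # packs onto the busy node
```

In slurm.conf the corresponding setting would be something like `SelectTypeParameters=CR_Core,CR_LLN`, assuming a cons_res or cons_tres select plugin is in use; check the linked documentation for your version before changing it.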

The other option, CoreSpecCount, I think is a better fit for what you are asking for.  It lets you specify that a certain number of cores on a node are set aside for system processes rather than being scheduled for jobs.  If you specify more cores than you need for system processes, the extra cores will sit idle and will keep the nodes from being maxed out.
https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount
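
Since the question asked about a percentage limit, here is a rough Python sketch of turning a target percentage into a CoreSpecCount value. The helper names, the node name, and the socket/core layout are assumptions for illustration only; CoreSpecCount itself counts cores, so the percentage is rounded down to whole cores.

```python
def core_spec_count(total_cores, max_job_percent):
    """Cores to reserve so jobs can use at most max_job_percent of the cores."""
    if not 0 < max_job_percent <= 100:
        raise ValueError("max_job_percent must be in (0, 100]")
    # Round the usable share down to whole cores (integer arithmetic).
    usable = (total_cores * max_job_percent) // 100
    if usable < 1:
        raise ValueError("limit leaves no cores available for jobs")
    return total_cores - usable


def node_line(name, sockets, cores_per_socket, threads_per_core, max_job_percent):
    """Format a slurm.conf node definition with the computed CoreSpecCount."""
    reserved = core_spec_count(sockets * cores_per_socket, max_job_percent)
    return (
        f"NodeName={name} Sockets={sockets} CoresPerSocket={cores_per_socket} "
        f"ThreadsPerCore={threads_per_core} CoreSpecCount={reserved}"
    )


# Hypothetical 2-socket, 16-cores-per-socket nodes capped at 75% for jobs.
print(node_line("node[01-04]", 2, 16, 1, 75))
```

The generated line would replace the existing definition for those nodes in slurm.conf. Any other fields already on your node line (RealMemory, Gres, and so on) would need to be kept as well.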

Let me know if either of these sound like they'll work for you.

Thanks,
Ben
Comment 2 Ben Roberts 2023-09-26 13:28:31 MDT
Hi Thu-Ha,

Did either of the parameters I suggested work for what you are trying to do?  Let me know if you still need help with this ticket.

Thanks,
Ben
Comment 3 Thu-Ha Tran 2023-09-26 15:40:39 MDT
Those parameters did not seem to achieve what we expected.
Anyway, we are setting this option aside for now.
You can close the ticket.

Thanks for your help!

Regards,
Thu-ha
Comment 4 Ben Roberts 2023-09-27 08:20:17 MDT
I'm sorry to hear these didn't quite get you the behavior you wanted.  If you'd like to look at this again down the road feel free to update the ticket.

Thanks,
Ben