Hi support, We have nodes running on the cluster, and the nodes are working very hard. Is there a way to limit the capacity available to running jobs on the nodes so that the nodes do not burn out? Is there an easy way to configure this in SLURM (e.g., a limit expressed as a percentage)? Thanks, Thu-Ha
Hi Thu-Ha, There are a couple of parameters that I think could help you out. The first is a SelectTypeParameters option, CR_LLN, that tells the scheduler to place jobs on the least loaded node first, rather than packing them onto nodes that are already busy. This doesn't stop nodes from being loaded to capacity, but it may help if your cluster isn't fully occupied. You can read more about it here: https://slurm.schedmd.com/slurm.conf.html#OPT_CR_LLN I think the other option is a better fit for what you are asking for. With CoreSpecCount you can specify that a certain number of cores on a node are set aside for system processes rather than being scheduled for jobs. If you reserve more cores than the system processes actually need, the extra cores will sit idle and keep the node from being maxed out. https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount Let me know if either of these sounds like it will work for you. Thanks, Ben
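For illustration, here is a minimal sketch of how the two suggestions might look in slurm.conf. The node names, core counts, and the choice of 8 reserved cores are hypothetical and not taken from this ticket; the select plugin shown (select/cons_tres) is also an assumption and may differ on your Slurm version. Reserving 8 of 32 cores would leave roughly 75% of each node schedulable, which approximates a percentage limit.

```
# Hypothetical slurm.conf fragment; adjust names and counts to your site.

# Option 1: prefer the least loaded nodes when placing jobs (CR_LLN).
# Assumes a consumable-resource select plugin is in use.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core,CR_LLN

# Option 2: reserve cores for system use so jobs cannot fill the node.
# Example node: 2 sockets x 16 cores = 32 cores, 8 reserved, 24 left for jobs.
NodeName=node[01-04] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 CoreSpecCount=8
```

Changes to node definitions usually require restarting the Slurm daemons rather than only reconfiguring, so please check the slurm.conf documentation for your version before applying anything like this.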
Hi Thu-Ha, Did either of the parameters I suggested work for what you are trying to do? Let me know if you still need help with this ticket. Thanks, Ben
Those parameters did not seem to achieve what we expected. Anyway, we are setting this option aside for now. You can close the ticket. Thanks for your help! Regards, Thu-Ha
I'm sorry to hear these didn't quite get you the behavior you wanted. If you'd like to look at this again down the road feel free to update the ticket. Thanks, Ben