Ticket 7007

Summary: How to reserve CPU cores for slurmd
Product: Slurm Reporter: George Hwa <george.hwa>
Component: Configuration    Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact
Version: 18.08.1
Hardware: Linux
OS: Linux
Site: KLA-Tencor RAPID
Version Fixed: 1.8

Description George Hwa 2019-05-13 13:48:33 MDT
For any compute node, we want to reserve a couple of CPU cores so that slurmd always has dedicated resources to execute on. For instance, we have 88 cores on a node; we don't want Slurm to schedule 88 jobs/tasks on it. If that happens, slurmd on that node may get swapped out and become slow to respond.

What's the best way to accomplish that? I know I could cheat by telling Slurm I only have 80 cores instead of 88, but that doesn't seem like a clean way of doing it.
Comment 1 Ben Roberts 2019-05-13 15:35:37 MDT
Hi George,

We do have options to reserve a number of CPUs/cores for system use rather than jobs: CpuSpecList and CoreSpecCount.  You use one or the other, depending on whether you want to name specific CPU IDs (CpuSpecList) or simply reserve a count of cores (CoreSpecCount).  You can also control whether those reserved CPUs/cores are used by slurmd and slurmstepd, or are instead kept for non-Slurm system processes, by setting TaskPluginParam=SlurmdOffSpec.  

Related to this, you can also reserve a certain amount of RAM for system usage with the MemSpecLimit parameter.  

The CpuSpecList, CoreSpecCount and MemSpecLimit parameters can be set on the node definition line in your slurm.conf file.  You can read more about these parameters in the documentation here:
https://slurm.schedmd.com/slurm.conf.html#OPT_CoreSpecCount
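As a sketch, a node definition using these parameters might look like the following.  The node name, core layout, and values here are illustrative only; adjust them to your hardware (for your 88-core node you would reserve however many cores you decide on):

```
# Hypothetical slurm.conf node definition: reserve 2 cores and 4096 MB
# of RAM for system use; jobs see the remaining cores/memory.
NodeName=node01 Sockets=2 CoresPerSocket=22 ThreadsPerCore=2 RealMemory=385000 CoreSpecCount=2 MemSpecLimit=4096 State=UNKNOWN
```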

The TaskPluginParam is set in the slurm.conf file outside of the node definition and can be found in the documentation here:
https://slurm.schedmd.com/slurm.conf.html#OPT_TaskPluginParam
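For example, as a sketch (the TaskPlugin value shown is just a common choice, not a requirement):

```
# Global settings, outside any NodeName line.  With SlurmdOffSpec, the
# Slurm daemons run OUTSIDE the specialized cores, leaving those cores
# for other system processes; without it, slurmd is bound to the
# specialized cores.
TaskPlugin=task/affinity
TaskPluginParam=SlurmdOffSpec
```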


I hope this helps.  Let me know if you have questions about these parameters.

Thanks,
Ben
Comment 2 George Hwa 2019-05-16 15:18:40 MDT
great! thanks!

BTW, would 2 cores be enough to ensure slurmd and slurmstepd get adequate resources? And how much memory should we reserve?
Comment 3 Jason Booth 2019-05-16 15:46:37 MDT
Hi George,

Ben is out of the office for the next few days so I will answer your question.

Two processors are more than adequate for the slurmd process. The slurmstepd processes run as part of the user's allocation, so they are not covered by CpuSpecList, CoreSpecCount, or MemSpecLimit. Most sites profile a system to see what the OS requires in terms of memory and cores before setting these values; however, a value of 2 should be adequate for both the OS and slurmd. The memory is a bit more difficult to estimate without knowing how much is taken up by the OS and other services, so this is something you will need to measure and decide on. 

The slurmd daemon itself is rather small in memory on my test system (3-15 MB), so I imagine the memory used by the base OS will matter more than slurmd's own footprint. 
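As a rough way to profile this before picking a MemSpecLimit, you can check slurmd's resident memory (when it is running) against the node's total RAM. This is just a sketch; the ps options assume a procps-style ps on Linux:

```shell
# Resident set size (KB) of any running slurmd process.
# Prints nothing if slurmd is not running on this machine.
ps -C slurmd -o rss=,comm=

# Total physical memory on the node, to put the slurmd
# figure (and the base OS usage) in context.
grep MemTotal /proc/meminfo
```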


-Jason
Comment 4 Ben Roberts 2019-05-29 12:39:56 MDT
Hi George,

I wanted to follow up on this ticket.  Do you have any additional questions about reserving resources for the OS and other processes?  If not I'll go ahead and close this ticket.

Thanks,
Ben
Comment 5 George Hwa 2019-05-29 12:55:50 MDT
Ben,

I'm good for now. You may close it.

Thanks
George
Comment 6 Ben Roberts 2019-05-29 12:59:22 MDT
Thank you George.  Closing now.