Ticket 11186

Summary: advice needed: is SelectType=select/cons_res the right choice for us?
Product: Slurm
Reporter: Michael Hebenstreit <michael.hebenstreit>
Component: slurmctld
Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
Severity: 3 - Medium Impact    
Priority: ---    
Version: 20.02.5   
Hardware: Linux   
OS: Linux   
Site: Intel CRT

Description Michael Hebenstreit 2021-03-23 14:06:23 MDT
our users always select complete nodes
we do not share any nodes between 2 users at the same time
nodes are selected via constraints, nothing else
users are expecting full control over the machine and are experts in setting core-bindings

Might select/linear be a better choice, and what would we lose?
Comment 1 Ben Roberts 2021-03-23 16:25:32 MDT
Hi Michael,

If your users are always going to be using whole nodes, then you're right that it might make more sense to switch to the select/linear plugin.  The primary benefit of using cons_res or cons_tres is the ability to track resources on the nodes.  As you can imagine, there is overhead associated with tracking all these resources instead of just tracking the nodes.  We have the following note in the Large Cluster Administration Guide:
-------------------------
While allocating individual processors within a node is great for smaller clusters, the overhead of keeping track of the individual processors and memory within each node adds significant overhead. For best scalability, allocate whole nodes using select/linear and avoid select/cons_res.
-------------------------
https://slurm.schedmd.com/big_sys.html

Another section with good information about differences between the plugins is the Mode of Operation section in the Select Plugin Design Guide, found here:
https://slurm.schedmd.com/select_design.html


To summarize, the differences between linear and cons_(res/tres) should be limited to the ability of the scheduler to break down the resources on the nodes.  The ability to use backfill or generic resources is broken out into other plugins so it shouldn't be much of a sacrifice to make this change.
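
As a rough illustration of the change discussed above, a minimal slurm.conf sketch might look like the following. It assumes the site currently runs select/cons_res; the commented-out lines are illustrative and not taken from this ticket. Changing SelectType requires restarting slurmctld, and the slurm.conf man page for the installed version should be checked for the effect on running and pending jobs before doing this on a live cluster.

```
# slurm.conf (illustrative fragment, not a complete configuration)

# Before: per-core/per-memory tracking inside each node
#SelectType=select/cons_res
#SelectTypeParameters=CR_Core      # assumed previous value, shown only for contrast

# After: whole-node allocation
SelectType=select/linear
# cons_res-specific options such as CR_Core do not apply to select/linear.
# CR_Memory may be set here if memory should still be tracked as a consumable
# resource; leave SelectTypeParameters unset otherwise.
#SelectTypeParameters=CR_Memory
```

With select/linear each job is allocated entire nodes, which matches the usage described in this ticket (whole nodes, no sharing between users, core binding handled by the users themselves).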

Please let me know if you have any questions about this.

Thanks,
Ben
Comment 2 Michael Hebenstreit 2021-03-23 16:56:44 MDT
what needs to be in place to ensure backfill and features/constraints are working?
Comment 3 Ben Roberts 2021-03-24 08:38:25 MDT
In order to use backfill you need to specify the SchedulerType:
SchedulerType=sched/backfill

Features and constraints don't require an additional plugin to work; you just need to have the feature(s) defined for the nodes.
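
A minimal sketch of how these two pieces might sit together in slurm.conf is shown below. The node names and feature names are hypothetical placeholders, not values from this site's configuration.

```
# slurm.conf (illustrative fragment)

# Enable the backfill scheduler
SchedulerType=sched/backfill

# Define features on the nodes; "node[001-004]", "skylake" and "opa" are
# hypothetical names used only for this example
NodeName=node[001-004] Features=skylake,opa State=UNKNOWN
```

Under those assumptions, a job could then request nodes by feature with something like `sbatch --constraint=skylake -N 2 job.sh`, where `job.sh` is again a placeholder script name.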

Thanks,
Ben
Comment 4 Michael Hebenstreit 2021-03-24 08:39:47 MDT
Thanks, request can be closed

Comment 5 Ben Roberts 2021-03-24 08:41:23 MDT
Sounds good, closing now.  Let us know if there's anything we can do to help in the future.