Ticket 14155 - Requesting CPU resources
Summary: Requesting CPU resources
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.8
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Oscar Hernández
 
Reported: 2022-05-24 08:39 MDT by John Hudson
Modified: 2022-06-01 02:01 MDT

See Also:
Site: Dartmouth


Description John Hudson 2022-05-24 08:39:55 MDT
Hello,

I think this is an easy question, but my searches are not turning anything up, so I figured I'd ask here.

We have members who would like to run jobs on like hardware. We order our compute nodes in waves, so one year we will have a particular CPU in the nodes, and the next year it could be different. In the past we used Moab/Torque, where I could assign nodes features (for example, "intel_model") and then tell users to submit against that feature, guaranteeing their code runs on like hardware. I am not finding a similar option in Slurm... I do see that the GRES for our GPU nodes provides this functionality, but I can't seem to find it for CPUs.

Is there a way users can select specific hardware without putting the hardware into their own separate partitions?

Thanks for your continued support!

Best,

John
Comment 2 Oscar Hernández 2022-05-25 02:43:49 MDT
Hi John,

From what you are mentioning, I think you are interested in the "Features" option in slurm.conf.

It allows you to assign the desired characteristic to a set of nodes, and then lets cluster users filter target nodes via the "#SBATCH --constraint" option.

This must be defined on the node configuration line of slurm.conf, as it is a node property. The keyword there is "Features".

Let me give you an example:

Let's say I have a 4-node cluster (node[1-4]). Half of it has Intel CPUs (node[1-2]), and the other half has AMD (node[3-4]).

Node definitions in slurm.conf should be something like:

# Defaults inherited by all nodes
NodeName=DEFAULT Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=16384

# Intel nodes (features can be combined with GRES with no problem)
NodeName=node[1-2] NodeHostname=host[1-2] Port=2205 Gres=gpu:2 Features=intel

# AMD nodes (features can be combined with GRES with no problem)
NodeName=node[3-4] NodeHostname=host[3-4] Port=2205 Gres=gpu:4 Features=amd

With that configuration, if a user requests in the job:

#SBATCH --constraint=intel

the job can only be allocated on node1 or node2. If no constraint is specified, any node in the cluster can potentially be allocated.
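As a minimal sketch, a batch script for the example above could look like this (the job name, task count, and application binary are placeholders, not from the original ticket):

```shell
#!/bin/bash
#SBATCH --job-name=intel-only     # placeholder job name
#SBATCH --constraint=intel        # only run on nodes defined with Features=intel
#SBATCH --ntasks=4                # placeholder task count

# Placeholder application; launched only on Intel-feature nodes
srun ./my_app
```

Submitting this with sbatch restricts scheduling to node1 and node2 without any partition changes.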

(OPTIONAL/ADDITIONAL TIP)

To help organize cluster nodes and partitions, there is also the option to create NodeSets. A NodeSet gives you a single name that represents all nodes with a given feature. Following the previous example, we can organize our two groups of nodes into two NodeSets:

#AMD nodes
NodeSet=amdnodes Feature=amd
#INTEL nodes
NodeSet=intelnodes Feature=intel

This makes it easier to organize them into partitions later. For example, if you want all nodes in the same partition, you can set:

PartitionName=main Nodes=intelnodes,amdnodes Default=YES MaxTime=INFINITE State=UP

But if you decide to separate them into different partitions, you can easily do so with (note that only one partition can be marked Default=YES):

PartitionName=intel Nodes=intelnodes Default=YES MaxTime=INFINITE State=UP
PartitionName=amd Nodes=amdnodes MaxTime=INFINITE State=UP
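Once the configuration is loaded, you can verify the features from the command line; sinfo's "%f" format field prints each node's available features, and scontrol shows the full node record (node names here assume the example above):

```shell
# List nodes alongside their available features
sinfo -o "%N %f"

# Inspect one node's full record, including AvailableFeatures
scontrol show node node1
```

This is a quick way to confirm that users' --constraint requests will match the intended nodes.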

You will find details on the mentioned options here:

https://slurm.schedmd.com/slurm.conf.html#OPT_Features
https://slurm.schedmd.com/sbatch.html#OPT_constraint
https://slurm.schedmd.com/slurm.conf.html#SECTION_NODESET-CONFIGURATION

Give it a go and let us know whether it serves your purposes, or if you have any other questions in this regard.

Kind regards,
Oscar
Comment 3 Oscar Hernández 2022-06-01 02:01:59 MDT
Hi John,

I am closing this ticket. If you have any follow-up questions, feel free to re-open the thread.

Kind regards,
Oscar