| Summary: | Requesting CPU resources | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | John Hudson <john.p.hudson> |
| Component: | Configuration | Assignee: | Oscar Hernández <oscar.hernandez> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 21.08.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Dartmouth | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
John Hudson
2022-05-24 08:39:55 MDT
Hi John,

From what you describe, I think you are interested in the "Feature" slurm.conf option. It lets you assign a characteristic to a set of nodes, so that cluster users can then filter target nodes with the "#SBATCH --constraint" option.

Because a feature is a node property, it must be defined in the node configuration line of slurm.conf. The keyword is "Feature".

Here is an example. Say I have a 4-node cluster (node[1-4]). Half of it has Intel CPUs (node[1-2]) and the other half has AMD CPUs (node[3-4]). The node definitions in slurm.conf would look something like this:

    #we set some defaults that will be inherited by all nodes
    NodeName=DEFAULT Sockets=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=16384
    #define intel nodes (it can be combined with gres with no problem)
    NodeName=node[1-2] NodeHostname=host[1-2] Port=2205 Gres=gpu:2 Feature=intel
    #define amd nodes (it can be combined with gres with no problem)
    NodeName=node[3-4] NodeHostname=host[3-4] Port=2205 Gres=gpu:4 Feature=amd

With that configuration, if a user's job requests:

    #SBATCH --constraint=intel

the job can only be allocated on node1 or node2. If no constraint is specified, any node in the cluster can potentially be allocated.

(OPTIONAL/ADDITIONAL TIP) To help organize cluster nodes and partitions, there is also the option to create NodeSets. A NodeSet gives you a name that represents all nodes with a given feature. Following the previous example, we can organize our 2 groups of nodes into 2 nodesets:

    #AMD nodes
    NodeSet=amdnodes Feature=amd
    #INTEL nodes
    NodeSet=intelnodes Feature=intel

This will make it easier to organize them into partitions later.
For example, if you want all nodes in the same partition, you can set:

    PartitionName=main Nodes=intelnodes,amdnodes Default=YES MaxTime=INFINITE State=UP

But if you decide to separate them into different partitions, you can do so easily with the lines below. Only one partition can be the default, so Default=YES is set on just one of them:

    PartitionName=intel Nodes=intelnodes Default=YES MaxTime=INFINITE State=UP
    PartitionName=amd Nodes=amdnodes Default=NO MaxTime=INFINITE State=UP

You will find details on the options mentioned here:

https://slurm.schedmd.com/slurm.conf.html#OPT_Features
https://slurm.schedmd.com/sbatch.html#OPT_constraint
https://slurm.schedmd.com/slurm.conf.html#SECTION_NODESET-CONFIGURATION

Give it a go and let us know whether it serves your purposes, or if you have any other questions in this regard.

Kind regards,
Oscar

Hi John,

I am closing this issue. If you have any follow-up questions, feel free to re-open the thread.

Kind regards,
Oscar
|
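To make the selection rule from the thread concrete, here is a minimal Python sketch of how a single-feature constraint and a feature-based NodeSet narrow down the example 4-node cluster. It is a toy model for illustration only, not Slurm code: the names `NODE_FEATURES`, `eligible_nodes`, and `expand_nodesets` are hypothetical, and it only handles a single feature name, not the full constraint syntax that sbatch accepts.

```python
# Toy model of the example cluster from the thread (not Slurm code).
# Each node name maps to the set of features assigned in slurm.conf.
NODE_FEATURES = {
    "node1": {"intel"},
    "node2": {"intel"},
    "node3": {"amd"},
    "node4": {"amd"},
}

# Hypothetical mirror of the NodeSet lines: nodeset name -> feature.
NODESETS = {"intelnodes": "intel", "amdnodes": "amd"}


def eligible_nodes(constraint=None):
    """Return the sorted node names that carry the requested feature.

    With no constraint, every node is a candidate, as described above.
    """
    if constraint is None:
        return sorted(NODE_FEATURES)
    return sorted(
        name for name, features in NODE_FEATURES.items()
        if constraint in features
    )


def expand_nodesets(nodeset_names):
    """Expand a list of nodeset names into the sorted nodes they cover."""
    nodes = set()
    for nodeset in nodeset_names:
        nodes.update(eligible_nodes(NODESETS[nodeset]))
    return sorted(nodes)


if __name__ == "__main__":
    print(eligible_nodes("intel"))                     # nodes with Feature=intel
    print(eligible_nodes())                            # no constraint: all nodes
    print(expand_nodesets(["intelnodes", "amdnodes"])) # the "main" partition
```

On a real cluster, the features actually assigned to each node can be checked with `scontrol show node <nodename>`.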