| Summary: | Mixed memory nodes | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | NASA JSC Aerolab <JSC-DL-AEROLAB-ADMIN> |
| Component: | Configuration | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 21.08.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Johnson Space Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
NASA JSC Aerolab
2022-11-16 15:05:27 MST
Hi Patrick, There are several options you could try that would work great. It really depends on what what is best for your site's needs. If these nodes are all on the same queue (Slurm partition): - I would recommend weighted nodes, so that jobs flow to low memory nodes first. Your exact case is even documented in the description for node weights: https://slurm.schedmd.com/slurm.conf.html#OPT_Weight Basically nodes with more resources should have a higher weight. - Adding a node feature such as "largemem" is also an excellent idea if you want to give users the ability to explicitly request and require these higher memory nodes for their jobs. If these nodes are in different queues (Slurm partitions): - Specifying partition names is all that is needed, e.g. PartitionName=largemem, PartitionName=smallmem. This does mean though that jobs will be targeted to specific queues upon submission. Some sites will also use the job_submit plugin and look at what the job is requesting and change partitions / or request a feature etc. Let me know if you have any questions Thank you for your suggestion. Our nodes are shared between three partitions. Most of our workflow uses default 4GB/core memory, but there are few workflow that requires more memory per core. Will adding weight and a feature (largemem) cause any issue? This way SLURM will fill up low memory nodes first and leave largemem nodes for workflow with more memory requirement. Patrick. (In reply to NASA JSC Aerolab from comment #2) > Will adding weight and a feature (largemem) cause any issue? No, you shouldn't run into any problems. I just tested this personally and all you'll need to do is run `scontrol reconfigure` after adding the weights and features. > This way SLURM will fill up low memory nodes first and leave largemem nodes > for workflow with more memory requirement. Yes, this is correct. 
Although, just to be clear, even jobs without memory requirements will run on the "largemem" nodes if all of the other low-memory (lower-weight) nodes are filled up.

---

Ben,

Thanks for testing. Yes, we are alright with jobs that have no or low memory requirements running on the largemem nodes once the low-memory nodes are filled up.

---

Okay, sounds good! I'll close this bug now, but feel free to reopen if you have questions related to what we talked about.
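As an aside, the job_submit approach mentioned in the first reply can be sketched as a small Lua plugin (enabled with JobSubmitPlugins=lua in slurm.conf). The partition name and the 16 GB/core threshold below are hypothetical examples, not values from this ticket:

```lua
-- job_submit.lua: route jobs with large per-core memory requests
-- to a hypothetical "largemem" partition at submission time.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- min_mem_per_cpu is in MB; it is nil when the job did not
    -- request memory per CPU.
    if job_desc.min_mem_per_cpu ~= nil and
       job_desc.min_mem_per_cpu > 16384 then
        job_desc.partition = "largemem"
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

This keeps submission transparent for users: they request memory as usual, and the controller redirects high-memory jobs without anyone having to name a partition explicitly.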