| Summary: | Slurm upgrade query | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Sandeep Sridhara <sandeep.sridhara> |
| Component: | slurmd | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | sandeep.sridhara |
| Version: | 21.08.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | PMI | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf file, slurmctld log, cgroup.conf, slurmdbd.conf | ||
|
Description
Sandeep Sridhara
2022-11-03 16:58:53 MDT
Could you please send over the following:
- cgroup.conf
- topology.conf
- switch.conf
- slurmctld.log

Created attachment 27593 [details]
slurmctld log
Hi,

Please find the requested files attached. Kindly review them and get back to us with your insights on our queries. We could not find any files named topology.conf or switch.conf. Could you tell us whether there is a path where they can be found?

Have a spectacular day and weekend.

Created attachment 27594 [details]
cgroup.conf
Created attachment 27595 [details]
slurmdbd.conf
Kindly note that mn002 is the primary Slurm database controller node, whereas mn001 is the backup database controller node.

Hi,

I've reviewed your configuration files and everything looks good for the most part. You shouldn't see any issues with your exact current configuration and 22.05. I do have a couple of suggestions you could consider.

I see that you have TaskPlugin=task/cgroup. We recommend using task/affinity in addition to task/cgroup when it is set. See our documentation (https://slurm.schedmd.com/slurm.conf.html#OPT_TaskPlugin):

> NOTE: It is recommended to stack task/affinity,task/cgroup together when
> configuring TaskPlugin, and setting ConstrainCores=yes in cgroup.conf. This
> setup uses the task/affinity plugin for setting the affinity of the tasks and
> uses the task/cgroup plugin to fence tasks into the specified resources.

slurm.conf:
> TaskPlugin=task/cgroup,task/affinity

Depending on your usage of GPUs, you could consider switching from cons_res to cons_tres. There are several benefits for GPUs. It may add some minor slowdowns, but nothing major, and you can decide whether that is worth the additional GPU features. See these links for more info on cons_tres:
https://slurm.schedmd.com/cons_res.html#using_cons_tres
https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf

Let me know if you have any further questions.

Hi,

Thanks for the information. We will incorporate the suggested changes. Are there any other changes you recommend in slurm.conf, or wherever else necessary, to help us upgrade Slurm to 22.05?

(In reply to Sandeep Sridhara from comment #8)
> Hi,
>
> Thanks for the information. We will incorporate the suggested changes. Are
> there any other changes you recommend in slurm.conf, or wherever else
> necessary, to help us upgrade Slurm to 22.05?

Everything looks good to me, so no other recommended changes.
Remember to refer to https://slurm.schedmd.com/quickstart_admin.html#upgrade during the process, and let us know if you need any help! Feel free to open another bug if any problems arise. Closing this one now. |
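
The two suggestions in this thread (stacking task/affinity with task/cgroup, and considering cons_tres instead of cons_res) could be checked against a slurm.conf before an upgrade with something like the sketch below. This is only an illustration, not a SchedMD tool: `parse_conf`, `review`, and the sample excerpt are hypothetical names and data, and the parser is deliberately simple (one `Key=Value` per line, so multi-key lines such as NodeName definitions are not split).

```python
def parse_conf(text):
    """Parse simple Key=Value lines from slurm.conf-style text.

    Assumption: one setting per line; text after '#' is a comment.
    Keys are lower-cased because slurm.conf parameter names are
    case-insensitive.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def review(settings):
    """Return notes mirroring the two suggestions made in this thread."""
    notes = []
    plugins = {
        p.strip()
        for p in settings.get("taskplugin", "").split(",")
        if p.strip()
    }
    # Suggestion 1: task/cgroup alone, without task/affinity stacked on it.
    if "task/cgroup" in plugins and "task/affinity" not in plugins:
        notes.append("TaskPlugin: consider stacking task/affinity with task/cgroup")
    # Suggestion 2: cons_res in use; cons_tres adds GPU-related features.
    if settings.get("selecttype") == "select/cons_res":
        notes.append("SelectType: consider select/cons_tres if GPU features matter")
    return notes


# Hypothetical excerpt for illustration, not the reporter's actual file.
sample = """
# task and select plugins
TaskPlugin=task/cgroup
SelectType=select/cons_res
"""

for note in review(parse_conf(sample)):
    print(note)
```

A configuration that already stacks both task plugins and uses select/cons_tres would produce no notes. Whether to act on the cons_tres note remains a site decision, as discussed above.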