Ticket 15350

Summary: Slurm upgrade query
Product: Slurm
Reporter: Sandeep Sridhara <sandeep.sridhara>
Component: slurmd
Assignee: Ben Glines <ben.glines>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
CC: sandeep.sridhara
Version: 21.08.7
Hardware: Linux
OS: Linux
Site: PMI
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---
Attachments: slurm.conf file
slurmctld log
cgroup.conf
slurmdbd.conf

Description Sandeep Sridhara 2022-11-03 16:58:53 MDT
Created attachment 27580 [details]
slurm.conf file

Hello,

This is a follow-up to bug 15325, which was raised recently.

We are planning to upgrade Slurm from 21.08.7 to 22.05 soon. We are running RHEL 7.9 in our environment. Will upgrading from 21.08.7 to 22.05 break any of our configuration?

We have also attached our slurm.conf file for your reference. Please review it and let us know what configuration changes we need to make for the upgrade to 22.05.

Thanks in advance.
Comment 1 Ben Glines 2022-11-04 10:12:44 MDT
Could you please send over the following:

- cgroup.conf
- topology.conf
- switch.conf
- slurmctld.log
Comment 2 Sandeep Sridhara 2022-11-04 12:28:18 MDT
Created attachment 27593 [details]
slurmctld log
Comment 3 Sandeep Sridhara 2022-11-04 12:28:40 MDT
Hi,

Please find the requested files attached. Please review them and get back to us with your thoughts on our questions.

We could not find files named topology.conf or switch.conf. Can you tell us where they would normally be located?

Have a spectacular day and weekend.
Comment 4 Sandeep Sridhara 2022-11-04 12:29:09 MDT
Created attachment 27594 [details]
cgroup.conf
Comment 5 Sandeep Sridhara 2022-11-04 12:29:42 MDT
Created attachment 27595 [details]
slurmdbd.conf
Comment 6 Sandeep Sridhara 2022-11-04 12:31:09 MDT
Please note that mn002 is the primary Slurm database controller node and mn001 is the backup database controller node.
Comment 7 Ben Glines 2022-11-04 16:38:11 MDT
Hi,

I've reviewed your configuration files and everything looks good for the most part. You shouldn't see any issues running your current configuration on 22.05. I do have a couple of suggestions you could consider.

I see that you have TaskPlugin=task/cgroup. We recommend using task/affinity in addition to task/cgroup when it is set. See our documentation: https://slurm.schedmd.com/slurm.conf.html#OPT_TaskPlugin:
> NOTE: It is recommended to stack task/affinity,task/cgroup together when 
> configuring TaskPlugin, and setting ConstrainCores=yes in cgroup.conf. This 
> setup uses the task/affinity plugin for setting the affinity of the tasks and 
> uses the task/cgroup plugin to fence tasks into the specified resources.

slurm.conf:
> TaskPlugin=task/cgroup,task/affinity
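
As a rough illustration of the recommendation above, the check below reports whether a pair of configuration files stacks both task plugins and sets ConstrainCores=yes. It is only a sketch and not part of Slurm: the helper names `parse_conf` and `task_plugin_ok` are hypothetical, and the parser assumes plain `Key=Value` lines with `#` comments, ignoring Include directives and multi-key lines such as NodeName.

```python
def parse_conf(text):
    """Parse simple Key=Value lines into a dict with lower-cased keys.

    Assumption: one setting per line; anything after '#' is a comment.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def task_plugin_ok(slurm_conf, cgroup_conf):
    """Return True if both task plugins are stacked and ConstrainCores=yes."""
    plugin_value = parse_conf(slurm_conf).get("taskplugin", "")
    plugins = {p.strip() for p in plugin_value.split(",")}
    constrain = parse_conf(cgroup_conf).get("constraincores", "no").lower()
    return {"task/affinity", "task/cgroup"} <= plugins and constrain == "yes"
```

Passing the contents of slurm.conf and cgroup.conf as strings returns a single boolean, which makes the check easy to run before and after editing the files.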

Depending on your usage of GPUs, you could consider switching from cons_res to cons_tres, which offers several benefits for GPU scheduling. It may add some minor slowdowns, but nothing major, and you can decide whether that is worth the additional GPU features. See these links for more info on cons_tres:
https://slurm.schedmd.com/cons_res.html#using_cons_tres
https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
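
For illustration only, the sketch below looks for a SelectType line in the text of a slurm.conf and, if it is set to select/cons_res, returns the corresponding select/cons_tres line. The function name `suggest_select_type` is hypothetical, the parsing assumes a plain `SelectType=...` line, and SelectTypeParameters is deliberately left untouched.

```python
def suggest_select_type(slurm_conf_text):
    """Return the replacement SelectType line if cons_res is configured, else None.

    Assumption: SelectType appears on its own line as Key=Value.
    """
    for raw in slurm_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "selecttype":
            if value.strip() == "select/cons_res":
                return "SelectType=select/cons_tres"
            return None
    return None
```

A result of None means either that SelectType is not set in the text or that it is already something other than select/cons_res.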

Let me know if you have any further questions.
Comment 8 Sandeep Sridhara 2022-11-05 13:05:33 MDT
Hi,

Thanks for the information. We will incorporate the suggested changes. Are there any other changes you recommend in slurm.conf, or anywhere else, to help us upgrade Slurm to 22.05?
Comment 9 Ben Glines 2022-11-07 17:13:49 MST
(In reply to Sandeep Sridhara from comment #8)
> Hi,
> 
> Thanks for the information. We will incorporate the suggested changes. Are
> there any other changes you recommend in slurm.conf, or anywhere else, to
> help us upgrade Slurm to 22.05?

Everything looks good to me, so no other recommended changes.

Remember to reference https://slurm.schedmd.com/quickstart_admin.html#upgrade during the process, and let us know if you need any help!

Feel free to open another bug if any problems arise. Closing this one now.