Ticket 11832

Summary: SLURM upgrade after bright cluster manager upgrade
Product: Slurm
Reporter: Nathan Elger <elger>
Component: Configuration
Assignee: Jason Booth <jbooth>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: U of South Carolina
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Nathan Elger 2021-06-15 08:48:57 MDT
We're planning to upgrade our Bright Cluster Manager version and move to the Slurm 20.02 version that Bright 8 supports. It looks like we will need to manually upgrade our current Slurm 16.05.08 after the Bright upgrade. Are there any major gotchas I should look out for going from Slurm 16 to 20? I plan to do this upgrade during a maintenance window where the cluster will be offline and all jobs killed. I plan to follow the process in your docs here: https://slurm.schedmd.com/quickstart_admin.html#upgrade

Thanks!
Comment 1 Jason Booth 2021-06-15 10:29:30 MDT
> Are there any major gotchas I should look out for going from Slurm 16 to 20?

If you intend to upgrade rather than install a fresh instance of Slurm, you will need to stagger your upgrade.

Slurm daemons will support RPCs and state files from the two previous major releases (e.g. a version 20.11.x SlurmDBD will support slurmctld daemons and commands with a version of 20.11.x, 20.02.x, or 19.05.x). 

So, if you are upgrading from 16.05 you will need to make a few jumps.
16.05 -> 17.11
17.11 -> 19.05
19.05 -> 20.11
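
As a rough illustration of the two-release rule above, the following Python sketch computes a staggered path between two versions. The `RELEASES` list and the `upgrade_path` helper are assumptions made for this example (they are not part of Slurm or its tooling), and the list should be checked against the official release history before relying on it.

```python
# Major Slurm releases in order, 16.05 through 20.11.
# Assumption: this list is written out by hand for illustration only.
RELEASES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02", "20.11"]


def upgrade_path(current, target, releases=RELEASES, max_jump=2):
    """Return the versions to step through, jumping at most
    `max_jump` major releases per upgrade (hypothetical helper)."""
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than current release")
    path = [current]
    i = start
    while i < end:
        # Take the largest allowed jump without overshooting the target.
        i = min(i + max_jump, end)
        path.append(releases[i])
    return path


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("16.05", "20.11")))
```

With these assumptions, a target of 20.02 would use the same intermediate stops (17.11 and 19.05) and only the final jump would differ.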


> I plan to do this upgrade during a maintenance window where the cluster will
> be offline and all jobs killed.

This is the best approach in this situation. A number of configuration options have changed between these releases. Most of these changes are reported in the logs when you try to start the controller.

Please also look over the NEWS file, as there are other changes that you may need to be aware of.
https://github.com/SchedMD/slurm/blob/master/NEWS
Comment 2 Jason Booth 2021-06-25 15:49:26 MDT
Please feel free to re-open if you have further questions.