Ticket 14451 - Advice on upgrading from 21.08.4 to 22.05
Summary: Advice on upgrading from 21.08.4 to 22.05
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 21.08.4
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Oriol Vilarrubi
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-06-30 16:32 MDT by Renata Dart
Modified: 2022-07-07 10:04 MDT

See Also:
Site: SLAC
Description Renata Dart 2022-06-30 16:32:24 MDT
Hi SchedMD, we are planning to upgrade our Slurm system in a few weeks from 21.08.4 to 22.05 (or some subversion of it). Jobs won't be running (we are taking an outage for a filesystem upgrade as well), so I am hoping it will be pretty straightforward. But there were a number of reports on the slurm-users mailing list of problems from various users when they tried to upgrade to the version with the security fix, 21.08.8/9, and I am looking for any advice on what else I might need to do besides recompiling and following the typical upgrade steps.

We do a compile with:

rpmbuild -ta slurm-<ver>.tar.bz2
.rpmmacros:
    %_prefix           /opt/slurm/slurm-<ver>


I plan to upgrade as follows:

1. upgrade slurmdbd:

systemctl stop slurmdbd
run mysqlbackup
yum update slurm rpms
do the database conversion:
   time /usr/sbin/slurmdbd -D -vvv
systemctl start slurmdbd

Check this still works: sacctmgr show user -s


2.  upgrade slurmctld

stop slurmd on all clients
systemctl stop slurmctld
tar up /var/spool/slurmctld
yum update slurm rpms
restart slurmctld

3.  upgrade slurmd on all clients and restart slurmd

Thanks, Renata
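
The slurmdbd and slurmctld steps of the plan above can be sketched as a dry-run script that prints each command in order for review. This is only an illustration: mysqldump as the backup tool, the slurm_acct_db database name, the /root backup paths and the 'slurm*' package glob are assumptions, not details from the ticket. Stopping and upgrading slurmd on the clients is not shown.

```shell
#!/bin/sh
# Dry-run sketch of steps 1 and 2 of the upgrade plan.
# Assumptions (not from the ticket): mysqldump as the backup tool,
# slurm_acct_db as the database name, the /root backup paths,
# and the 'slurm*' package glob.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # In dry-run mode only print the command; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_slurmdbd() {
    run systemctl stop slurmdbd
    run mysqldump --single-transaction --result-file=/root/slurm_acct_db.sql slurm_acct_db
    run yum update 'slurm*'
    # The conversion runs in the foreground, so it is only printed as a
    # reminder here; it is stopped by hand once the conversion has finished.
    echo "# by hand: time /usr/sbin/slurmdbd -D -vvv"
    run systemctl start slurmdbd
    run sacctmgr show user -s
}

upgrade_slurmctld() {
    # slurmd on the clients is assumed to be stopped already.
    run systemctl stop slurmctld
    run tar -czf /root/slurmctld-state.tar.gz -C /var/spool slurmctld
    run yum update 'slurm*'
    run systemctl start slurmctld
}

upgrade_slurmdbd
upgrade_slurmctld
```

With DRY_RUN left at its default of 1 the script only prints the commands. If slurmdbd and slurmctld live on different hosts, each function would be run on its own host.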
Comment 1 Oriol Vilarrubi 2022-07-01 11:11:06 MDT
Hello Renata,

You already have a good plan there, but if you want more recommendations about the upgrade you could take a look here: https://slurm.schedmd.com/quickstart_admin.html#upgrade.
The fact that you will have no jobs makes it safer and faster.

The issue with the security fix was that we failed to include a commit in 21.08.8. The missing commit broke communications, which is why we released 21.08.8-2. In your case there is nothing to worry about, as these commits are all in 22.05.

Regards.
Comment 2 Jason Booth 2022-07-01 11:41:16 MDT
Renata - In addition to what Oriol mentioned, some of the issues reported on the user list were either due to mixing versions or not restarting munged.

If you plan to move the entire cluster, then you should not run into issues. I would suggest that you make sure the time is in sync across the cluster and also restart munged as part of your post upgrade process.
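
One way to sanity-check the time sync mentioned above is to compare epoch seconds between hosts. The sketch below is a minimal illustration, not a SchedMD tool: the helper name skew_ok and the 5-second tolerance are hypothetical, and the remote timestamp would have to be collected separately (for example with `ssh HOST date +%s`).

```shell
#!/bin/sh
# skew_ok LOCAL REMOTE MAX
# Succeeds when the absolute difference between two epoch timestamps
# (in seconds) is at most MAX. The name and the tolerance used below
# are illustrative assumptions.
skew_ok() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$3" ]
}

# Demonstration against the local clock only; on a real cluster the second
# argument would come from the remote host.
now=$(date +%s)
if skew_ok "$now" "$(date +%s)" 5; then
    echo "clock skew within tolerance"
else
    echo "clock skew too large"
fi
```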
Comment 3 Renata Dart 2022-07-01 11:45:29 MDT
Hi Oriol, thank you, that is really good to hear "nothing to worry about"!
That is what I wanted to know.

Thanks for the pointer to the upgrade documentation.

Renata

Comment 4 Renata Dart 2022-07-01 11:48:32 MDT
Hi Jason, do I need to restart munge on all systems - slurmdbd,
slurmctld, slurmd?

Renata

Comment 5 Jason Booth 2022-07-01 12:10:48 MDT
> Hi Jason, do I need to restart munge on all systems - slurmdbd,
> slurmctld, slurmd?

Yes, the entire cluster.
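
Restarting munge on every host could look roughly like the loop below. It is a sketch under assumptions: the host names are placeholders, the service unit is assumed to be called munge, and ssh access to each host is assumed. By default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical host names; substitute the real slurmdbd, slurmctld and
# slurmd hosts.
HOSTS="${HOSTS:-dbd01 ctl01 node001 node002}"
DRY_RUN="${DRY_RUN:-1}"

restart_munge_everywhere() {
    for host in $HOSTS; do
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show the command instead of executing it.
            echo "+ ssh $host systemctl restart munge"
        else
            # Assumes the unit is named "munge" on these hosts.
            ssh "$host" systemctl restart munge
        fi
    done
}

restart_munge_everywhere
```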
Comment 6 Renata Dart 2022-07-01 14:03:27 MDT
Thanks Jason, I'll add that to the upgrade steps.

Renata

Comment 7 Oriol Vilarrubi 2022-07-07 02:06:05 MDT
Hello Renata,

Is there anything else regarding the upgrade you want to ask us, or can I proceed to close the ticket?

Regards
Comment 8 Renata Dart 2022-07-07 08:49:34 MDT
Hi Oriol, thanks for the advice.  Please go ahead and close the ticket.

Thanks,
Renata

Comment 9 Oriol Vilarrubi 2022-07-07 10:04:58 MDT
I'm closing this ticket as infogiven. If you need more information related to the upgrade, please do not hesitate to reopen it.

Regards.