Hi SchedMD,

we are planning to upgrade our slurm system in a few weeks from 21.08.4 to 22.05 (or some subversion of it). Jobs won't be running (we are taking an outage for a filesystem upgrade as well) so I am hoping it will be pretty straightforward. But there were a number of reports in the slurm-users mailing list of problems from various users when they tried to upgrade to the version with the security fix, 21.08.8/9, and I am looking for any advice of what else I might need to do besides recompile and follow typical upgrade steps.

We do a compile with:

    rpmbuild -ta slurm-<ver>.tar.bz2

.rpmmacros:

    %_prefix /opt/slurm/slurm-<ver>

I plan to upgrade as follows:

1. upgrade slurmdbd:
     systemctl stop slurmdbd
     run mysqlbackup
     yum update slurm rpms
     do the database conversion: time /usr/sbin/slurmdbd -D -vvv
     systemctl start slurmdbd
     Check this still works: sacctmgr show user -s

2. upgrade slurmctld
     stop slurmd on all clients
     systemctl stop slurmctld
     tar up /var/spool/slurmctld
     yum update slurm rpms
     restart slurmctld

3. upgrade slurmd on all clients and restart slurmd

Thanks,
Renata
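For reference, the plan above can be sketched as a dry-run shell script. This is only an illustration under stated assumptions, not a tested procedure: the `run` helper, the function names, the `slurm*` package glob, and the archive path `/root/slurmctld-state.tar.gz` are all hypothetical, and the database backup is left as a note because the backup tool is site-specific. The database conversion itself (`/usr/sbin/slurmdbd -D -vvv`) runs in the foreground, so the sketch leaves it as a manual step between the two slurmdbd functions.

```shell
#!/bin/sh
# Dry-run sketch of the upgrade plan. By default commands are only printed.
# Set DRY_RUN=0 to execute them (assumption: run as root on the right host).
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1a: on the slurmdbd host, before the database conversion.
slurmdbd_pre() {
    run systemctl stop slurmdbd
    echo "# back up the accounting database here (site-specific tool)"
    run yum update 'slurm*'   # package glob is a placeholder
    echo "# next, by hand: /usr/sbin/slurmdbd -D -vvv (database conversion)"
}

# Step 1b: on the slurmdbd host, after the conversion has finished.
slurmdbd_post() {
    run systemctl start slurmdbd
    run sacctmgr show user -s
}

# Step 2: on the slurmctld host, after slurmd is stopped on all clients.
slurmctld_upgrade() {
    run systemctl stop slurmctld
    # Archive path is hypothetical; any safe location will do.
    run tar -czf /root/slurmctld-state.tar.gz -C /var/spool slurmctld
    run yum update 'slurm*'
    run systemctl start slurmctld
}

# Step 3: on each client node.
slurmd_upgrade() {
    run yum update 'slurm*'
    run systemctl restart slurmd
}
```

Sourcing the file and calling, for example, `slurmdbd_pre` with the default `DRY_RUN=1` prints the commands for review without touching any service.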
Hello Renata,

You already have a good plan there, but if you want more recommendations about the upgrade you could take a look here: https://slurm.schedmd.com/quickstart_admin.html#upgrade. The fact that you will have no jobs running makes it safer and faster.

The issue we had with the security fix is that we failed to include a commit in 21.08.8. The lack of this commit broke communications, which is why we released 21.08.8-2. In your case you have nothing to worry about, as these commits are all in 22.05.

Regards.
Renata - In addition to what Oriol mentioned, some of the issues reported on the user list were due either to mixing versions or to not restarting munged.

If you plan to move the entire cluster, then you should not run into issues. I would suggest that you make sure the time is in sync across the cluster and also restart munged as part of your post-upgrade process.
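The two post-upgrade suggestions above (time in sync, munged restarted) could be scripted roughly as follows. This is a sketch under assumptions, not an official procedure: `HOSTS` and `MAX_SKEW` are hypothetical placeholders, the 5-second threshold is arbitrary, passwordless root ssh to each host is assumed, and the systemd unit is assumed to be named `munge`. The measured difference also includes ssh latency, so it is only a coarse check.

```shell
#!/bin/sh
# Sketch: check clock skew against each host, then restart munge there.
# HOSTS and MAX_SKEW are hypothetical placeholders; adjust for the site.
HOSTS="${HOSTS:-}"
MAX_SKEW="${MAX_SKEW:-5}"   # seconds; an arbitrary threshold for this sketch

# abs_skew A B: print |A - B| for two epoch timestamps (seconds).
abs_skew() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# check_and_restart HOST: compare clocks over ssh, then restart munge.
check_and_restart() {
    host=$1
    remote=$(ssh "$host" date +%s) || return 1
    skew=$(abs_skew "$(date +%s)" "$remote")
    if [ "$skew" -gt "$MAX_SKEW" ]; then
        echo "$host: clock differs by ${skew}s, fix time sync first" >&2
        return 1
    fi
    # Assumption: the unit is named "munge", as in common RPM packaging.
    ssh "$host" systemctl restart munge
}

# With HOSTS empty (the default) this loop does nothing.
for h in $HOSTS; do
    check_and_restart "$h" || echo "$h: skipped" >&2
done
```

It would be invoked with a host list covering the slurmdbd, slurmctld, and slurmd machines, for example `HOSTS="dbd01 ctl01 node001" sh restart_munge.sh`, where the host names and the script name are made up for illustration.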
Hi Oriol, thank you, that is really good to hear "nothing to worry about"! That is what I wanted to know. Thanks for the pointer to the upgrade documentation.

Renata

On Fri, 1 Jul 2022, bugs@schedmd.com wrote:
> https://bugs.schedmd.com/show_bug.cgi?id=14451
>
> Oriol Vilarrubi <jvilarru@schedmd.com> changed:
>   Assignee: support@schedmd.com -> jvilarru@schedmd.com
>   CC: added jvilarru@schedmd.com
Hi Jason, do I need to restart munge on all systems - slurmdbd, slurmctld, slurmd?

Renata
> Hi Jason, do I need to restart munge on all systems - slurmdbd,
> slurmctld, slurmd?

Yes, the entire cluster.
Thanks Jason, I'll add that to the upgrade steps.

Renata
Hello Renata,

Is there anything else regarding the upgrade you want to ask us, or may I go ahead and close the ticket?

Regards
Hi Oriol, thanks for the advice. Please go ahead and close the ticket.

Thanks,
Renata

On Thu, 7 Jul 2022, bugs@schedmd.com wrote:
> Is there anything else regarding the upgrade you want to ask us or can I
> proceed closing the ticket?
I'm closing this ticket as infogiven. If you need more information related to the upgrade, please do not hesitate to reopen it.

Regards.