Hi, I am in the process of upgrading about 100 systems from Ubuntu 16 (Slurm 20.02.3) to Ubuntu 20 (Slurm 20.11.4). Since that is a lot of systems to rebuild, we wanted to know whether there is a recommended way of doing it. We are currently not using any cluster management software (such as Rocks Cluster or Bright Computing). Also, do you recommend a specific cluster management package? Thanks, Mike
BTW, the new cluster is already running with a few nodes in it, so I don't need info on how to configure the headnode, just the best way to migrate 100 nodes. Mike
Hi Mike, The recommendations I have for you primarily revolve around the normal upgrade procedure for Slurm. Upgrading the OS is outside the realm of what we can help with. Before you upgrade the nodes to 20.11, make sure that the Slurm controller and database daemons (slurmctld and slurmdbd) are already on 20.11. Slurm is designed to allow a newer version of a controller to communicate with an older slurmd instance, but not the other way around. We do have documentation that gives a good overview of how to upgrade your cluster, which I would recommend you review. https://slurm.schedmd.com/quickstart_admin.html#upgrade Please let me know if you have any questions about the procedure as outlined in the documentation. Thanks, Ben
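The ordering rule above (controllers first, slurmd afterwards) can be sketched as a small pre-flight check. This is only an illustration under stated assumptions: the function names and the node-to-version mapping are hypothetical, version strings are assumed to look like `20.11.4`, and the check covers only the "controller must not be older than slurmd" rule described in this ticket, not every constraint in the upgrade documentation.

```python
from typing import Dict, List, Tuple


def major_release(version: str) -> Tuple[int, int]:
    """Return the major release of a Slurm version string, e.g. '20.11.4' -> (20, 11)."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected Slurm version string: {version!r}")
    return int(parts[0]), int(parts[1])


def slurmd_upgrade_allowed(controller_version: str, target_slurmd_version: str) -> bool:
    """True if the controller is on the same or a newer major release than the slurmd target.

    Hypothetical helper: it encodes only the rule that a newer controller can talk
    to an older slurmd, but not the other way around.
    """
    return major_release(controller_version) >= major_release(target_slurmd_version)


def blocked_nodes(controller_version: str, node_targets: Dict[str, str]) -> List[str]:
    """List the nodes whose planned slurmd version is newer than the controller's."""
    return sorted(
        node
        for node, target in node_targets.items()
        if not slurmd_upgrade_allowed(controller_version, target)
    )


if __name__ == "__main__":
    # Hypothetical node names; versions taken from this ticket.
    plan = {"node001": "20.11.4", "node002": "20.02.3"}
    print(blocked_nodes("20.11.4", plan))  # controller already upgraded
    print(blocked_nodes("20.02.3", plan))  # controller not yet upgraded
```

Running a check like this before reimaging a batch of nodes would flag any node that is about to receive a slurmd newer than the controller it reports to.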
I have always found the upgrade portion of the quickstart_admin page to be vague and not much help. Since I have upgraded the cluster a couple of times now, that is not the issue. However, as I think I stated, we already have the new cluster up and running and were just trying to find a good way to migrate close to 100 servers. We know that you all do not handle the OS, but wondered if you had any recommendations on best practices for cluster management software. It sounds like you don't, which answers my question. Thanks, Mike (In reply to comment #3 on bug 11418, "suggestions for how to upgrade cluster", from Ben Roberts, Tuesday, April 20, 2021 at 1:20 PM)
My apologies that I skipped over the cluster management part of your question. I'm afraid we can't endorse any kind of cluster management software. Since it sounds like you have the upgrade portion of the question under control I'll go ahead and close this ticket. Let us know if there's anything else we can do to help. Thanks, Ben