| Summary: | suggestions for how to upgrade cluster | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Mike Woodson <maw349> |
| Component: | Configuration | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 20.11.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Cornell ITSG | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Mike Woodson
2021-04-20 06:50:40 MDT
BTW, the new cluster is already running with a few nodes in it, so I don't need info on how to configure the headnode, just the best way to migrate 100 nodes.

Mike

Hi Mike,

The recommendations I have for you primarily revolve around the normal upgrade procedure for Slurm. Upgrading the OS is outside the realm of what we can help with. When you do upgrade the nodes to 20.11 you would want to make sure that the Slurm controllers (slurmctld and slurmdbd) are already on 20.11. Slurm is designed to allow a newer version of a controller to communicate with an older slurmd instance, but not the other way around.

We do have documentation that gives a good overview of how to upgrade your cluster that I would recommend you review.
https://slurm.schedmd.com/quickstart_admin.html#upgrade

Please let me know if you have any questions about the procedure as outlined in the documentation.

Thanks,
Ben

I have always found the upgrade portion of the quickstart_admin page to be vague and not much help. Since I have upgraded the cluster a couple of times now, that is not the issue. However, as I think that I stated, we already have the new cluster up and running and was just trying to find a good way to migrate close to 100 servers. We know that you all do not do the OS but wondered if you had any recommendations on best practices for cluster software. It sounds like you don't, which answers my question.

Thanks,
Mike

My apologies that I skipped over the cluster management part of your question. I'm afraid we can't endorse any kind of cluster management software. Since it sounds like you have the upgrade portion of the question under control I'll go ahead and close this ticket. Let us know if there's anything else we can do to help.

Thanks,
Ben |
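The ordering rule Ben describes (controllers upgraded first, since a newer controller can talk to an older slurmd but not the reverse) can be sketched as a pre-migration check. This is only an illustration under stated assumptions: the function names are hypothetical, the version strings are passed in by hand rather than queried from Slurm, and `20.02.6` is an example older release that does not come from the ticket.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Return the release pair from a version string, e.g. '20.11.4' -> (20, 11)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def safe_to_upgrade_node(controller_version: str, target_slurmd_version: str) -> bool:
    """Hypothetical helper: True if a node's slurmd at the target version
    would not be newer than the controllers (slurmctld and slurmdbd)."""
    return parse_release(target_slurmd_version) <= parse_release(controller_version)


if __name__ == "__main__":
    # Controllers already on 20.11: moving nodes to 20.11 follows the rule.
    print(safe_to_upgrade_node("20.11.4", "20.11.4"))
    # Controllers still on an older release (example value): upgrade them first.
    print(safe_to_upgrade_node("20.02.6", "20.11.4"))
```

The check compares only the release pair (for example 20.11) and ignores the maintenance number, on the assumption that maintenance releases within one release are interchangeable for this purpose.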