Hello,

This is a generic upgrade ticket. We will be attempting to upgrade Slurm from 16.05.8 (Bright 7.3 managed RPMs) to a custom-built 17.02.8. I understand the first step here is to update the slurmdbd. It seems that the only thing involved in the upgrade itself is installing the new slurmdbd and then starting it. So just to be clear, are the database tables automatically read and changed when the new slurmdbd is launched for the first time? What is the procedure for rolling back (in case I need to)?

I understand that in addition to the mysqldump, we should back up the StateSaveLocation. Is there anything else we should back up?

Thanks,

Rob
(In reply to Robert Yelle from comment #0)

Hello Rob,

As you say, SlurmDBD must be upgraded first. When you restart the daemon, SlurmDBD itself will modify and adjust the required tables and fields. Once that is done, you will not be able to downgrade the daemon directly. If you need to do so, you should restore the database from a mysqldump file and then start the older daemon.

The state files should also be backed up if you want to be sure that a downgrade remains possible. Failing to back up this data would result in the loss of all running and pending jobs.

Things like MPI libraries with Slurm integration should be recompiled, because libslurm.so changes in your case (16.05 -> 17.02).

Remember that the slurmctld daemon must be upgraded before (or at the same time as) the slurmd daemons.

Basically, I recommend that you follow the steps and advice from https://slurm.schedmd.com/quickstart_admin.html

Regarding the rollback procedure:
------------------------------------
1. Stop all daemons
2. Downgrade all daemons
3. Delete the database contents and restore the mysqldump copy
4. Restore the StateSaveLocation
5. Start SlurmDBD and check that it works: sacctmgr show cluster/assoc
6. Start slurmctld and check sinfo --version, sinfo, squeue, etc.
7. Start slurmd daemons

In this case the jobs and the queue will be recovered, but be careful with the timeouts.

If you have any other questions, don't hesitate to reopen the bug.

Best Regards
Felip M
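For reference, the two backups discussed above (a mysqldump of the accounting database and a copy of the StateSaveLocation) could be prepared along the lines of the sketch below. It only builds the command lines and does not run anything. The database name slurm_acct_db is assumed to be the one configured for slurmdbd, and the paths /var/spool/slurmctld and /root/slurm-backup are hypothetical placeholders; the real StateSaveLocation can be read from "scontrol show config". The helper name backup_commands is also made up for this illustration.

```python
import shlex
from pathlib import PurePosixPath


def backup_commands(state_save_location, backup_dir, db_name="slurm_acct_db"):
    """Build (but do not execute) the shell commands for a pre-upgrade backup.

    db_name is an assumption: use the database name that slurmdbd is
    actually configured with on your site.
    """
    state = PurePosixPath(state_save_location)
    out = PurePosixPath(backup_dir)

    # Dump the accounting database so it can be restored before a downgrade.
    dump = ["mysqldump", "--databases", db_name]
    dump_target = shlex.quote(str(out / (db_name + ".sql")))

    # Archive the StateSaveLocation directory (running and pending job state).
    archive = [
        "tar", "-czf", str(out / "statesave.tar.gz"),
        "-C", str(state.parent), state.name,
    ]

    return [shlex.join(dump) + " > " + dump_target, shlex.join(archive)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for line in backup_commands("/var/spool/slurmctld", "/root/slurm-backup"):
        print(line)
```

Printing the commands first makes it easy to review them before an outage window. The state directory is probably best archived while slurmctld is stopped, so that the copy is consistent.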
Hi Felip,

Thank you for the info. Our Slurm upgrade seems to have gone well, no issues discovered yet...

Rob

On Oct 31, 2017, at 9:28 AM, bugs@schedmd.com wrote:

Felip Moll changed bug 4319 (https://bugs.schedmd.com/show_bug.cgi?id=4319)

What        Removed              Added
Resolution  ---                  INFOGIVEN
CC                               felip.moll@schedmd.com
Assignee    support@schedmd.com  felip.moll@schedmd.com
Status      UNCONFIRMED          RESOLVED
Severity    2 - High Impact      4 - Minor Issue
Hi Felip,

So far, so good with the Slurm upgrade, no issues except for MPI libraries that you already mentioned. We would also like to implement the network topology plugin, but were unable to get this done during our outage earlier this week. Would implementing this plugin require an outage, or can we implement this while the cluster is in production?

Thanks,

Rob