Ticket 17230

Summary: We are planning to upgrade from 21.08.6 to 23.02.3
Product: Slurm
Reporter: Ali Siavosh <Ali.Siavosh-haghighi>
Component: Build System and Packaging
Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
Version: 23.02.3
Hardware: Linux
OS: Linux
Site: NYUMC
Linux Distro: RHEL
Machine Name: Bigpurple
Target Release: ---
DevPrio: ---
Attachments: cgroup.conf
slurm.conf

Description Ali Siavosh 2023-07-18 19:37:57 MDT
Hi, 
Our recent trainer, Brandon Heaton, advised us to contact support and coordinate with you on our upcoming Slurm upgrade from 21.08.6 to 23.02.3. Our Slurm instance is detached from the cluster manager (Bright). We would appreciate your input.
Comment 1 Jason Booth 2023-07-19 09:43:50 MDT
Please attach your current config files, making sure to sanitize any passwords from the slurmdbd.conf. Please also let us know when you plan to upgrade and roughly how large your database is for slurmdbd.
Comment 2 Ali Siavosh 2023-07-19 09:58:14 MDT
Created attachment 31301 [details]
slurmdbd.conf
Comment 3 Ali Siavosh 2023-07-19 10:00:15 MDT
Hi Jason,
- The upgrade is planned for July 30th.
- The slurmdbd database is about 40 MB.
- The slurmdbd.conf is attached.
Comment 4 Jason Booth 2023-07-19 10:03:02 MDT
Thank you for the config file. Please also attach the other .conf files, including slurm.conf, cgroup.conf and switch.conf. Your site may not use switch.conf, in which case it won't be present.
Comment 5 Ali Siavosh 2023-07-19 10:13:30 MDT
Created attachment 31303 [details]
cgroup.conf
Comment 6 Ali Siavosh 2023-07-19 10:14:23 MDT
Created attachment 31304 [details]
slurm.conf
Comment 7 Ali Siavosh 2023-07-19 10:14:53 MDT
Here they are.
Comment 8 Ben Roberts 2023-07-20 13:46:55 MDT
Hi Ali,

You may have already seen that we have an upgrade guide that goes through the important points you need to consider for an upgrade.  If you haven't read through that already I would recommend that you take some time to familiarize yourself with it.
https://slurm.schedmd.com/quickstart_admin.html#upgrade

It looks like you are upgrading from a version that is within the 2 major release window, so you won't need to take an intermediate step. Each major release brings changes, so I would recommend reviewing the release notes so you know what to expect. One thing that stands out to me is that the ConstrainKmemSpace parameter is deprecated in 23.02, and you currently have it set in your cgroup.conf file:
ConstrainKmemSpace=no

I would recommend you remove that line (or comment it out) as part of the upgrade. You can review the rest of the changes in the RELEASE_NOTES for 22.05 and 23.02:
https://github.com/SchedMD/slurm/blob/slurm-22.05/RELEASE_NOTES
https://github.com/SchedMD/slurm/blob/slurm-23.02/RELEASE_NOTES
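As a sketch of that cleanup step, the Python below comments out deprecated options in the text of a cgroup.conf instead of deleting them, so the old setting stays visible for reference. Only ConstrainKmemSpace comes from this ticket; the helper name `comment_out_deprecated` is hypothetical, and the case-insensitive key match is a precaution rather than a documented requirement. Run it against a copy of the file and review the result before installing it.

```python
import re

# Only ConstrainKmemSpace is named in this ticket; add any other options the
# 22.05/23.02 RELEASE_NOTES list as deprecated or removed for your site.
DEPRECATED_OPTIONS = {"ConstrainKmemSpace"}


def comment_out_deprecated(conf_text: str, deprecated=DEPRECATED_OPTIONS) -> str:
    """Return conf_text with each active 'Key=value' line for a deprecated key commented out.

    Hypothetical helper for illustration, not a Slurm tool.
    """
    wanted = {name.lower() for name in deprecated}
    out_lines = []
    for line in conf_text.splitlines():
        # Match an active (not already commented) "Key=..." line.
        match = re.match(r"\s*([A-Za-z]+)\s*=", line)
        if match and match.group(1).lower() in wanted:
            out_lines.append("#" + line)
        else:
            out_lines.append(line)
    result = "\n".join(out_lines)
    # Preserve a trailing newline if the input had one.
    return result + "\n" if conf_text.endswith("\n") else result


if __name__ == "__main__":
    # Hypothetical cgroup.conf contents; only the last line comes from the ticket.
    sample = "ConstrainCores=yes\nConstrainRAMSpace=yes\nConstrainKmemSpace=no\n"
    print(comment_out_deprecated(sample), end="")
```

Lines that are already commented out are left alone, so running the helper twice does not stack `#` characters.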

You stated that your database is about 40 MB. With that little data to convert, the database upgrade should go very quickly. You should upgrade slurmdbd first and confirm that the logs report the database conversion has completed before proceeding with the rest of the upgrade. I also want to emphasize that it's a good idea to make backups before beginning the upgrade.
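The version-window check and the upgrade ordering described above can be sketched as a small planning helper. This is an illustration under stated assumptions: the release list covers only the major versions around this upgrade, the two-release limit is the window cited earlier in this comment, and the slurmctld, slurmd, and client-command steps after slurmdbd follow the general ordering in the upgrade guide linked above. The function names are hypothetical.

```python
# Slurm major releases in chronological order, limited to the range around this ticket.
MAJOR_RELEASES = ["20.02", "20.11", "21.08", "22.05", "23.02"]

# The "2 major release window" mentioned in this ticket for 23.02.
MAX_RELEASES_FORWARD = 2


def major(version: str) -> str:
    """Reduce a full version such as '21.08.6' to its major release '21.08'."""
    return ".".join(version.split(".")[:2])


def releases_between(old: str, new: str) -> int:
    """Count how many major releases separate two versions."""
    return MAJOR_RELEASES.index(major(new)) - MAJOR_RELEASES.index(major(old))


def upgrade_plan(old: str, new: str) -> list[str]:
    """Return ordered upgrade steps, or raise if the jump cannot be done in one step."""
    gap = releases_between(old, new)
    if gap <= 0:
        raise ValueError(f"{new} is not newer than {old}")
    if gap > MAX_RELEASES_FORWARD:
        raise ValueError(
            f"{old} -> {new} spans {gap} major releases; upgrade through an intermediate release"
        )
    return [
        "back up the slurmdbd database and the StateSaveLocation directory",
        "upgrade slurmdbd and wait for the log to report that the database conversion finished",
        "upgrade slurmctld",
        "upgrade slurmd on the compute nodes",
        "upgrade the client commands on login and submit nodes",
    ]


if __name__ == "__main__":
    for number, step in enumerate(upgrade_plan("21.08.6", "23.02.3"), start=1):
        print(f"{number}. {step}")
```

For 21.08.6 to 23.02.3 the gap is exactly two major releases (22.05, then 23.02), so the helper returns the single-step plan rather than asking for an intermediate upgrade.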

I'll also mention that the slurmdbd.conf file you sent includes a password. Jason asked that passwords be sanitized before sending, so you may have replaced it with something invalid, but I've marked the file as private so that only SchedMD employees can see it now. If it is your production password, someone else could have seen the file, and you should change the password.

Let me know if you have any other questions about the upgrade process.

Thanks,
Ben
Comment 9 Ben Roberts 2023-08-21 10:45:22 MDT
Hi Ali,

I saw that you updated the version information on the ticket. Does that mean you have completed the upgrade? Let me know if there's anything else I can do to help in this ticket or if it's OK to close.

Thanks,
Ben
Comment 10 Ali Siavosh 2023-08-23 16:13:32 MDT
Hi, yes, the upgrade is complete.

Thanks

Ali Siavosh-Haghighi
Manager - HPC Solutions,
High-Performance Computing

NYU Langone Health
Medical Center Information Technology
1 Park Ave South, 9th Floor
New York, NY 10016

O: 646-524-0860
C: 347-843-2357
Ali.Siavosh-haghighi@nyumc.org<mailto:Ali.Siavosh-haghighi@nyumc.org>
nyulangone.org<http://nyulangone.org>
Comment 11 Ben Roberts 2023-08-24 08:28:01 MDT
Ok, I'll close this ticket now.  Let us know if you need anything in the future.

Thanks,
Ben