Ticket 9148 - Is slurm ready to work on Ubuntu 20.04 LTS?
Summary: Is slurm ready to work on Ubuntu 20.04 LTS?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Build System and Packaging
Version: 20.11.x
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Jason Booth
Reported: 2020-06-02 14:07 MDT by Mike Woodson
Modified: 2020-06-05 14:56 MDT
Site: Cornell ITSG


Description Mike Woodson 2020-06-02 14:07:35 MDT
We are getting ready to move our Slurm installation to a newer version of Ubuntu LTS and wondered whether Slurm is ready to work with Ubuntu 20.04 LTS or, if not, when you expect it to be. If that will not be soon, we will want to go to Ubuntu 18.04 LTS (which I assume is workable).

We will want to move to the latest slurm that is available at the time. 

We are currently on Ubuntu 16.04 LTS and Slurm 17.11.6. 

Mike
Comment 1 Jason Booth 2020-06-02 14:27:26 MDT
Hi Mike - Slurm will work on Ubuntu 20.04. I assume that you will be installing from source? If not, please let us know how you will be installing Slurm on your new system.
Comment 2 Mike Woodson 2020-06-02 14:37:33 MDT
We will be installing from source.

I will start building a test environment within a couple of weeks. I would love some help with the config files: when I originally built our Slurm cluster (summer 2018), I was flying by the seat of my pants without support, and we are now several versions behind. One possibility is that I update the cluster to the latest Slurm while still on Ubuntu 16, but I have not yet found documentation that explains how to do it. If I decide to do this to give NVIDIA time to catch up on Ubuntu 20.04 LTS, should I create a new ticket to get help doing so?

I will create another ticket when I am ready to build this replacement head-node (test environment) on Ubuntu 20. I don’t think that we will be able to upgrade Ubuntu 16.04 to Ubuntu 20.04 on the same server, so it will need to be a rebuild from scratch.

Thanks!

Mike


Comment 3 Jason Booth 2020-06-02 15:58:19 MDT
> One possibility is that I update the cluster to the latest Slurm while still on Ubuntu 16

We do not test more recent versions of Slurm against older versions of Ubuntu such as 16.04, so you may run into issues building them. As a quick test, I tried to compile 20.02.3 on 16.04 and ran into problems, so upgrading on 16.04 may not be possible for you.

> I will create another ticket when I am ready to build this replacement head-node (test environment) on Ubuntu 20. I don’t think that we will be able to upgrade Ubuntu 16.04 to Ubuntu 20.04 on the same server, so it will need to be a rebuild from scratch.


I would agree with this observation. Almost all sites that we work with do a fresh install when they upgrade between major versions of the OS.

As far as configuration help is concerned, you are welcome to open another issue with us. Please attach your slurm.conf, cgroup.conf, and topology.conf, and describe what your site is trying to accomplish. We can then offer some suggestions for you to consider for your upgrade.
Comment 4 Mike Woodson 2020-06-02 16:49:24 MDT
Ok.

What versions of Slurm are supported with Ubuntu 18?

My concern is that NVIDIA/CUDA does not yet support Ubuntu 20 and this is primarily a GPU cluster.

Mike


Comment 5 Jason Booth 2020-06-03 12:59:44 MDT
Hi Mike

> What versions of Slurm are supported with Ubuntu 18?

I do need to walk back a comment I made about Ubuntu 16.04.

I went through the process of setting up a 16.04 system and walking through the full install. It turns out that you can run 20.02 on 16.04. You can also run this version on Ubuntu 18.
Comment 6 Mike Woodson 2020-06-03 17:19:15 MDT
Hmmm, ok. Is it possible then, to upgrade from 17.11.6 to x.x to 20.x on the existing Ubuntu 16 server? If so, what is the procedure (do I need to create a new ticket for this question)? Or am I better off just building a new Ubuntu 16.04 LTS server and building Slurm 20.x directly on it?

This would give us time to wait for NVIDIA to catch up with Ubuntu 20 and yet benefit from the new version of Slurm now.

Mike


Comment 7 Jason Booth 2020-06-03 20:36:30 MDT
>Hmmm, ok. Is it possible then, to upgrade from 17.11.6 to x.x to 20.x on the existing Ubuntu 16 server?

Please consult the upgrade section for more details.

https://slurm.schedmd.com/quickstart_admin.html#upgrade

Slurm daemons will support RPCs and state files from the two previous major releases (e.g. a version 20.02.x SlurmDBD will support slurmctld daemons and commands with a version of 20.02.x, 19.05.x, or 18.08.x). This means that upgrading at least once each year is recommended. Otherwise, intermediate upgrades will be required to preserve state information.

This means you would need to jump to either 18.08 or 19.05 first before moving to 20.02.
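
As a rough illustration of the two-release rule quoted above, the sketch below checks whether one major release can be upgraded to another in a single step, and prints one possible chain of hops. The release list is limited to the versions relevant to this ticket, and the function names (release_index, can_upgrade_directly, upgrade_path) are made up for this example; they are not part of Slurm.

```shell
#!/usr/bin/env bash
# Major Slurm releases in order, oldest first (only those relevant here).
RELEASES=(17.02 17.11 18.08 19.05 20.02)

# Print the position of a major release in RELEASES; fail if it is unknown.
release_index() {
    local i
    for i in "${!RELEASES[@]}"; do
        if [ "${RELEASES[$i]}" = "$1" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Succeed if release $2 supports state from release $1, i.e. $1 is the
# same release or one of the two major releases immediately before $2.
can_upgrade_directly() {
    local from to
    from=$(release_index "$1") || return 2
    to=$(release_index "$2") || return 2
    [ "$to" -ge "$from" ] && [ $((to - from)) -le 2 ]
}

# Print one valid chain of hops, always taking the largest allowed jump.
upgrade_path() {
    local from to path
    from=$(release_index "$1") || return 2
    to=$(release_index "$2") || return 2
    path="${RELEASES[$from]}"
    while [ "$from" -lt "$to" ]; do
        from=$((from + 2))
        if [ "$from" -gt "$to" ]; then
            from=$to
        fi
        path="$path -> ${RELEASES[$from]}"
    done
    echo "$path"
}
```

For example, `upgrade_path 17.11 20.02` prints `17.11 -> 19.05 -> 20.02`; going through 18.08 instead of 19.05 is equally valid.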
Comment 8 Mike Woodson 2020-06-04 06:25:28 MDT
Thanks for the info. But the problem I have always had with that page is that the instructions to upgrade are basically:

Upgrade slurmdbd
Upgrade slurm

It does not tell how to actually do the upgrade. Do you just build the new version and change all of your links pointing to it? Do you build the new version in the same directory as the old version? Do you run some program which upgrades Slurm?

I don’t think that I am dense, but I don’t see a real how-to on how to upgrade slurm.

Mike


Comment 9 Jason Booth 2020-06-04 11:44:04 MDT
Mike, the upgrade consists of a few steps. As you noted in your previous comment, there is some ambiguity because each site may do the upgrade differently.

Most sites will have some type of source directory, for example:

/storage/src/slurm-19.05.7

Then a build directory:

/storage/build/

Finally, there is an install target, set with "--prefix=".

Some sites like to create symlinks to the different versions and switch when they rebuild. Other sites just do a binary overlay of the previous install, meaning they run make install into the same directory and overwrite the older binaries. This all comes down to your preference.
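
The symlink approach could look roughly like the sketch below. It assumes a hypothetical layout in which each version is installed under its own directory (for example with --prefix=/opt/slurm/slurm-20.02.3) and a "current" symlink in the same parent directory is what PATH and the service files refer to. The function name switch_slurm_version and the /opt/slurm path are illustrative only.

```shell
#!/usr/bin/env bash
# Point the "current" symlink at one versioned install directory.
#   $1: parent directory holding the versioned installs (hypothetical layout)
#   $2: name of the version directory, e.g. slurm-20.02.3
switch_slurm_version() {
    local root=$1 version=$2
    if [ ! -d "$root/$version" ]; then
        echo "no such install: $root/$version" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing "current" link as a file, not as a directory
    ln -sfn "$version" "$root/current"
}
```

With this layout, `switch_slurm_version /opt/slurm slurm-20.02.3` switches to the new build, and running it again with the previous directory name switches the binaries back (the database and state files still need to be restored from backup separately).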


So an example upgrade would look like this:

1. Clear out all the jobs on the cluster and shut down the slurmctld, slurmdbd, and slurmds.
2. Back up the database: "mysqldump -p slurm_db >  /storage/slurm_backup-`date +%Y-%m-%d_%H-%M-%S`"
3. Back up the state save location. You may also want to back up your entire Slurm directory in case you need to restore.
4. Run your configure, e.g.
> ./configure --prefix=/some/location/ > /dev/null
5. make -j install
6. Next, while the cluster is still shut down, start the slurmdbd. The database upgrade should happen rather quickly, depending on your database size. You can "tail -f slurmdbd.log" to follow the progress.
7. After the database has been upgraded, you can safely start the slurmctld and then the slurmd processes.
8. Verify that a job can run.
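
The order of the steps above can be sketched as a small script. This is only an outline under several assumptions that are not in this ticket: the daemons are managed by systemd units named slurmctld, slurmdbd, and slurmd, configure is run inside the source directory, and the paths passed in are site-specific. By default it only prints each command; nothing is executed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Print commands by default; set DRY_RUN=0 to really execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pause until the operator confirms (only prints a note in dry-run mode).
wait_for_operator() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ (wait) $1"
    else
        read -r -p "$1 [press Enter] " _
    fi
}

#   $1: source directory   $2: install prefix
#   $3: state save location   $4: backup directory
upgrade_slurm() {
    local src=$1 prefix=$2 statedir=$3 backupdir=$4
    local stamp
    stamp=$(date +%Y-%m-%d_%H-%M-%S)

    # 1. Jobs are assumed to be drained already; stop the daemons.
    run systemctl stop slurmctld
    run systemctl stop slurmdbd
    run systemctl stop slurmd    # must be repeated on every compute node

    # 2-3. Back up the accounting database and the state save location.
    run sh -c "mysqldump -p slurm_db > '$backupdir/slurm_backup-$stamp'"
    run tar -czf "$backupdir/statesave-$stamp.tar.gz" -C "$statedir" .

    # 4-5. Configure and install the new version.
    run sh -c "cd '$src' && ./configure --prefix='$prefix' > /dev/null && make -j install"

    # 6. Start slurmdbd alone so it can convert the database.
    run systemctl start slurmdbd
    wait_for_operator "Check slurmdbd.log until the database upgrade is done"

    # 7. Then the controller, then the compute node daemons.
    run systemctl start slurmctld
    run systemctl start slurmd   # must be repeated on every compute node

    # 8. Verify that a job can run.
    run srun -N1 hostname
}
```

A dry run such as `upgrade_slurm /storage/src/slurm-19.05.7 /opt/slurm /var/spool/slurmctld /storage` prints the planned commands so they can be reviewed before anything is changed; the last three paths are placeholders.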
Comment 10 Jason Booth 2020-06-05 14:40:07 MDT
Mike, do you require any further assistance with this issue, or shall we proceed to close it out?
Comment 11 Mike Woodson 2020-06-05 14:44:50 MDT
Thanks for the info! I will give this a try and see how it goes.

I believe that I have all the info I need now, so you can close the ticket.

Thanks,

Mike


Comment 12 Jason Booth 2020-06-05 14:56:24 MDT
Resolving