Ticket 17788 - Mixing host operating systems in cluster
Summary: Mixing host operating systems in cluster
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 23.02.3
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Marcin Stolarek
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-09-27 05:27 MDT by sarah.summers
Modified: 2023-09-28 03:22 MDT

See Also:
Site: STFC UKR
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description sarah.summers 2023-09-27 05:27:15 MDT
Hi,

We are looking to migrate from using CentOS 7 to Rocky 9 on our cluster.
Ideally we would like to migrate gradually over a period of time.

Is it possible to have hosts with different OSes in the cluster, or will this be problematic? I am wondering if Slurm RPMs built on a CentOS host could be installed/used on a Rocky host.

I would be grateful for any advice. We are currently using Slurm version 21.08.8-2 but are planning to move to Slurm version 23.02.3 soon.

Kind regards,

Sarah
Comment 1 Marcin Stolarek 2023-09-27 06:23:47 MDT
Sarah,

>Is it possible to have hosts with different OS in the cluster or will this be problematic.
This shouldn't be an issue, but the slurmd/slurmstepd binaries have to be built on the same OS they run on. The reason is that on a newer OS we link against different libraries, in different locations, and depending on the features those libraries provide, the final compiled source code may differ.

Just to give you one example: CentOS 7 and Rocky 9 are likely using different major versions of hwloc, which results in different code being used on the Slurm side[1]. For instance, the library changed a function prototype:
>#if HWLOC_API_VERSION >= 0x00020000
>	return hwloc_topology_export_xml(topology, hwloc_xml, 0);
>#else
>	return hwloc_topology_export_xml(topology, hwloc_xml);
>#endif

cheers,
Marcin
[1]https://github.com/SchedMD/slurm/blob/slurm-23-02-5-1/src/slurmd/common/xcpuinfo.c#L161-L165
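One way to check which hwloc a given build actually links against is to inspect the installed binary with ldd. A minimal sketch, assuming the usual install path (the SLURMD path below is an assumption and may differ per site):

```shell
# Sketch: report the hwloc shared library a slurmd binary is linked against.
# The default path is an assumption; override with SLURMD=/path/to/slurmd.
SLURMD=${SLURMD:-/usr/sbin/slurmd}
if [ -x "$SLURMD" ]; then
    # ldd lists the shared libraries resolved at load time; the hwloc line
    # shows the soname (e.g. libhwloc.so.5 vs libhwloc.so.15) per OS.
    ldd "$SLURMD" | grep -i hwloc || echo "no hwloc linkage found"
else
    echo "slurmd not found at $SLURMD"
fi
```

Comparing this output on a CentOS 7 host versus a Rocky 9 host makes the major-version mismatch visible before any migration.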
Comment 2 sarah.summers 2023-09-28 01:56:10 MDT
Hi Marcin,

Thanks for the information.
Just to clarify: for a host with CentOS installed, the Slurm RPMs which are installed must have been built on an equivalent CentOS host; and for a Rocky 9 host, the Slurm RPMs installed must have been built on a Rocky 9 host. Is that correct?

As you only mentioned slurmd/slurmstepd, am I correct in thinking that it is OK for the Slurm controller and Slurm database nodes to be running CentOS 7 with some compute nodes running Rocky 9?

Thanks.

Kind regards,

Sarah
Comment 3 Marcin Stolarek 2023-09-28 03:15:04 MDT
>Just to clarify, for a host with CentOS installed the Slurm rpms which are installed must have been built on an equivalent CentOS host; and for a Rocky 9 host the Slurm rpms installed must have been built on a Rocky 9 host. Is that correct?
Yes.

>As you only mentioned slurmd/slurmstepd am I correct in thinking that it is OK for the Slurm controller and Slurm database nodes to be running CentOS 7 with some compute running Rocky 9?
Yes - it should be fine, as long as the binaries are built on the same OS where they are used.
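In practice this means running the build once per OS family, e.g. once on a CentOS 7 build host and once on a Rocky 9 build host, and installing each set of RPMs only on matching hosts. A minimal sketch using rpmbuild against the release tarball (the tarball name assumes Slurm 23.02.3 already downloaded from schedmd.com):

```shell
# Sketch: build Slurm RPMs on a build host of the target OS; repeat on each
# OS family and install the resulting RPMs only on hosts of that same OS.
# Assumes the release tarball is present in the current directory.
TARBALL=slurm-23.02.3.tar.bz2
if command -v rpmbuild >/dev/null 2>&1 && [ -f "$TARBALL" ]; then
    # -ta: build all RPMs directly from the spec file bundled in the tarball.
    rpmbuild -ta "$TARBALL"
else
    echo "prerequisites missing: need rpmbuild and $TARBALL on this host"
fi
```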

Let me know if you have any further questions.

cheers,
Marcin
Comment 4 sarah.summers 2023-09-28 03:21:30 MDT
Hi Marcin,

Thanks for all of the information, it is very helpful.
Please feel free to close the ticket.

Kind regards,

Sarah