Ticket 17788

Summary: Mixing host operating systems in cluster
Product: Slurm
Reporter: sarah.summers
Component: Other
Assignee: Marcin Stolarek <cinek>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: cinek
Version: 23.02.3
Hardware: Linux
OS: Linux
Site: STFC UKR

Description sarah.summers 2023-09-27 05:27:15 MDT
Hi,

We are looking to migrate from using CentOS 7 to Rocky 9 on our cluster.
Ideally we would like to migrate gradually over a period of time.

Is it possible to have hosts with different OSes in the cluster, or will this be problematic? I am wondering whether Slurm RPMs built on a CentOS host could be installed and used on a Rocky host.

I would be grateful for any advice. We are currently using Slurm version 21.08.8-2 but are planning to move to Slurm version 23.02.3 soon.

Kind regards,

Sarah
Comment 1 Marcin Stolarek 2023-09-27 06:23:47 MDT
Sarah,

>Is it possible to have hosts with different OS in the cluster or will this be problematic.
This shouldn't be an issue, but the slurmd/slurmstepd binaries have to be built on the same OS they run on. On a newer OS we link against different libraries, in different locations etc., and depending on the features those libraries provide, the source code that actually gets compiled may differ.

Just to give you one example: CentOS 7 and Rocky 9 are likely using different major versions of hwloc, which results in different code being used on the Slurm side [1]. For instance, the library changed a function prototype:
>#if HWLOC_API_VERSION >= 0x00020000
>	return hwloc_topology_export_xml(topology, hwloc_xml, 0);
>#else
>	return hwloc_topology_export_xml(topology, hwloc_xml);
>#endif

cheers,
Marcin
[1] https://github.com/SchedMD/slurm/blob/slurm-23-02-5-1/src/slurmd/common/xcpuinfo.c#L161-L165
Comment 2 sarah.summers 2023-09-28 01:56:10 MDT
Hi Marcin,

Thanks for the information.
Just to clarify, for a host with CentOS installed the Slurm rpms which are installed must have been built on an equivalent CentOS host; and for a Rocky 9 host the Slurm rpms installed must have been built on a Rocky 9 host. Is that correct?

As you only mentioned slurmd/slurmstepd, am I correct in thinking that it is OK for the Slurm controller and Slurm database nodes to be running CentOS 7 with some compute nodes running Rocky 9?

Thanks.

Kind regards,

Sarah
Comment 3 Marcin Stolarek 2023-09-28 03:15:04 MDT
>Just to clarify, for a host with CentOS installed the Slurm rpms which are installed must have been built on an equivalent CentOS host; and for a Rocky 9 host the Slurm rpms installed must have been built on a Rocky 9 host. Is that correct?
Yes.

>As you only mentioned slurmd/slurmstepd am I correct in thinking that it is OK for the Slurm controller and Slurm database nodes to be running CentOS 7 with some compute running Rocky 9?
Yes - it should be fine, as long as the binaries are built on the same OS where they are used.

Let me know if you have any further questions.

cheers,
Marcin
Comment 4 sarah.summers 2023-09-28 03:21:30 MDT
Hi Marcin,

Thanks for all of the information, it is very helpful.
Please feel free to close the ticket.

Kind regards,

Sarah