10334 – Srun fails to load correct slurm.conf in multi-cluster setup

Status:	RESOLVED INVALID

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	User Commands
Version:	20.02.6
Hardware:	Linux Linux

Severity:	6 - No support contract
Assignee:	Jacob Jenson

Reported:	2020-12-02 03:42 MST by peter.georg
Modified:	2021-01-27 05:31 MST
CC List:	0 users

Site:	-Other-


Description peter.georg 2020-12-02 03:42:32 MST

Disclaimer: this bug report actually covers a group of bugs. All of them (like others reported before) boil down to the same generic issue: client commands read the local slurm.conf instead of the configuration of the cluster selected with --clusters. This probably affects not only srun but potentially all Slurm commands/tools.
Bug #7863 is a similar case.

We run a multi-cluster setup with shared "gateways": users log in to one of the gateways and submit their jobs to a specific cluster using the --clusters=<cluster> option. These gateways are not part of any cluster (and hence do not run slurmd) and have only a very basic slurm.conf that effectively specifies just the AccountingStorage information. This works fine for submitting jobs with sbatch. However, when users run `srun --clusters=<cluster>` directly on one of the gateways to start a job on a remote cluster, there are at least two issues:

Using the srun option `--cpu-bind` results in an error stating that this feature is not supported, even though it is enabled on the cluster requested via --clusters.
This comes down to the _have_task_affinity function defined here (https://github.com/SchedMD/slurm/blob/23cbe39d98cfacd9434f10c19a415c6092e4c61c/src/common/slurm_resource_info.c#L133) always checking the local slurm.conf rather than the cluster-specific slurm.conf.
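A minimal Python sketch of why the answer depends on which file is read. It rests on stated assumptions: the check is modelled as looking for an affinity-capable plugin name (`affinity` or `cgroup`) in `TaskPlugin`, and an unset `TaskPlugin` is treated as `task/none`. `parse_slurm_conf` and `have_task_affinity` are hypothetical helpers, not Slurm functions, and the parser only handles one `Key=Value` per line.

```python
from typing import Dict

# Hypothetical helpers for illustration; these are not Slurm functions.


def parse_slurm_conf(text: str) -> Dict[str, str]:
    """Simplified parser: one Key=Value per line, '#' starts a comment.

    Keys are lower-cased because slurm.conf parameter names are case-insensitive.
    """
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip().lower()] = value.strip()
    return conf


def have_task_affinity(conf: Dict[str, str]) -> bool:
    """Assumption: binding is supported if TaskPlugin names an affinity-capable
    plugin ("affinity" or "cgroup"); an unset TaskPlugin means "task/none"."""
    plugins = conf.get("taskplugin", "task/none").lower()
    return "affinity" in plugins or "cgroup" in plugins


# The gateway's minimal config versus the config of the cluster named in --clusters.
gateway_conf = parse_slurm_conf("AccountingStorageType=accounting_storage/slurmdbd\n")
cluster_conf = parse_slurm_conf("TaskPlugin=task/affinity,task/cgroup  # per-cluster\n")

print(have_task_affinity(gateway_conf))  # False: the local (gateway) file
print(have_task_affinity(cluster_conf))  # True: the target cluster's file
```

The check itself is unchanged between the two calls; only the configuration it is handed differs, which is the crux of this ticket.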

When using cons_tres to support GPU allocations, requesting at least one GPU in an allocation via srun results in this error message:

srun: error: gres_plugin_job_state_unpack: no plugin configured to unpack data type 7696487 from job 837

This happens because srun looks up GresTypes in the local slurm.conf instead of, as it obviously should, in the cluster-specific slurm.conf.
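The number in the error is consistent with that explanation: 7696487 is 0x757067, i.e. the bytes of "gpu" ('g' = 0x67, 'p' = 0x70, 'u' = 0x75) packed little-endian. The sketch below assumes GRES type IDs are built by adding each character code shifted by 0, 8, 16, 24, ... bits (our reading of the gres code, not verified against 20.02.6); under that assumption the failing unpack is for the `gpu` type, which a client can only handle if `gpu` appears in the GresTypes it read. `gres_type_id` and `configured_gres_ids` are hypothetical names.

```python
from typing import Set


def gres_type_id(name: str) -> int:
    """Assumed ID scheme: sum of character codes shifted by 0, 8, 16, 24, 0, ...
    bits, truncated to 32 bits."""
    type_id = 0
    shift = 0
    for ch in name:
        type_id = (type_id + (ord(ch) << shift)) & 0xFFFFFFFF
        shift = (shift + 8) % 32
    return type_id


def configured_gres_ids(gres_types: str) -> Set[int]:
    """IDs a client could unpack, given a GresTypes value such as "gpu,mps"."""
    return {gres_type_id(t.strip()) for t in gres_types.split(",") if t.strip()}


print(gres_type_id("gpu"))                    # 7696487, the "data type" in the srun error
print(7696487 in configured_gres_ids(""))     # False: gateway slurm.conf has no GresTypes
print(7696487 in configured_gres_ids("gpu"))  # True: a GPU cluster's GresTypes
```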

These are probably not the only issues caused by this bug.

The current workaround is to add GresTypes and TaskPlugin to the slurm.conf stored on the gateways. However, this limits the multi-cluster setup, as it requires all clusters to use the same settings for these parameters.
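To make that limitation concrete, here is a hypothetical check over per-cluster settings that reports which of the two copied parameters differ between clusters, i.e. which values a single gateway slurm.conf cannot match for every cluster at once. The cluster names and settings are made up for illustration.

```python
from typing import Dict

# Parameters the workaround copies into the gateway's single slurm.conf.
SHARED_KEYS = ("GresTypes", "TaskPlugin")


def conflicting_settings(
    clusters: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """For each shared key, return the per-cluster values if they are not all equal.

    A missing key is treated as an empty string (i.e. "not set").
    """
    conflicts: Dict[str, Dict[str, str]] = {}
    for key in SHARED_KEYS:
        values = {name: conf.get(key, "") for name, conf in clusters.items()}
        if len(set(values.values())) > 1:
            conflicts[key] = values
    return conflicts


clusters = {
    "cpu_cluster": {"TaskPlugin": "task/affinity"},
    "gpu_cluster": {"TaskPlugin": "task/affinity", "GresTypes": "gpu"},
}
print(conflicting_settings(clusters))
# {'GresTypes': {'cpu_cluster': '', 'gpu_cluster': 'gpu'}}
```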

Another possible workaround: wrap srun (and probably other Slurm commands) and point SLURM_CONF to a local copy of the specified cluster's slurm.conf. Limitation: I do not know how this is supposed to work when a user specifies multiple clusters, since --clusters actually takes a list of clusters. How is srun supposed to handle options like --cpu-bind that might not be supported by some of those clusters?
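A minimal sketch of such a wrapper, assuming a hypothetical layout where each cluster's slurm.conf is copied to `/etc/slurm/clusters/<cluster>/slurm.conf` on the gateway and the real binary lives at `/usr/bin/srun`. It only recognizes `-M`/`--clusters` and does no other option parsing. When zero or several clusters are requested it leaves SLURM_CONF untouched, which is exactly the unresolved multi-cluster case above.

```python
import os
from typing import List, Optional

# Hypothetical paths; adjust to the actual gateway layout.
CONF_TEMPLATE = "/etc/slurm/clusters/{cluster}/slurm.conf"
REAL_SRUN = "/usr/bin/srun"


def requested_clusters(argv: List[str]) -> List[str]:
    """Return the clusters given via -M/--clusters (simplified, first match only)."""
    for i, arg in enumerate(argv):
        if arg.startswith("--clusters="):
            value = arg.split("=", 1)[1]
        elif arg in ("--clusters", "-M") and i + 1 < len(argv):
            value = argv[i + 1]
        elif arg.startswith("-M") and len(arg) > 2:  # e.g. "-Mgpu"
            value = arg[2:]
        else:
            continue
        return [c for c in value.split(",") if c]
    return []


def conf_for(argv: List[str]) -> Optional[str]:
    """Per-cluster slurm.conf path, or None unless exactly one cluster was requested."""
    clusters = requested_clusters(argv)
    if len(clusters) != 1:
        return None  # zero or several clusters: no single slurm.conf applies
    return CONF_TEMPLATE.format(cluster=clusters[0])


def main(argv: List[str]) -> None:
    env = dict(os.environ)
    path = conf_for(argv)
    if path is not None:
        env["SLURM_CONF"] = path
    # Absolute path so a wrapper installed as "srun" does not exec itself.
    os.execve(REAL_SRUN, [REAL_SRUN, *argv], env)
```

A small launcher placed ahead of the real srun in PATH would call `main(sys.argv[1:])`.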

It seems the multi-cluster feature has not received much attention in a while. For example, we recently combined it with the new configless feature, which also led to some unexpected behavior.

Comment 1 peter.georg 2021-01-27 05:31:24 MST

I just found another bug from this group:

`scontrol --clusters=dummy reboot`.

Once again, the "generic" slurm.conf (typically the local /etc/slurm/slurm.conf) is loaded to verify that "RebootProgram" is set, whereas the slurm.conf of the specified cluster should be checked instead.

Again, a workaround is to set RebootProgram in the local slurm.conf to an arbitrary value (the value itself is never used).

This is easy to see in the source code here:
https://github.com/SchedMD/slurm/blob/23cbe39d98cfacd9434f10c19a415c6092e4c61c/src/scontrol/reboot_node.c#L70
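The generic fix pattern behind all of these reports can be sketched as "pick the configuration of the target cluster first, then run the client-side check". In this hypothetical sketch, `effective_conf` and `can_request_reboot` are illustrative names, the RebootProgram value is made up, and how a remote cluster's settings are actually obtained is deliberately left out.

```python
from typing import Dict, Optional

Conf = Dict[str, str]


def effective_conf(local: Conf, remote: Dict[str, Conf], cluster: Optional[str]) -> Conf:
    """Choose the settings a client-side check should consult.

    `remote` maps cluster names to that cluster's settings (however obtained);
    without --clusters, the local file is the right one.
    """
    if cluster is None:
        return local
    return remote[cluster]


def can_request_reboot(conf: Conf) -> bool:
    # Models the check described in the ticket: refuse if RebootProgram is unset.
    return bool(conf.get("RebootProgram"))


gateway = {"AccountingStorageType": "accounting_storage/slurmdbd"}
clusters = {"dummy": {"RebootProgram": "/usr/sbin/reboot"}}  # made-up value

print(can_request_reboot(gateway))                                     # False
print(can_request_reboot(effective_conf(gateway, clusters, "dummy")))  # True
```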