Ticket 22422 - salloc breaks Integration with Intel MPI I_MPI_HYDRA_BOOTSTRAP variable
Summary: salloc breaks Integration with Intel MPI I_MPI_HYDRA_BOOTSTRAP variable
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 23.11.1
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2025-03-24 09:41 MDT by Vadim Kutovoi
Modified: 2025-03-24 10:53 MDT (History)
Site: -Other-


Description Vadim Kutovoi 2025-03-24 09:41:47 MDT
Dear SLURM team,

I hope this message finds you well. My name is Vadim, and I am part of the Intel MPI development team.

I am writing to discuss a commit that introduced the setting of the '--external-launcher' option for srun in the Slurm project. You can view the commit here: https://github.com/SchedMD/slurm/commit/bd50e1d8b02df675eed451880ab3479eb99c26b2

Additionally, there was a subsequent commit aimed at resolving an issue when users have already set the I_MPI_HYDRA_BOOTSTRAP environment variable. This continuation commit can be found here: https://github.com/SchedMD/slurm/commit/bd50e1d8b02df675eed451880ab3479eb99c26b2

However, this continuation commit addresses the problem only if the user sets I_MPI_HYDRA_BOOTSTRAP before node allocation. The issue arises because these variables are set when salloc is called, meaning that I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS is already configured, and alternative values for the bootstrap variable will fail.

To resolve this issue when nodes are already allocated, we propose introducing a separate environment variable on the Intel MPI side. For example, I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS (please note that this is not a final name proposal).

I would appreciate your thoughts on this proposal.
Thank you for your attention to this matter. I look forward to collaborating.

--
Best regards,
Vadim Kutovoi

Intel MPI Development
+353(89) 407 3335
vadim.kutovoi@intel.com
R148, Easton, Co. Kildare, Ireland
Intel Corporation | www.intel.com
Comment 1 Tim Wickberg 2025-03-24 10:52:38 MDT
Hi Vadim -

> However, this continuation commit addresses the problem only if the user
> sets I_MPI_HYDRA_BOOTSTRAP before node allocation. The issue arises because
> these variables are set when salloc is called, meaning that
> I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS is already configured, and alternative
> values for the bootstrap variable will fail.

Can you give me a concrete example of how this would fail?

Our working assumption there is that, if someone did want to specify alternative options, they'd have those present in their bash_profile or similar, and those envvars would be set before salloc is called.

I take it you have use cases where the envvars are only getting set after the salloc command is run?

> To resolve this issue when nodes are already allocated, we propose
> introducing a separate environment variable on the Intel MPI side. For
> example, I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS (please note that this is
> not a final name proposal).

That would be great, but one headache here is that we'd still want to set the existing value until we knew sites were on versions that supported the newer envvar. And that would be years off in the future, if ever.

- Tim
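
The ordering problem discussed in this ticket can be sketched with a small toy model. This is only an illustration under stated assumptions, not Slurm or Intel MPI code: plain dictionaries stand in for the environment, `salloc_env` and `bootstrap_argv` are hypothetical helpers, the value `slurm` for I_MPI_HYDRA_BOOTSTRAP and the exact defaulting rule are assumed from the description above, and I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS is the placeholder name from the proposal.

```python
# Toy model only: nothing here calls Slurm or Intel MPI.
BOOTSTRAP = "I_MPI_HYDRA_BOOTSTRAP"
EXTRA_ARGS = "I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS"
# Placeholder name taken from the proposal; explicitly not final.
SLURM_EXTRA_ARGS = "I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS"
SRUN_ONLY_OPTION = "--external-launcher"


def salloc_env(user_env, generic=True, slurm_specific=False):
    """Assumed salloc behaviour: export defaults only when the user has
    not already chosen a bootstrap mechanism."""
    env = dict(user_env)
    if BOOTSTRAP in env:
        return env  # set before salloc, so it is left untouched
    env[BOOTSTRAP] = "slurm"
    if generic:
        env[EXTRA_ARGS] = SRUN_ONLY_OPTION
    if slurm_specific:
        env[SLURM_EXTRA_ARGS] = SRUN_ONLY_OPTION
    return env


def bootstrap_argv(env):
    """Hypothetical launcher: bootstrap executable plus its extra args.
    The Slurm-specific variable is read only for the slurm bootstrap."""
    bootstrap = env[BOOTSTRAP]
    argv = ["srun" if bootstrap == "slurm" else bootstrap]
    argv += env.get(EXTRA_ARGS, "").split()
    if bootstrap == "slurm":
        argv += env.get(SLURM_EXTRA_ARGS, "").split()
    return argv


# Case 1: bootstrap chosen before salloc (the case the follow-up commit covers).
before = salloc_env({BOOTSTRAP: "ssh"})
print(bootstrap_argv(before))

# Case 2: bootstrap changed inside an existing allocation.
# The srun-only option exported at salloc time is still present.
after = salloc_env({})
after[BOOTSTRAP] = "ssh"
print(bootstrap_argv(after))

# Case 3: only the proposed Slurm-specific variable is exported.
proposed = salloc_env({}, generic=False, slurm_specific=True)
proposed[BOOTSTRAP] = "ssh"
print(bootstrap_argv(proposed))

# Case 4: transition period where both variables are exported.
both = salloc_env({}, generic=True, slurm_specific=True)
both[BOOTSTRAP] = "ssh"
print(bootstrap_argv(both))
```

In this model, cases 2 and 4 hand the srun-only option to ssh, while cases 1 and 3 do not. Case 4 corresponds to the transition concern in comment 1: as long as the existing variable is still exported for older Intel MPI versions, changing the bootstrap after salloc would hit the same problem.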