Summary: | salloc breaks Integration with Intel MPI I_MPI_HYDRA_BOOTSTRAP variable | ||
---|---|---|---|
Product: | Slurm | Reporter: | Vadim Kutovoi <vadim.kutovoi> |
Component: | Other | Assignee: | Tim Wickberg <tim> |
Status: | OPEN --- | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | ||
Version: | 23.11.1 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | -Other- | ||
Description
Vadim Kutovoi
2025-03-24 09:41:47 MDT
Hi Vadim -

> However, this continuation commit addresses the problem only if the user
> sets I_MPI_HYDRA_BOOTSTRAP before node allocation. The issue arises because
> these variables are set when salloc is called, meaning that
> I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS is already configured, and alternative
> values for the bootstrap variable will fail.

Can you give me a concrete example of how this would fail?

Our working assumption there is that, if someone did want to specify alternative options, they'd have those present in their bash_profile or similar, and those envvars would be set before salloc is called. I take it you have use cases where the envvars are only getting set after the salloc command is run?

> To resolve this issue when nodes are already allocated, we propose
> introducing a separate environment variable on the Intel MPI side. For
> example, I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS (please note that this is
> not a final name proposal).

That would be great, but one headache here is that we'd still want to set the existing value until we knew sites were on versions that supported the newer envvar. And that would be years off in the future, if ever.

- Tim