Dear SLURM team,

I hope this message finds you well. My name is Vadim, and I am part of the Intel MPI development team.

I am writing to discuss a commit that introduced the setting of the '--external-launcher' option for srun in the Slurm project. You can view the commit here:
https://github.com/SchedMD/slurm/commit/bd50e1d8b02df675eed451880ab3479eb99c26b2

Additionally, there was a subsequent commit aimed at resolving an issue when users have already set the I_MPI_HYDRA_BOOTSTRAP environment variable. This continuation commit can be found here:
https://github.com/SchedMD/slurm/commit/bd50e1d8b02df675eed451880ab3479eb99c26b2

However, this continuation commit addresses the problem only if the user sets I_MPI_HYDRA_BOOTSTRAP before node allocation. The issue arises because these variables are set when salloc is called, meaning that I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS is already configured, and alternative values for the bootstrap variable will fail.

To resolve this issue when nodes are already allocated, we propose introducing a separate environment variable on the Intel MPI side. For example, I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS (please note that this is not a final name proposal).

I would appreciate your thoughts on this proposal.

Thank you for your attention to this matter. I look forward to collaborating.

--
Best regards,
Vadim Kutovoi
Intel MPI Development
+353(89) 407 3335
vadim.kutovoi@intel.com
R148, Easton, Co. Kildare, Ireland
Intel Corporation | www.intel.com
Hi Vadim -

> However, this continuation commit addresses the problem only if the user
> sets I_MPI_HYDRA_BOOTSTRAP before node allocation. The issue arises because
> these variables are set when salloc is called, meaning that
> I_MPI_HYDRA_BOOTSTRAP_EXTRA_ARGS is already configured, and alternative
> values for the bootstrap variable will fail.

Can you give me a concrete example of how this would fail?

Our working assumption there is that, if someone did want to specify alternative options, they'd have those present in their bash_profile or similar, and those envvars would be set before salloc is called. I take it you have use cases where the envvars are only getting set after the salloc command is run?

> To resolve this issue when nodes are already allocated, we propose
> introducing a separate environment variable on the Intel MPI side. For
> example, I_MPI_HYDRA_BOOTSTRAP_SLURM_EXTRA_ARGS (please note that this is
> not a final name proposal).

That would be great, but one headache here is that we'd still want to set the existing value until we knew sites were on versions that supported the newer envvar. And that would be years off in the future, if ever.

- Tim