| Summary: | MPI parameters oob_tcp_if_exclude and oob_tcp_if_include are ignored by srun | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Nancy <nancy.kritkausky> |
| Component: | Other | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | da, guillaume.papaure, Rod.Schultz |
| Version: | 2.3.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=8494 | ||
| Site: | CEA | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
|
Description
Nancy 2012-07-10 02:31:55 MDT

From what I can tell bullxmpi is a variant of OpenMPI, so I would expect Slurm to be configured to use the mpi/openmpi or mpi/none plugin (both are essentially identical), correct? Both plugins do essentially nothing. Since Slurm isn't creating these connections, I just need to know how a user specifies the oob_tcp_if_include or oob_tcp_if_exclude parameters and how that information should be passed along to the spawned tasks (e.g. by setting some environment variable). We'll need some advice from the bullxmpi experts.

I will request some more information for you on how this works. Bull is using a flavor of OpenMPI; I have requested the version. Here is a recent comment on this problem. I have also attached the versions of bullxmpi they are running:

The problem exists on CEA/T100 also. The Slurm version is 2.3.3 and bullxmpi is 1.1.14. But I don't think this is a recent problem; I noticed it with our first tests of srun a year ago (the versions were Slurm 2.2.x and bullxmpi 1.1.x). With bullxmpi tracing activated we see that the parameters are used and correctly handled (interfaces rejected or included), but without effect on the final result: eth0 is always used. The same tests with salloc + mpirun are OK. The problem can be reproduced with a simple "hello world":

export OMPI_MCA_oob_tcp_if_exclude=eth0
srun -n 2 -N 2 ./hello

I have the rpms, but they are very large; can I email them to you?

Please email the RPM for Bull's MPI directly to me. Slurm is configured with MpiDefault=none, so all that srun is doing is launching the processes without doing anything with the network other than setting up application stdin/out/err over the communication network assigned to Slurm (typically Ethernet). I would expect the application's libraries to interpret oob_tcp_if_include and oob_tcp_if_exclude and establish their own network connections for MPI. We will probably need to work with Bull's MPI developers to resolve this. Do you have any contact information for them?
I see quite a few places where bullxmpi sets and gets environment variables with a prefix of "OMPI_MCA", as shown below, but I see no sign of anything that would reference "OMPI_MCA_oob_tcp_if_exclude". I can confirm in my tests that srun does forward the environment variable to the spawned user tasks, but from that point on it is the responsibility of the MPI libraries to interpret those environment variables and open network connections. Perhaps Bull has an in-house MPI expert? I'd be happy to work with them, but am not really in a position to debug the MPI source code.
orte/mca/ess/env/ess_env_module.c: nodelist = getenv("OMPI_MCA_orte_nodelist");
orte/mca/ess/lsf/ess_lsf_module.c: nodelist = getenv("OMPI_MCA_orte_nodelist");
orte/mca/ess/slurmd/ess_slurmd_module.c: putenv("OMPI_MCA_grpcomm=hier");
orte/mca/ess/slurmd/ess_slurmd_module.c: putenv("OMPI_MCA_routed=direct");
Thanks for the analysis. I will find out who you can interface with on the Bull MPI team and we can go from there. Nancy

Hi, I'm the bullxmpi contact. This MCA parameter is ignored when using srun because, in this case, the ranks find the IP addresses of their OOB endpoints by calling gethostbyname() (I think this is the only way they have to find them). Is your /etc/hosts or DNS server configured with the IP addresses of the IB devices?

*** This ticket has been marked as a duplicate of ticket 144 ***