Ticket 9979 - MPI_COMM_SPAWN fails with srun
Summary: MPI_COMM_SPAWN fails with srun
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 20.02.3
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Felip Moll
Reported: 2020-10-12 23:42 MDT by issp2020support
Modified: 2020-10-30 07:25 MDT

Site: U of Tokyo


Description issp2020support 2020-10-12 23:42:41 MDT
I cannot run a program that uses MPI_COMM_SPAWN with srun.
I am using OpenMPI 4.0.4.
The program runs successfully when I use "mpirun" instead of srun.
Is there any way to run the program with srun?


You can get the sample program from the following URL:
https://github.com/yomichi/Test_MPI/tree/master/Comm_spawn/ready


The error log is as follows:

[c15u01n1:411284] *** An error occurred in MPI_Comm_spawn
[c15u01n1:411284] *** reported by process [9223372037753929728,0]
[c15u01n1:411284] *** on communicator MPI_COMM_SELF
[c15u01n1:411284] *** MPI_ERR_SPAWN: could not spawn processes
[c15u01n1:411284] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[c15u01n1:411284] ***    and potentially your MPI job)
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
I'm parent
slurmstepd: error: *** STEP 13720.0 ON c15u01n1 CANCELLED AT 2020-10-13T13:57:52 ***
srun: error: c15u01n1: task 0: Exited with exit code 50
Comment 1 issp2020support 2020-10-23 12:00:50 MDT
Could you give me an update?
Comment 2 Felip Moll 2020-10-26 07:09:59 MDT
Sorry for the late response; I was out last week.

(In reply to issp2020support from comment #1)
> Could you give me an update?

This is related to OpenMPI. I am running some tests and will let you know about it as soon as possible.


With Intel MPI it just works:
]$ srun -n4 ./master
I'm parent
I'm parent
I'm parent
I'm parent
  I'm 0 of 1
  I'm spawned by MPI_Comm_spawn
Master received value: 12345
  I'm 0 of 1
  I'm spawned by MPI_Comm_spawn
Master received value: 12345
  I'm 0 of 1
  I'm spawned by MPI_Comm_spawn
Master received value: 12345
  I'm 0 of 1
  I'm spawned by MPI_Comm_spawn
Master received value: 12345
Comment 5 Felip Moll 2020-10-29 09:26:29 MDT
Hi,

I investigated this issue further and also talked with the OpenMPI developers.

The point is that nowadays Slurm's PMI and PMI-2 APIs accept and handle comm_spawn() calls, but when the Slurm PMI support was originally written for OpenMPI, Slurm did not yet support comm_spawn, so OpenMPI never implemented it. Slurm has since changed, but there has been no effort in OpenMPI to implement this.

So, basically, if OpenMPI detects PMI or PMI-2 it will not pass the call to Slurm. Under PMIx it does pass the request to the Slurm daemon, but the Slurm PMIx plugin does not support that call and returns "not supported", so it does not work in that scenario either.

Note also that Intel MPI has implemented support for dynamic processes (mpi_comm_spawn()), but only in its most recent version, Intel MPI Library 2019 release 8 (see the release notes item "PMI2 spawn support"). In the 2021 series it is not available until Beta 07.

To summarize: mpi_comm_spawn() is not supported with OpenMPI + Slurm, because OpenMPI will not pass that call to Slurm's PMI or PMI-2, and the PMIx plugin does not support it either.

Does it make sense?
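In practice this means launching with OpenMPI's own mpirun inside the allocation rather than with srun, since mpirun starts its own runtime daemons, which do support MPI_Comm_spawn(). A minimal batch-script sketch, assuming the ./master binary from the linked sample (job name and task counts are hypothetical, and this requires a Slurm cluster with OpenMPI installed):

```bash
#!/bin/bash
#SBATCH --job-name=spawn-test   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=4

# Launch with OpenMPI's mpirun instead of srun: mpirun manages the
# spawned processes itself, so MPI_Comm_spawn() works inside the
# allocation, at the cost of Slurm not seeing the spawned processes
# as steps.
mpirun -n 4 ./master
```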
Comment 6 issp2020support 2020-10-29 12:09:18 MDT
Thank you for the update.
I understand that I need to use "mpirun" for mpi_comm_spawn.

In this situation, I think I cannot get process accounting information, such as CPU time and memory usage, because I am not using "srun".
Is that correct?
My customer wants that information.

And could you give me permission to see bug #10092?
>You are not authorized to access bug #10092.
Comment 7 Felip Moll 2020-10-29 13:47:22 MDT
(In reply to issp2020support from comment #6)
> Thank you for the update.
> I understand that I need to use "mpirun" for mpi_comm_spawn.
> 
> In this situation, I think I cannot get process accounting information,
> such as CPU time and memory usage, because I am not using "srun".
> Is that correct?
> My customer wants that information.
> 
> And could you give me permission to see bug #10092?
> >You are not authorized to access bug #10092.

You will have accounting for the job, but not for the sub-processes, because you won't be launching Slurm steps. Correct.

Why does your customer need to use mpi_comm_spawn()? Does he have any other way to spawn processes?
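The job-level accounting is still available through sacct even when the processes inside the allocation are launched with mpirun. A sketch of the query, assuming the accounting storage plugin is enabled (the job ID here is hypothetical; JobID, Elapsed, TotalCPU, and MaxRSS are standard sacct format fields):

```bash
# Query job-level accounting; 13720 is a hypothetical job ID.
# Per-process detail is only collected for srun-launched steps, so
# mpirun-spawned processes are aggregated under the batch step.
sacct -j 13720 --format=JobID,Elapsed,TotalCPU,MaxRSS
```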
Comment 8 issp2020support 2020-10-29 14:58:56 MDT
I don't know the reason at this moment.
I will tell him about the limitation of OpenMPI with mpi_comm_spawn() and ask him to use the latest Intel MPI.
Comment 9 Felip Moll 2020-10-30 07:25:09 MDT
(In reply to issp2020support from comment #8)
> I don't know the reason at this moment.
> I will tell him about the limitation of OpenMPI with mpi_comm_spawn() and
> ask him to use the latest Intel MPI.

Ok.

I am closing this issue; we have opened bug 10092 to document all of this.

Thanks and don't hesitate to ask again if more questions arise.