Created attachment 9733 [details] script that submits the job

While testing remote job submission with -M/--clusters= in a test-cluster configuration that shares a common slurmdbd server, I found that simple bash-type jobs worked fine, but OpenMPI jobs had problems with the environment being passed to the remote cluster. So I started playing with --export=NONE, and found that here again the bash jobs ran fine, but a job run with mpiexec failed. I then tried running the job locally and it also failed. Some debugging suggests that the srun that starts orted on the rank 0 node is failing because it can't find orted.

I've attached the submit and launch scripts along with output files for jobs submitted with and without --export=NONE. I think I've echoed enough of the environment to show what's happening. I'm not sure what I'm doing wrong, as from what I can see the environment is the same for both jobs.

Thanks,
Mark
Created attachment 9734 [details] job launch script
Created attachment 9735 [details] Failed job output
Created attachment 9736 [details] Successful job output
I'm looking into this. It does look like the environment is basically the same (with minor differences that I wouldn't expect to matter).

To clarify: this only happens when running with mpiexec/mpirun, not when running with srun, correct? Or do these MPI jobs with --export=NONE also fail with srun?

(I believe that bug 6781 is also a duplicate of this question, so that may get merged into this one at some point, or vice versa.)
Also, from the outputs you uploaded, it appears that you're using IntelMPI and not OpenMPI. Is that correct? I've reproduced what you're seeing with --export=none. I'm using OpenMPI with pmi or pmi2 right now, so I guess it probably doesn't matter if you're using IntelMPI or OpenMPI.
This simple MPI job does fail when using srun instead of mpirun in the sbatch script, but the error is a little different. The error from srun is:

/home/user1/cbench/cbench-test1/bin/mpi_hello_ordered: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory

and this one also runs fine when not using --export=NONE. I used the following in the sbatch script:

export mpiprogram=/home/user1/cbench/cbench-test1/bin/mpi_hello_ordered
export I_MPI_PMI_LIBRARY=/lib64/slurm/mpi_pmi2.so
srun --mpi=pmi2 $mpiprogram
*** Ticket 6781 has been marked as a duplicate of this ticket. ***
(In reply to Mark Schmitz from comment #0)
> started playing with --export=NONE, and found that here again the bash jobs
> run fine, but when I tried to run a job with mpiexec it failed.

Did you try adding PATH and LD_LIBRARY_PATH to --export? Here is an example:

> sbatch --cluster=test1 --export=NONE,PATH=/usr/bin/:/bin:/usr/local/bin,LD_LIBRARY_PATH=/lib:/lib64:/usr/lib:/usr/lib64 --account=${myaccount} --partition=batch --output=slurm-test1-mpi_hello-n6-%j.out --ntasks-per-node=36 --nodes=6 --time=00:04:00 --job-name=mpi_helloTEST $mpi_program

Can you please call the following (bash shell):

> ldd $(which $mpi_program)

Can you please grep your slurmctld logs for job 6460 and attach that output?

--Nate
Okay, tried a few things; answering the questions in reverse order.

First, here are the entries from the slurmctld.log for the original jobs submitted in the attachments:

[2019-03-21T09:05:21.919] _slurm_rpc_submit_batch_job: JobId=9460 InitPrio=1773140 usec=287
[2019-03-21T09:05:22.077] sched: Allocate JobID=9460 NodeList=test[3-8] #CPUs=216 Partition=batch
[2019-03-21T09:05:24.949] _job_complete: JobID=9460 State=0x1 NodeCnt=6 WEXITSTATUS 0
[2019-03-21T09:05:24.950] _job_complete: JobID=9460 State=0x8003 NodeCnt=6 done

which doesn't really show anything.

Next, here is the ldd output for the example MPI program:

ldd /home/user1/cbench/cbench-test1/bin/mpi_hello_ordered
	linux-vdso.so.1 => (0x00002aaaaaacd000)
	libmpi.so.12 => /opt/openmpi/1.10/intel/lib/libmpi.so.12 (0x00002aaaaaccf000)
	libm.so.6 => /lib64/libm.so.6 (0x00002aaaaafdb000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002aaaab2dd000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaab4f3000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab70f000)
	libc.so.6 => /lib64/libc.so.6 (0x00002aaaab913000)
	libopen-rte.so.12 => /opt/openmpi/1.10/intel/lib/libopen-rte.so.12 (0x00002aaaabce0000)
	libopen-pal.so.13 => /opt/openmpi/1.10/intel/lib/libopen-pal.so.13 (0x00002aaaabf79000)
	libnuma.so.1 => /lib64/libnuma.so.1 (0x00002aaaac284000)
	librt.so.1 => /lib64/librt.so.1 (0x00002aaaac490000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaac698000)
	libimf.so => /opt/intel/16.0/compiler/lib/intel64/libimf.so (0x00002aaaac89b000)
	libsvml.so => /opt/intel/16.0/compiler/lib/intel64/libsvml.so (0x00002aaaacd97000)
	libirng.so => /opt/intel/16.0/compiler/lib/intel64/libirng.so (0x00002aaaadc54000)
	libintlc.so.5 => /opt/intel/16.0/compiler/lib/intel64/libintlc.so.5 (0x00002aaaadfb4000)
	/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

So from that it appears that I need to add both the executable path (for mpiexec and orted) and the library path (for the Intel compiler and MPI) to --export.
So I tried:

--export=NONE,PATH=/usr/bin/:/bin:/usr/local/bin:/opt/openmpi/1.10/intel/bin,LD_LIBRARY_PATH=/lib:/lib64:/usr/lib:/usr/lib64:/opt/openmpi/1.10/intel/lib:/opt/intel/16.0/compiler/lib/intel64

and while this did allow all the libraries to be found (per the ldd command above) and mpiexec to be found, it still failed to find orted when running the job. The results are the same as those attached, with the following error:

--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
status=0

So, as with the original attachments, the additions to --export did not seem to make any difference. I did try a number of combinations of PATH and LD_LIBRARY_PATH, starting with the bare essentials in your example; it wasn't until I added the Intel bin, compiler lib, and MPI lib that it got this far.

I don't understand why the setup in the launch script is being ignored, since all the proper paths are there. I would expect it to take effect. I am also curious why, since Marshall was able to reproduce this a month ago and I've heard nothing until now, we seem to be essentially starting over with this issue?
> I don't understand why the setup in the launch script
> is being ignored, since all the proper paths are there. That would be the
> behavior I would expect.

When "--export=NONE" is on a salloc/sbatch, it will strip the environment from the caller except the SLURM variables and a basic PATH. Can you try adding "unset SLURM_EXPORT_ENV" before calling mpiexec to stop it from inheriting your "--export" arguments?

> I am also curious since Marshall was able to reproduce this a month ago and
> I've heard nothing until now, why we seem to be essentially starting over
> with this issue?

AFAIK, Marshall was unable to replicate your issue. He is currently on vacation and I am covering this ticket in the meantime.
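For what it's worth, the mechanism behind the suggested unset can be illustrated with plain bash, no Slurm required (a sketch only: SLURM_EXPORT_ENV is exported by hand here to stand in for what sbatch records when --export is used). srun runs as a child of the batch shell, so it inherits any exported variable until the script unsets it:

```shell
# Plain-bash sketch of the inheritance mechanism (no Slurm involved):
# a child /bin/sh process stands in for the srun that mpiexec launches.

export SLURM_EXPORT_ENV=NONE   # stand-in for what sbatch --export=NONE records

# The child process still sees the variable:
echo "before unset: $(/bin/sh -c 'echo "${SLURM_EXPORT_ENV:-unset}"')"

unset SLURM_EXPORT_ENV         # the suggested workaround

# Now the child no longer inherits it:
echo "after unset: $(/bin/sh -c 'echo "${SLURM_EXPORT_ENV:-unset}"')"
```

This prints "before unset: NONE" and then "after unset: unset" — the same reason the srun launched by mpiexec stops re-applying the --export=NONE policy once the variable is gone.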
(In reply to Nate Rini from comment #15)
> > I don't understand why the setup in the launch script
> > is being ignored, since all the proper paths are there. That would be the
> > behavior I would expect.
>
> When "--export=NONE" is on a salloc/sbatch, it will strip the environment
> from the caller except the SLURM variables and a basic PATH. Can you try
> adding "unset SLURM_EXPORT_ENV" before calling mpiexec to stop it from
> inheriting your "--export" arguments.

So the receiving slurmctld strips the environment set up by commands executed in the launch script for running mpiexec? That doesn't make sense to me. I can see that any environment gathered when the sbatch is submitted would be stripped, but the launch script is executed on the receiving cluster when launching the job, and it is the job, so I would expect that commands executed in that script (just before the mpiexec or srun) that modify the environment would take effect. If not, then I would question why such a switch exists in the first place.

> > I am also curious since Marshall was able to reproduce this a month ago and
> > I've heard nothing until now, why we seem to be essentially starting over
> > with this issue?
>
> AFAIK, Marshall was unable to replicate your issue. He is currently on
> vacation and I am covering this ticket in the meantime.

Well, that's interesting, because in comment 5 of this ticket he says:

> I've reproduced what you're seeing with --export=none. I'm using OpenMPI
> with pmi or pmi2 right now, so I guess it probably doesn't matter if you're
> using IntelMPI or OpenMPI.
(In reply to Mark Schmitz from comment #16)
> So the receiving slurmctld strips the commands being executed in the launch
> script that sets up the environment for running mpiexec? That doesn't make
> sense.

mpiexec eventually calls srun to execute your jobs if OpenMPI is compiled and configured with Slurm support.

> I can see that any environment that might be gathered when the sbatch
> is submitted would be stripped, but the launch script is being executed on
> the receiving cluster when launching the job and it is the job, so I would
> expect that commands executed in that script (just before the mpiexec or
> srun) that modify the environment would be effective in changing the
> environment. If not then I would question why have such a switch in the
> first place.

Some sites need their environment completely scrubbed to run jobs across different distros (or containers), and they set it up explicitly at job start.

> Well that's interesting because in Comment 5 of this ticket he says:
> I've reproduced what you're seeing with --export=none. I'm using OpenMPI
> with pmi or pmi2 right now, so I guess it probably doesn't matter if you're
> using IntelMPI or OpenMPI.

I suppose which issue was meant was ambiguous there; I'll try to answer per comment. Yes, he reproduced the issue with "--export=NONE" in comment #5, but not the issue in comment #7. That is why I suggested adding PATH and LD_LIBRARY_PATH to --export, which comment #13 verified got ldd working.

> --------------------------------------------------------------------------
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --------------------------------------------------------------------------

This error message can also be generated when the ORTE daemon starts but fails fatally before or during MPI_Init(). Could you please start your program again with srun as in comment #7, while adding the extra paths to --export and unsetting SLURM_EXPORT_ENV before calling srun in your job script?
(In reply to Nate Rini from comment #17)
> mpiexec eventually calls srun to execute your jobs if Openmpi is compiled
> and configured with Slurm support.

Well, mpiexec uses srun to start orted on the nodes within a job, which then runs the actual program specified, so yes.

> Some sites need their environment completely scrubbed to run jobs across
> different distros (or containers) and set them up explicitly at job start.

Well, essentially that's what I'm trying to do in setting up the environment inside the job, so the launch or job script should be executed by slurmd on each node; I am trying to set up an environment where my users can submit jobs to multiple clusters with possibly different architectures. However, if this switch is for ignoring what the job script sets up in the environment and overriding it with what's on the command line, then I guess I'm out of luck.

> I suppose which issue was ambiguous there, I'll try to answer per comment.
> Yes, he reproduced the issue with "--export=NONE" in comment #5 but not the
> issue in comment #7. That is why I suggested adding the PATH and
> LD_LIBRARY_PATH to export which comment #13 verified that ldd is now
> working.

Fair enough, I did misunderstand that.

> This error message could also be generated from ORTE daemon being started
> but failing fatally before or during MPI_Init().

Actually, mpiexec is failing because it can't find orted when using srun to start orted on all the nodes in the job, even though the path to orted is in the environment set up in the job script.

> Could you please start your program again with srun as in comment #7 while
> adding the extra paths to export and unsetting SLURM_EXPORT_ENV before
> calling srun in your job script?

Yes, and thanks, this works for both srun and mpiexec when submitted locally. However, I will need to try submitting this to another cluster or clusters remotely, because I went down this path due to problems submitting jobs remotely: OMPI_MCA_pml was set differently on some clusters, so the jobs would fail.
Maybe I'm going about this all wrong, but after a suggestion from Brian on how to submit jobs to multiple heterogeneous clusters, I ran into environment problems across clusters, and the documentation said that --export=NONE would be needed when submitting jobs remotely (by remotely I mean sbatch --clusters=clustera,clusterb,...). So I needed to make sure the environment was not being copied from one cluster to another when submitting jobs remotely.

Also, this solution of setting the path in --export=NONE,PATH={path spec} will be problematic, as the OpenMPI/IntelMPI paths may not be identical on clusters with different architectures, so when submitting jobs to multiple clusters I would need a way to differentiate. That is why I'm trying to set the environment inside the job script. We do have a common environment across all clusters regardless of architecture, but there will always be small differences because of architecture. So hopefully there is another solution?
(In reply to Mark Schmitz from comment #18)
> Well essentially that's what I'm trying to do, in setting up the environment
> inside the job so the launch or job script should be executed by slurmd on
> each node, because I am trying to set up an environment where my users can
> submit jobs to multiple cluster with possibly different architectures.
> However if this switch is for ignoring what the job script sets up in the
> environment, and overriding it with what's on the command line then I guess
> I'm out of luck.

I think the long-term question of how Slurm inherits "--export" (and other environment) from sbatch into srun warrants an RFE ticket of its own. It is also possible that this is a good candidate for a cli_filter plugin in 19.05 (749d51d7d648d97). I have already opened a secondary ticket (bug#6977) to add SLURM_EXPORT_ENV to the srun/sbatch man pages.

For the purposes of your current setup, does using "unset SLURM_EXPORT_ENV" before calling mpiexec/srun in your job script not work?

> Actually mpiexec is failing because it can't find orted when using srun to
> start orted on all the nodes in the job. Even though the path to orted is in
> the environment setup in the job script.

Shall we attempt to determine exactly where the command fails? Please try using mpirun instead of mpiexec with these arguments for your job:

> -mca grpcomm_base_verbose 5 -mca orte_nidmap_verbose 10

> Yes and thanks, this works for both srun and mpiexec when submitted locally.
> However, I will need to try submitting this to another cluster or clusters
> remotely, because I went down this path because of problems submitting jobs
> remotely and finding that because OMPI_MCA_pml was set differently on some
> clusters the jobs would fail.

Can you please call the following cross-cluster to see where it is failing?

> srun -vvvvv ldd ${path to binary}
> srun -vvvvv ldd $(which orted)

Is it safe to assume the clusters share the same fabric for MPI?

> Maybe I'm going about this all wrong, but after a suggestion from Brian on
> how to submit jobs to multiple heterogeneous clusters, I ran into
> environment problems across clusters and looked in the documentation which
> said that --export=none would be needed when submitting jobs remotely. And
> by remotely I mean sbatch --clusters=clustera,clusterb,...

If you want to use only "--export=none", then you will need to call a script/binary that configures all of your environment. How the environment passes through mpiexec calling srun on the actual MPI binary is a little more complicated. I believe you will need to set your environment using the following (https://www.open-mpi.org/doc/v3.1/man1/mpirun.1.php):

> Environmental parameters can also be set/forwarded to the new processes
> using the MCA parameter mca_base_env_list.

An example is here: https://github.com/open-mpi/ompi/issues/4877

> So i needed to make sure the environment was not being copied from one
> cluster to another when submitting jobs remotely. Also this solution of
> setting the path in --export=none,PATH={path spec} will be problematic, as
> the OpenMPI/IntelMPI paths may not be identical on different architecture
> clusters, so when submitting jobs to multiple clusters, I would need a way
> to differentiate.

I believe this is one of the main use cases for environment modules, as long as you ensure the modules source is loaded universally. This could also be done with a job wrapper that calls "/usr/bin/env -i PATH=$PATH LD_LIBRARY_PATH=$LD_LIBRARY_PATH ..all.env.. $@" and then does not use the Slurm "--export" argument, to avoid inheriting "--export" in your srun commands. Using /etc/profile.d/* is also a per-host option; just use 'bash -l' when starting your job steps.

> That is why I'm trying to set the environment inside the
> job script. We do have a common environment across all clusters regardless
> of architecture, but there will always be small differences because of
> architecture.

Slurm doesn't apply any limitations to the environment of jobs once executed (excluding PMI/PMIx). I believe we are entering RFE territory if the idea is to have Slurm load a per-node/partition environment for a given job step.
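The env -i wrapper idea above can be sketched like this (a hypothetical wrapper, not something Slurm ships; OMPI_MCA_pml and the paths are placeholders). env -i starts the child with an empty environment, so only the variables named on its command line survive:

```shell
#!/bin/bash
# Hypothetical job wrapper sketch: scrub the inherited environment with
# env -i, then define only what this cluster's jobs should see.
# OMPI_MCA_pml stands in for a submit-host setting that should not leak
# to the remote cluster.

export OMPI_MCA_pml=setting-from-submit-host

# env -i clears the environment; only PATH (as listed) reaches the child.
env -i PATH=/usr/bin:/bin:/usr/local/bin \
    /bin/sh -c 'echo "pml=${OMPI_MCA_pml:-scrubbed} PATH=$PATH"'
```

This prints pml=scrubbed PATH=/usr/bin:/bin:/usr/local/bin, showing the leak is stopped; a real wrapper would end with exec "$@" after re-exporting the per-cluster paths.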
I would also like to note that in 19.05 you can use SRUN_EXPORT_ENV to override SLURM_EXPORT_ENV. To ensure that srun inherits all of the environment:

> export SRUN_EXPORT_ENV=ALL
(In reply to Nate Rini from comment #21)
> I would also like to note that in 19.05, you can use SRUN_EXPORT_ENV to
> override SLURM_EXPORT_ENV.
>
> To ensure that srun inherits all of the environment:
> > export SRUN_EXPORT_ENV=ALL

Okay, "unset SLURM_EXPORT_ENV" works with --export=NONE. Can you please explain SLURM_EXPORT_ENV? Hopefully both it and SRUN_EXPORT_ENV will be documented in 19.05. I double-checked the environment both before and after the "unset SLURM_EXPORT_ENV", and it appears identical to me; however, mpiexec works when it is unset and not without it.
(In reply to Mark Schmitz from comment #22)
> Okay "unset SLURM_EXPORT_ENV" works with --export=NONE, so can you please
> explain SLURM_EXPORT_ENV and hopefully both this and SRUN_EXPORT_ENV will be
> documented in 19.05.

The documentation will be updated:

(In reply to Nate Rini from comment #20)
> I have already opened a secondary ticket (bug#6977) to add SLURM_EXPORT_ENV
> to the srun/sbatch man pages.

(In reply to Mark Schmitz from comment #22)
> I double checked the environment both before and after the "unset
> SLURM_EXPORT_ENV" and it appears identical to me, however mpiexec works when
> unset and not without it.

I just want to verify: does it work for your cross-cluster jobs?
(In reply to Nate Rini from comment #23)
> I just want to verify, does it work for your cross cluster jobs?

Well, I asked for an explanation of SLURM_EXPORT_ENV as well, since this is an undocumented "feature", but that is beside the point. All this side discussion is well and good, but it fails to address my original problem, which I would like you to address. Since comment 5 acknowledges that the problem can be reproduced: why does this test case fail? The environment in the test case has all the appropriate paths, so why does the local environment appear to be ignored and mpiexec fail? Why should I have to unset SLURM_EXPORT_ENV to make this work?
(In reply to Mark Schmitz from comment #24)
> I would like you to address my original problem. Since in Comment 5 it is
> acknowledged that the problem can be reproduced, why does this test case
> fail? The environment in the test case has all the appropriate paths, so why
> does the local environment appear to be ignored and mpiexec fail? Why should
> I have to unset SLURM_EXPORT_ENV to make this work?

When "--export" is set in sbatch, it sets SLURM_EXPORT_ENV to ensure the same options are carried to any job steps. In your specific case this is counterintuitive and currently undocumented. Coincidentally, a fix added for 19.05 for a different issue touched the same behavior.

To avoid the job step inheriting this setting, one merely needs to unset SLURM_EXPORT_ENV or export SLURM_EXPORT_ENV=ALL. mpiexec is (eventually) calling srun, which creates a new job step. Although this is not explicitly obvious in the job output, one can verify it by calling "squeue -s" or "sacct -j $JOBID" against the job in question to see the steps.

We will update the man pages accordingly.

Does that explanation help? Do you have any more questions or issues?
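Pulling the workaround together, a job script along these lines should behave as discussed (a sketch only: the PATH/LD_LIBRARY_PATH values and program path are the examples from earlier in this ticket and would differ per cluster; this is not a verified recipe):

```shell
#!/bin/bash
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=36
#SBATCH --time=00:04:00

# Rebuild the environment inside the job with this cluster's paths.
# (Values below are the examples from this ticket, not universal.)
export PATH=/usr/bin:/bin:/usr/local/bin:/opt/openmpi/1.10/intel/bin
export LD_LIBRARY_PATH=/lib:/lib64:/usr/lib:/usr/lib64:/opt/openmpi/1.10/intel/lib:/opt/intel/16.0/compiler/lib/intel64

# Stop the srun launched by mpiexec from re-applying the --export=NONE
# policy that sbatch recorded in SLURM_EXPORT_ENV:
unset SLURM_EXPORT_ENV

mpiexec /home/user1/cbench/cbench-test1/bin/mpi_hello_ordered
```

Submitted with sbatch --export=NONE --clusters=..., the script rebuilds its own environment and the unset lets the orted launch inherit it; on 19.05+, export SLURM_EXPORT_ENV=ALL (or SRUN_EXPORT_ENV=ALL) achieves the same.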
(In reply to Nate Rini from comment #25)
> When "--export" is set in sbatch, it will set SLURM_EXPORT_ENV to ensure the
> same options are carried to any job steps.

Let me see if I understand this correctly. Since I set --export=NONE on the sbatch command, SLURM_EXPORT_ENV gets set, which initially means the launched job will not inherit any of the submission environment. Then, as each job step is launched, it does not inherit the environment set up by the job script itself, and therefore cannot find orted when mpiexec calls srun to start orted, and thus fails? And further, the environment set up by the job script will not be propagated to any of the job steps launched by the job script? Is that description correct?
(In reply to Mark Schmitz from comment #26)
> Let me see if I understand this correctly. Since I set --export=NONE on the
> sbatch command, SLURM_EXPORT_ENV gets set, which initially means that the
> job launched will not inherit any of the submission environment. Further
> then as each job step is launched it does not inherit the environment setup
> by the job script itself, and therefore cannot find orted when mpiexec calls
> srun to start orted and thus fails? And further the environment setup by the
> job script will not be propagated to any of the job steps launched by the
> job script? Is that description correct?

I have no other questions, and look forward to the SRUN_EXPORT_ENV option in 19.05.
Thank you.
*** Ticket 6977 has been marked as a duplicate of this ticket. ***
(In reply to Mark Schmitz from comment #27)
> I have no other questions, and look forward to the SRUN_EXPORT_ENV option in
> 19.05.

The documentation has been updated with this commit:
https://github.com/SchedMD/slurm/commit/0ea757980a936539052721189c01c47417b13993

It usually takes a day or two for the Slurm website to pick up these changes, in case you're looking there.

Closing this ticket per your response.

Thanks,
--Nate