Ticket 12142

Summary: Environmental Variables for Batch Jobs Submitted from Interactive Jobs
Product: Slurm Reporter: Paul Edmon <pedmon>
Component: User Commands Assignee: Tim McMullan <mcmullan>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: kilian
Version: 20.11.7   
Hardware: Linux   
OS: Linux   
Site: Harvard University Slinky Site: ---

Description Paul Edmon 2021-07-27 13:49:56 MDT
It's a pretty common occurrence in our environment to have jobs that submit jobs, or in this case an interactive job (started from salloc) that a user starts and then submits a batch job from.  We've found that Slurm gets confused in these situations: it copies the full environment over, including the SLURM_* variables from the salloc job.  What it should really do is clear the salloc job's variables and set new ones for the sbatch job that is being submitted.

We've tried --export=NONE but the SLURM_* variables still get mapped.  Is there any way to prevent this?  Is this a bug?  It seems to me, at least, that each job submission should be independent of its parent submission, unless you are invoking something like srun, which is intended to hook into an existing allocation.

Currently we are working around this using unset for conflicting variables but this isn't really a great solution.
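
A minimal sketch of the unset workaround described above, assuming a POSIX shell with awk available. The helper name run_without_slurm_env and the script name job.sh are hypothetical, not something from Slurm itself. It strips every SLURM_* variable inside a subshell and then runs the given command, so the enclosing salloc session keeps its own environment:

```shell
# Hypothetical helper (name is illustrative): run a command with every
# SLURM_* variable removed from its environment.
run_without_slurm_env() {
    (
        # The parentheses start a subshell, so the unsets below do not
        # affect the calling shell (e.g. the salloc session).
        # awk's ENVIRON array lists the exported variable names.
        for var in $(awk 'BEGIN { for (k in ENVIRON) if (k ~ /^SLURM_/) print k }'); do
            unset "$var"
        done
        "$@"
    )
}

# Example use from inside an salloc session (job.sh is a placeholder):
#   run_without_slurm_env sbatch job.sh
```

Note that this removes all SLURM_* variables, including ones a site may set on purpose (SLURM_CONF, for example), so a real wrapper would probably need a list of variables to keep.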
Comment 1 Tim McMullan 2021-08-16 10:43:02 MDT
Hi Paul,

I'm sorry about the late reply on this!  The SLURM_* variables are set partly with that intent: to be able to pass options from the parent into child jobs.  The idea is to preserve options for new steps or jobs, whether they run inside a different allocation or in the allocation you already have.  This definitely has pitfalls, like the one you mention here.

The cross-talk here happens because the input and output variables for salloc/sbatch/srun are usually the same and will get carried forward automatically.  It's been a long-standing behavior, and something that we have chatted internally about changing, but the change is considered an enhancement and work on it hasn't yet happened.

Apart from unsetting the variables, submitting new jobs from outside of salloc is the easiest way I can see right now to work around it.  It might also be possible to do this automatically in a SPANK plugin (this is the only admin-accessible spot where the environment can be directly manipulated).

Thanks, let me know if you have any other questions on this!
--Tim
Comment 2 Paul Edmon 2021-08-17 07:43:25 MDT
Nope, thanks for the info.  This makes perfect sense.

-Paul Edmon-

Comment 3 Tim McMullan 2021-08-17 08:31:38 MDT
Sure thing!  Thanks Paul, let us know if you have any other questions!

--Tim