Ticket 3607 - sbatch jobs do not source scripts in /etc/profile.d/*.sh at startup
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 16.05.10
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-03-22 10:16 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2017-03-22 10:43 MDT

See Also:
Site: DTU Physics
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Ole.H.Nielsen@fysik.dtu.dk 2017-03-22 10:16:30 MDT
We try to set up the environment in sbatch jobs by adding scripts to the /etc/profile.d/ directory, but they don't seem to be sourced by the bash shell in the job spawned by slurmd (we assume bash is the shell on CentOS 7.3 systems).

The FAQ https://slurm.schedmd.com/faq.html#user_env says that user .profile and .bashrc are not sourced, but how about the system files in /etc/profile.d/*.sh?  Also, the bash man-page explains under INVOCATION how startup is done, but I don't know how slurmd does the startup.

We would like to set up the batch job environment for users automatically (it may well differ from the environment on the login nodes), for example:

# cat /etc/profile.d/cpu_arch.sh 
export CPU_ARCH="broadwell"
function cpu_arch { echo $CPU_ARCH; }

But in the batch job this obviously hasn't been set up:
type cpu_arch
/bin/bash: line 3: type: cpu_arch: not found

Question 1: Can we use /etc/profile.d/*.sh scripts to set up the environment in jobs?

If your answer is that /etc/profile.d/*.sh scripts are ignored (in that case, please add this to the FAQ), then I don't understand why the Lmod "module" command actually works inside the job (as seen if you run "srun bash -x"):

type module
module is a function
module () 
{ 
    eval $($LMOD_CMD bash "$@");
    [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}

which I believe is initialized by the script /etc/profile.d/z00_lmod.sh that eventually sources /usr/share/lmod/lmod/init/sh for defining the module() function.  I'm really puzzled here!
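For what it's worth, the survival of a shell function in a clean child shell can be reproduced with bash's function export. This is only an illustration of one plausible mechanism, not Slurm internals:

```shell
#!/bin/bash
# Illustration only (not Slurm code): bash serializes exported functions
# into the environment (as BASH_FUNC_<name>%% variables), so a function
# defined in the submitting shell can reappear in a child shell that
# never sourced /etc/profile.d/*.sh.
cpu_arch() { echo "$CPU_ARCH"; }
export CPU_ARCH="broadwell"
export -f cpu_arch
bash --noprofile --norc -c 'cpu_arch'   # prints: broadwell
```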

Question 2: Can you suggest better ways to set up a default environment in jobs, which will differ from the environment in the login nodes?  The goal is for this to be automatic so that users won't have to worry about setting the correct CPU architecture and variables like OMP_NUM_THREADS=1.
Comment 1 Tim Wickberg 2017-03-22 10:27:01 MDT
(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #0)
> We try to set up the environment in sbatch jobs by adding scripts to the
> /etc/profile.d/ directory, but they don't seem to be sourced by the bash
> shell in the job spawned by slurmd (we assume the bash shell on CentOS 7.3
> systems).
> 
> The FAQ https://slurm.schedmd.com/faq.html#user_env says that user .profile
> and .bashrc are not sourced, but how about the system files in
> /etc/profile.d/*.sh?  Also, the bash man-page explains under INVOCATION how
> startup is done, but I don't know how slurmd does the startup.
> 
> We would like to set up automatically for the users the batch job
> environment (which may well differ from the environment on the login nodes)
> such as this one:
> 
> # cat /etc/profile.d/cpu_arch.sh 
> export CPU_ARCH="broadwell"
> function cpu_arch { echo $CPU_ARCH; }
> 
> But in the batch job this obviously hasn't been set up:
> type cpu_arch
> /bin/bash: line 3: type: cpu_arch: not found
> 
> Question 1: Can we use /etc/profile.d/*.sh scripts to set up the environment
> in jobs?
>
> If your answer is that /etc/profile.d/*.sh scripts are ignored (and then
> please add this to the FAQ), then I don't understand why the Lmod "module"
> command actually works inside the job (and seen if you do "srun bash -x"):
> 
> type module
> module is a function
> module () 
> { 
>     eval $($LMOD_CMD bash "$@");
>     [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
> }
> 
> which I believe is initialized by the script /etc/profile.d/z00_lmod.sh that
> eventually sources /usr/share/lmod/lmod/init/sh for defining the module()
> function.  I'm really puzzled here!

The user environment is captured at job submission time, then restored verbatim. Profile scripts and the like are not run when setting up the user environment. This is also what allows lmod to continue functioning when the environment is restored in the job.
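A rough simulation of that capture-and-restore behavior (a sketch, not Slurm source; `env -i` stands in for slurmd's clean starting environment):

```shell
#!/bin/sh
# Sketch (not Slurm code): record a variable at "submit" time, then start
# a clean shell that receives only the captured value and runs no startup
# scripts -- the variable survives even though /etc/profile.d is never read.
export CPU_ARCH="broadwell"          # present in the submitting shell
captured="CPU_ARCH=$CPU_ARCH"        # what gets captured at submission
env -i $captured sh -c 'echo "CPU_ARCH in job: $CPU_ARCH"'
```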

> Question 2: Can you suggest better ways to set up a default environment in
> jobs, which will differ from the environment in the login nodes?  The goal
> is for this to be automatic so that users won't have to worry about setting
> the correct CPU architecture and variables like OMP_NUM_THREADS=1.

Look into using a TaskProlog script: https://slurm.schedmd.com/prolog_epilog.html
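For reference, a minimal TaskProlog sketch (the path and values are site-specific assumptions; per the prolog_epilog page, lines the script prints as "export NAME=value" are added to the task's environment, and "print ..." lines go to the task's stdout):

```shell
#!/bin/sh
# Hypothetical TaskProlog (e.g. TaskProlog=/etc/slurm/taskprolog.sh in
# slurm.conf). slurmd runs it before each task and interprets its stdout:
#   "export NAME=value" -> set in the task's environment
#   "print TEXT"        -> written to the task's stdout
echo "export CPU_ARCH=broadwell"
echo "export OMP_NUM_THREADS=1"
echo "print TaskProlog: environment initialized"
```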
Comment 2 Ole.H.Nielsen@fysik.dtu.dk 2017-03-22 10:38:39 MDT
(In reply to Tim Wickberg from comment #1)
> The user profile is captured at job submission time, then restored verbatim.
> Profiles scripts and the like are not run when setting up the user
> environment. This is also what allows lmod to continue functioning when
> restored on the job.

Thanks, now this makes sense to me!

> > Question 2: Can you suggest better ways to set up a default environment in
> > jobs, which will differ from the environment in the login nodes?  The goal
> > is for this to be automatic so that users won't have to worry about setting
> > the correct CPU architecture and variables like OMP_NUM_THREADS=1.
> 
> Look into using a TaskProlog script.
> https://slurm.schedmd.com/prolog_epilog.html

OK, it's time to dive into yet another Slurm feature :-)

You may close this case now, thanks.