| Summary: | Aliases set in custom module not working if called from submission script | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Marco <marco.delapierre> |
| Component: | User Commands | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | alex |
| Version: | 17.02.9 | ||
| Hardware: | Cray XC | ||
| OS: | Linux | ||
| Site: | Pawsey | Slinky Site: | --- |
| CLE Version: | CLE 6.0 update 05 | Version Fixed: | |
|
Description
Marco
2018-06-14 01:25:43 MDT
When a job allocation is granted for a submitted batch script, Slurm runs a single copy of the batch script on the first node in the set of allocated nodes. If this node hasn't loaded the custom modules that create the aliases, the batch script won't be able to do the alias substitutions, because the shell on the batch host doesn't maintain the list of aliases that were set on the submission host (login node). It is common to load modules from within the batch script.

Depending on your SallocDefaultCommand, your "salloc" session by default just executes $SHELL on the login node. A typical SallocDefaultCommand could be defined like this:

    # For systems with generic resources (GRES) defined, the SallocDefaultCommand value should explicitly specify a zero count for the configured GRES
    SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --cpu_bind=no --mpi=none $SHELL"

I guess your "salloc" session just executed $SHELL on the submission host; that shell sourced the .profile and triggered the chain of points 1 to 4. If this chain isn't sourced on the batch host where the batch script is executed, then the alias won't be available. Does that make sense?

Hi Alejandro,
thanks for coming back to me. Actually, I know the batch host goes through steps 1 to 4, as the modules of point 4 turn out to be loaded, with, for instance, all of the variables set by those modules being available in the batch environment. The ONLY missing bit is the aliases set by some of those modules.

Hey Marco,
Although I doubt this is a Slurm problem, and think it is more a matter of how bash and its builtins behave when executing scripts, I've been digging a bit into the bash man page and found this:
Aliases are not expanded when the shell is not interactive, unless the expand_aliases shell option is set using shopt (see the description of shopt under SHELL BUILTIN COMMANDS below).
So I tried this and it seems the alias defined in the ~/.profile file is properly expanded within the executed script when using 'bash -l' + 'shopt -s expand_aliases'.
alex@smd-server:~/tests$ ssh smd1 'grep alex ~/.profile'
alias alex='hostname -s'
alex@smd-server:~/tests$ cat test.bash
#!/bin/bash -l
shopt -s expand_aliases
alex
alex@smd-server:~/tests$ sbatch -w smd1 test.bash
Submitted batch job 20028
alex@smd-server:~/tests$ cat slurm-20028.out
smd1
alex@smd-server:~/tests$
Could you try this out and see if it works for you? Thank you.
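
The man page behaviour quoted above can also be checked without Slurm. The following is a minimal sketch, not taken from the ticket: the alias name `gr_demo_alias` and the temporary script are hypothetical, and it assumes no real command with that name exists on the system. It runs the same two-line script twice in a non-interactive bash, once with default options and once with `expand_aliases` enabled.

```shell
#!/bin/bash
# Hypothetical demo script: defines an alias, then tries to use it.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
alias gr_demo_alias='echo expanded'
gr_demo_alias
EOF

# Non-interactive bash with default options: the alias is defined but not
# expanded, so the second line fails with "command not found".
out_default=$(bash "$tmp" 2>/dev/null || echo "not expanded")

# Same script with the expand_aliases shell option enabled at startup.
out_shopt=$(bash -O expand_aliases "$tmp")

rm -f "$tmp"

echo "default options: $out_default"
echo "expand_aliases:  $out_shopt"
```

Note that the alias definition and its use sit on separate lines. Bash expands aliases when a line is read, so an alias defined and used on the same line would not be expanded even with the option set.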
Hi Alejandro,
thanks for sharing this. I am actually using bash -l, and have already tried the shopt tip without success. I agree with you that this is most probably not a Slurm problem; besides bash, there are also the Environment Modules, which add complexity to the situation. I filed this ticket mainly to find out whether you had seen this before on the Slurm side. I have just a couple of last related questions then:
- are there any Slurm configuration variables/settings that could change variable behaviour?
- what are the differences between how salloc and sbatch create the shell environments? (as a starting point for this, you previously mentioned SallocDefaultCommand)
Many thanks,
best regards,
Marco

I think this FAQ entry is closely related: https://slurm.schedmd.com/faq.html#user_env
So when executing the job's spawned applications, Slurm doesn't source the ~/.profile and/or ~/.bashrc files. There's the --export option to set which environment variables are propagated to the compute nodes (all by default). But I think bash doesn't store the list of alias substitutions in the environment, so changing that value won't do anything.

With an SallocDefaultCommand like this:
alex@smd-server:~/tests$ scontrol show conf | grep -i salloc
SallocDefaultCommand = srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --cpu_bind=no --mpi=none $SHELL -l
alex@smd-server:~/tests$
I get the .profile sourced for 'salloc'. The same happens if I manually execute 'srun --pty bash -l'.

Perhaps a workaround for this would be using a TaskProlog script: https://slurm.schedmd.com/prolog_epilog.html
Please let me know if there's anything else you need from here.

Thanks Alejandro,
your comments have been valuable for better understanding how Slurm works under the hood. For this specific case, as only one user is having issues, I think it is not worth changing system-wide Slurm configurations. As he is only having trouble with scripts submitted with sbatch, I will advise him to load the modules from within the script rather than in his .profile.
Thanks again for your support,
with kind regards,
Marco

(In reply to Marco from comment #6)
> Thanks Alejandro,
>
> your comments have been valuable for better understanding how Slurm works
> under the hood.
> For this specific case, as only one user is having issues, I think it is not
> worth changing system-wide Slurm configurations.
> As he is only having troubles with scripts submitted with sbatch, I will
> advise him to load the modules from within the script rather than in his
> .profile.
>
> Thanks again for your support,
> with kind regards,
> Marco

All right. Anyhow, as I tested in my comment 3, a combination of bash -l plus the shopt tip is sourcing the .profile for me. What I think is that there's something odd happening in the rest of your steps 1-4 with the Tcl alias generation. I'm gonna go ahead and close this. Please reopen if there's anything else. |
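
The observation in this thread that module-set variables reach the batch environment while aliases do not is consistent with aliases living only in the shell's own alias table, not in the process environment. The sketch below illustrates this with plain bash, no Slurm involved; the names `GR_DEMO_VAR` and `gr_demo_alias` are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Parent shell: set one exported variable and one alias (both hypothetical).
export GR_DEMO_VAR="from parent"
alias gr_demo_alias='echo from parent'

# The alias exists in the parent shell's alias table.
parent_has_alias=no
alias gr_demo_alias >/dev/null 2>&1 && parent_has_alias=yes

# A child bash process inherits the environment, so the variable is visible.
child_var=$(bash -c 'echo "$GR_DEMO_VAR"')

# The alias table is not part of the environment, so the child does not know
# the alias; the 'alias' builtin fails and the fallback text is used instead.
child_alias=$(bash -c 'alias gr_demo_alias' 2>/dev/null || echo "not defined")

echo "parent has alias: $parent_has_alias"
echo "child variable:   $child_var"
echo "child alias:      $child_alias"
```

This is why an environment-propagation setting such as --export would not be expected to carry aliases to the batch host: whatever defines them (the .profile or a module load) has to run again in the shell that executes the script.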