| Summary: | SRUN_EXPORT | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Daniel Grimwood <daniel.grimwood> |
| Component: | User Commands | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | matthews |
| Version: | 17.02.9 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Pawsey | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 19.05.0pre4 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Daniel Grimwood
2018-08-07 22:14:37 MDT
The main issue I can see with this is inconsistent behavior if your users use srun commands off the login nodes. They'd then export the environment in that situation - but not in batch scripts - which could further confuse matters.

I believe you can accomplish what you're after today through the use of a trivial TaskProlog script:

===========
#!/bin/sh
echo "export SLURM_EXPORT_ENV=ALL"
===========

This would ensure an srun within a batch job has that setting overridden, but not for salloc or srun commands directly launched from the login nodes. Let me know if that works for you; in some quick testing here it appears to handle the use case you'd described.

cheers,
- Tim

Hi Tim,

thanks for the quick reply and for sharing your concerns. We'll discuss and test it internally. I like the TaskProlog script but am a bit concerned about a blanket overwriting of SLURM_EXPORT_ENV; however, I think we can do a bash test for whether SBATCH_EXPORT is set and then do the setting of SLURM_EXPORT_ENV. Part of the appeal of a new SRUN_EXPORT is that users could override it in their jobscripts and be in more control. The potential login-node inconsistency can be dealt with by only having the slurm modulefile on the compute nodes do the export, while the login nodes do nothing. Having said that, we only promote the use of srun to our users inside of sbatch and salloc.

With regards,
Daniel.

Hi Tim,

we discussed internally and are not in favour of modifying the environment from within a TaskProlog script. This can result in unexpected behaviour for our more advanced users, who may be changing these environment variables themselves. We could set SRUN_EXPORT=NONE on login nodes if that addresses your concerns about interactive srun. Do you have any other suggestions?

With regards,
Daniel.

(In reply to Daniel Grimwood from comment #3)
> Hi Tim,
>
> we discussed internally and are not in favour of modifying the environment
> from within a TaskProlog script.
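Daniel's idea above - test whether SBATCH_EXPORT was set before forcing the export mode - could be sketched as a TaskProlog along the following lines. This is a sketch based on his description, not a script from the ticket, and the choice of guarding on SBATCH_EXPORT is our reading of his suggestion:

===========
#!/bin/sh
# Hypothetical TaskProlog sketch: only force SLURM_EXPORT_ENV=ALL for jobs
# where the user submitted with SBATCH_EXPORT set (e.g. because the site
# tells users to run with '--export=NONE'); jobs that never touched the
# export settings are left alone.
if [ -n "${SBATCH_EXPORT}" ]; then
    echo "export SLURM_EXPORT_ENV=ALL"
fi
===========

Lines echoed by a TaskProlog in the form `export NAME=value` are injected into the task's environment, so the guard decides per-job whether the override happens.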
> This can result in unexpected behaviour
> for our more advanced users, who may be changing these environment variables
> themselves.

Why don't you just test in the TaskProlog whether the environment variable was set by the user? The TaskProlog script is launched with a copy of the user's environment (as it existed when the step was submitted).

> We could set SRUN_EXPORT=NONE on login nodes if that addresses your
> concerns about interactive srun.

That still gets messy - now Slurm would have to decide whether or not to strip that back out of the environment, or it'd be getting propagated in some locations that would cause problems.

I'd suggest you revisit the TaskProlog - something like the following seems to address all your concerns, and is something you could drop in place today:

===========
if [ -z "${SLURM_EXPORT_ENV}" ]; then
    echo "export SLURM_EXPORT_ENV=ALL"
fi
===========

Thanks Tim for the suggestion. We tell our users to always set #SBATCH --export=NONE as it makes for reproducible jobscripts, and we have quite a few workflows that span multiple Slurm clusters with different environments. Because of this, SLURM_EXPORT_ENV is always set, so the test you suggested would never trigger. We could change the test to fire when the variable is either nonexistent or set to NONE, but even then it is possible that the user actually wants to export NONE in their srun. Ideally for us, srun would not use the environment variable set by sbatch, so sbatch not setting SLURM_EXPORT_ENV would also work.

With regards,
Daniel.

Hey guys - I've made it back to the States, and got a chance to look at this again based on what we'd discussed last week. I do see the value in this based on your description, and the new SRUN_EXPORT_ENV environment variable, which will be in the 19.05 releases, should cover your use case.
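The relaxed guard Daniel describes - override when the variable is unset or carries the value NONE - could look like the sketch below. It is only an illustration of his wording, and it deliberately has the weakness he points out: it cannot distinguish an sbatch-supplied NONE from a NONE the user set on purpose:

===========
#!/bin/sh
# Hypothetical TaskProlog variant: reset the export behaviour to ALL when
# SLURM_EXPORT_ENV is unset or holds the sbatch default of NONE, leaving any
# other user-chosen value (e.g. a variable list) untouched.
if [ -z "${SLURM_EXPORT_ENV}" ] || [ "${SLURM_EXPORT_ENV}" = "NONE" ]; then
    echo "export SLURM_EXPORT_ENV=ALL"
fi
===========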
While we will not be including this in any 18.08 maintenance releases, you should be able to back-port this easily if you desire - it's just a single-line change to add the new input variable. Commit details follow for posterity.

cheers,
- Tim

commit 9d529ae6d98d8bbea7325ac8faa9bcb20dc4d7f8
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Apr 17 23:14:54 2019 -0600

    Add SRUN_EXPORT_ENV as input to srun.

    Overrides any setting for SLURM_EXPORT_ENV, which can make nesting
    jobs simpler.

    If SBATCH_EXPORT=NONE (which will cause SLURM_EXPORT_ENV=NONE to be
    set in the batch step) is used alongside SRUN_EXPORT_ENV=ALL, this
    allows for the batch environment to be reset, but then for changes
    made in the batch script (e.g., loading modules with 'module load')
    to propagate out as part of the step launch.

    The same can be accomplished by the user in their scripts by
    explicitly setting 'srun --export=ALL ...' for every step launch, but
    this should provide an easier mechanism for sites to make this
    behavior the default for their users by pushing this pair of
    environment variables into their users' default profiles.

    Bug 5537.
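Putting the commit's pair of variables together, the intended 19.05 usage might look like the following batch script sketch. The module name and application are hypothetical placeholders; the variable semantics are as described in the commit message above:

===========
#!/bin/bash
#SBATCH --export=NONE
# sbatch now sets SLURM_EXPORT_ENV=NONE in the batch step, so the job starts
# from a clean, reproducible environment.

module load mymodule    # hypothetical module; modifies the environment

# With SRUN_EXPORT_ENV=ALL, the step launch exports the *current*
# (post-module-load) environment, overriding SLURM_EXPORT_ENV=NONE.
export SRUN_EXPORT_ENV=ALL
srun ./my_app           # behaves like 'srun --export=ALL ./my_app'
===========

A site could push `SBATCH_EXPORT=NONE` and `SRUN_EXPORT_ENV=ALL` into users' default profiles to make this pairing the default without touching individual jobscripts.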