Ticket 2609 - Documentation missing for SLURM_UMASK environment variable
Summary: Documentation missing for SLURM_UMASK environment variable
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Documentation
Version: 15.08.7
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2016-04-04 05:29 MDT by Michael Gutteridge
Modified: 2022-02-04 15:00 MST

See Also:
Site: FHCRC - Fred Hutchinson Cancer Research Center
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 16.05.6
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
Do not clear SLURM_UMASK env var (5.76 KB, patch)
2016-04-06 01:58 MDT, Moe Jette
Do not set SLURM_UMASK for batch jobs, srun command only (1.38 KB, patch)
2016-10-13 09:00 MDT, Moe Jette

Description Michael Gutteridge 2016-04-04 05:29:29 MDT
Hi

I was digging through some issues we were having with DRMAA and job umasks and found that there is a very useful "SLURM_UMASK" environment variable. It was introduced way back in the 1.2 days, but it does not appear in the current documentation for srun.

Hopefully, since I just recommended it as a solution for one of our users, it hasn't been deprecated. If it hasn't, it would be nice to have it mentioned in the man pages.

Thanks much

Michael
Comment 1 Moe Jette 2016-04-06 01:58:45 MDT
Created attachment 2979 [details]
Do not clear SLURM_UMASK env var

The SLURM_UMASK environment variable was being cleared rather than left visible to users. This one-line patch removes the code that clears it (at least in version 15.08).

Since you think this is useful, I'll make this change plus matching documentation in version 16.05 (to be released May 2016).
Comment 2 Moe Jette 2016-04-06 02:06:31 MDT
Here is the commit in Slurm's version 16.05 code branch:
https://github.com/SchedMD/slurm/commit/58dea24602fe7b6cd98ae21df7780e13e6ae4d3a
Comment 3 Dorian Krause 2016-10-10 09:55:31 MDT
(In reply to Moe Jette from comment #1)
> Created attachment 2979 [details]
> Do not clear SLURM_UMASK env var
> 
> The SLURM_UMASK environment variable is being cleared rather than being
> visible to users. This one line patch removes the code to clear it (at least
> in version 15.08).

There are a couple of downsides to this change (that we ran into now that we started testing 16.05):

  - When executed within an sbatch script, srun will always propagate the SLURM_UMASK setting no matter what "umask" calls precede it. Such problems are rather hard for users to debug.
  - sbatch and salloc will behave differently, since only the former sets SLURM_UMASK.

Have these been taken into account?
Comment 4 Moe Jette 2016-10-11 12:39:22 MDT
Re-opening the ticket, but I have no idea when I might have time to investigate.
Comment 5 Moe Jette 2016-10-13 09:00:28 MDT
Created attachment 3588 [details]
Do not set SLURM_UMASK for batch jobs, srun command only

This patch (to Slurm version 16.05) avoids setting SLURM_UMASK for batch jobs, but the variable will continue to be set for anything spawned by an srun command.

Michael, will this still satisfy your requirements?
Comment 6 Michael Gutteridge 2016-10-13 10:10:51 MDT
The specific case (that inspired the original bug) involves jobs submitted via slurm-drmaa (apps.man.poznan.pl/trac/slurm-drmaa).  So I guess I don't know how this change would affect us.

slurm-drmaa appears to create the job record using slurm primitives (is that the right term?). slurm_init_job_desc_msg() seems to be one such "primitive" used for creating the job. I don't know if that will bypass the check, or if a job created via drmaa is considered batch or not. The slurm-drmaa code does attempt to propagate the local environment to the job, but I'm not sure that's a factor here either.

I certainly understand the issue that Dorian has highlighted: it is unclear in that circumstance which mask will be propagated to the srun job. This is also likely the right fix, as one can always set umask in the job script (or template, in drmaa-speak).

There are other ways to fix our specific case, so I think I can say we'd be fine with it.  We'll just have to verify how these scripts work when we get to 16.

Thanks for checking in.

M

Comment 7 Moe Jette 2016-10-13 10:27:56 MDT
OK, I'll mark this resolved unless I hear back from you.
Comment 8 Moe Jette 2016-10-13 14:08:49 MDT
(In reply to Michael Gutteridge from comment #6)
> The specific case (that inspired the original bug) involves jobs submitted
> via slurm-drmaa (apps.man.poznan.pl/trac/slurm-drmaa).  So I guess I don't
> know how this change would affect us.
> 
> slurm-drmaa appears to create the job record using slurm primitives (is that
> the right term?)  slurm_init_job_desc_msg() seems to be one such "primitive"
> used for creating the job.  I don't know if that will bypass the check or if
> a job created vi drmaa is considered batch or not.  The slurm-drmaa code
> does attempt to propagate the local environment to the job, but I'm not sure
> that's a factor here either.
> 
> I certainly understand the issue that Dorian has highlighted- it is unclear
> in that circumstance which mask will be propagated to the srun job.  This is
> also likely the right fix as one can always set umask in the job script (or
> template in drmaa-speak).

Slurm propagates the umask at submit time by setting the SLURM_UMASK environment variable. I would _guess_ that drmaa propagates the environment as well, in which case looking directly at the umask rather than at the SLURM_UMASK environment variable would probably be best, as that should be portable across various resource managers.