Ticket 9258 - Various Errors in slurm.conf man page regarding environment variables available to prolog/epilog
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Documentation
Version: 20.02.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
Reported: 2020-06-19 11:13 MDT by Jim Long
Modified: 2021-01-15 16:40 MST
Site: NCSA
Version Fixed: 20.11.3


Description Jim Long 2020-06-19 11:13:51 MDT
I found the following inconsistencies between the slurm.conf man page and
what is actually present in the prolog/epilog environments.

SLURM_JOB_GID
   It is available in the prolog and epilog environments, but man page says
   "Available in PrologSlurmctld, EpilogSlurmctld and TaskProlog only."

It's very helpful to have this in the prolog and epilog  environments,
but the docs should include it on the list.

SLURM_JOBID vs. SLURM_JOB_ID
   man page references SLURM_JOB_ID, but both SLURM_JOBID and
   SLURM_JOB_ID are present in prolog and epilog environments

SLURM_UID vs. SLURM_JOB_UID
   man page references SLURM_JOB_UID, but both SLURM_UID and
   SLURM_JOB_UID are present in prolog and epilog environments

SLURM_NODELIST vs. SLURM_JOB_NODELIST
   man page references SLURM_JOB_NODELIST, but only SLURM_NODELIST
   is present in prolog and epilog environments

SLURMD_NODENAME
   Available in the prolog, but man page does not reference it.

SLURM_CONF
   Available in the prolog, but man page does not reference it.


Much of this is covered in bug # 1432 from 2015, but at least the 
SLURM_JOB_GID piece is new.  

Perhaps some of the JOBID and UID variables are still around for
legacy reasons.  

SLURM_NODELIST is either documented wrong or implemented incorrectly.
I suspect SLURM_JOB_NODELIST is what should be present in the environment.

SLURMD_NODENAME and SLURM_CONF are still undocumented, so is it safe to rely
on them?



Here's what I get from an environment dump for one of my jobs -

SLURM_NODELIST=iforge[146-148]
SLURMD_NODENAME=iforge148
SLURM_JOBID=102
SLURM_CONF=/usr/local/adm/slurm/prod/etc/slurm.conf
SLURM_JOB_ID=102
PWD=/var/spool/slurmd
SLURM_JOB_USER=jlong
SLURM_UID=8788
SLURM_JOB_UID=8788
SHLVL=1
SLURM_JOB_GID=1010
SLURM_CLUSTER_NAME=iforge
SLURM_JOB_PARTITION=normal
SLURM_JOB_CONSTRAINTS=(null)
SLURM_SCRIPT_CONTEXT=prolog_slurmd
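
The dump above can be checked mechanically. The following is a minimal Python sketch, not part of Slurm: `parse_dump` and `REFERENCED` are hypothetical names, and `REFERENCED` holds only the three variables this report says the man page references, not the man page's full list.

```python
# Environment dump copied from the prolog_slurmd context above.
DUMP = """\
SLURM_NODELIST=iforge[146-148]
SLURMD_NODENAME=iforge148
SLURM_JOBID=102
SLURM_CONF=/usr/local/adm/slurm/prod/etc/slurm.conf
SLURM_JOB_ID=102
PWD=/var/spool/slurmd
SLURM_JOB_USER=jlong
SLURM_UID=8788
SLURM_JOB_UID=8788
SHLVL=1
SLURM_JOB_GID=1010
SLURM_CLUSTER_NAME=iforge
SLURM_JOB_PARTITION=normal
SLURM_JOB_CONSTRAINTS=(null)
SLURM_SCRIPT_CONTEXT=prolog_slurmd
"""

# Assumption: only the names this report says the man page references.
REFERENCED = {"SLURM_JOB_ID", "SLURM_JOB_UID", "SLURM_JOB_NODELIST"}


def parse_dump(text):
    """Turn NAME=VALUE lines into a dict, skipping lines without '='."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


env = parse_dump(DUMP)
# Names the man page references that the prolog environment did not provide.
missing = sorted(REFERENCED - set(env))
print("referenced but absent:", missing)
```

For this dump the only referenced name that is absent is SLURM_JOB_NODELIST, which matches the discrepancy described above.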
Comment 1 Michael Hinton 2020-06-19 12:59:41 MDT
Hello Jim,

(In reply to Jim Long from comment #0)
> I found the following inconsistencies between the slurm.conf man page and
> what is actually present in the prolog/epilog environments.
> 
> SLURM_JOB_GID
>    It is available in the prolog and epilog environments, but man page says
>    "Available in PrologSlurmctld, EpilogSlurmctld and TaskProlog only."
> 
> It's very helpful to have this in the prolog and epilog  environments,
> but the docs should include it on the list.
> 
> SLURM_JOBID vs. SLURM_JOB_ID
>    man page references SLURM_JOB_ID, but both SLURM_JOBID and
>    SLURM_JOB_ID are present in prolog and epilog environments
> 
> SLURM_UID vs. SLURM_JOB_UID
>    man page references SLURM_JOB_UID, but both SLURM_UID and
>    SLURM_JOB_UID are present in prolog and epilog environments
> 
> SLURM_NODELIST vs. SLURM_JOB_NODELIST
>    man page references SLURM_JOB_NODELIST, but only SLURM_NODELIST
>    is present in prolog and epilog environments
> 
> SLURMD_NODENAME
>    Available in the prolog, but man page does not reference it.
> 
> SLURM_CONF
>    Available in the prolog, but man page does not reference it.
> 
> 
> Much of this is covered in bug # 1432 from 2015, but at least the 
> SLURM_JOB_GID piece is new.  
Nice catch. It looks like this was added in 20.02 with commit https://github.com/SchedMD/slurm/commit/0744089e3d. I think we just forgot to update the documentation, which hasn't been modified regarding this since 2015/2016.

> Perhaps some of the JOBID and UID variables are still around for
> legacy reasons.  
Yes, I believe we removed many of these from the documentation on purpose, but kept them for backwards compatibility.
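
Since both spellings may be present, a prolog or epilog script can prefer the documented name and fall back to the legacy one. Here is a minimal Python sketch, assuming the name pairs reported in comment #0. `ALIASES` and `lookup` are hypothetical helpers, not part of Slurm.

```python
import os

# Documented name first, then the legacy alias reported in comment #0.
# Assumption: these pairs carry the same value when both are set.
ALIASES = {
    "job_id": ("SLURM_JOB_ID", "SLURM_JOBID"),
    "uid": ("SLURM_JOB_UID", "SLURM_UID"),
    "nodelist": ("SLURM_JOB_NODELIST", "SLURM_NODELIST"),
}


def lookup(key, env=None):
    """Return the first non-empty value among the aliases for key, else None."""
    env = os.environ if env is None else env
    for name in ALIASES[key]:
        value = env.get(name)
        if value:
            return value
    return None
```

Passing a plain dict as `env` lets the helper be exercised outside a running prolog, for example `lookup("nodelist", {"SLURM_NODELIST": "iforge[146-148]"})`.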

> SLURM_NODELIST is either documented wrong or implemented incorrectly.
> I suspect SLURM_JOB_NODELIST is what should be present in the environment.
> 
> SLURMD_NODENAME and SLURM_CONF are still undocumented, so is it safe to rely
> on them?
Let me look into these some more and get back to you.

Thanks,
-Michael
Comment 6 Michael Hinton 2021-01-15 16:40:44 MST
Hi Jim,

These issues should all be fixed as of commit bcc1f977c3 (see https://github.com/SchedMD/slurm/commit/bcc1f977c34a5fc099e283794e777b181f5ab01b). I believe this commit addresses everything you raised (besides the intentionally undocumented env vars), but if we missed something, feel free to let us know.

Thanks,
-Michael