6609 – prolog script question

Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Configuration
Version:	18.08.3
Hardware:	Cray XC Linux

Severity:	4 - Minor Issue
Assignee:	Marshall Garey
Reported:	2019-02-28 08:06 MST by Jason Repik
Modified:	2019-04-02 09:05 MDT
Site:	Sandia National Laboratories

Description Jason Repik 2019-02-28 08:06:59 MST

Is it possible to get access to user environment variables from the prolog
script or does it only have access to the variables listed on the Prolog and
Epilog Guide?

For example:

Sample prolog script:

#!/bin/bash
if [[ ${SLURM_JOB_PARTITION} == *"_NOLDMS" ]]
then
  echo $SLURM_JOB_PARTITION >> /tmp/prolog_test.txt
  if [ ${PROLOG_TEST} -eq 1 ]
  then 
    echo $SLURM_JOB_NODELIST >> /tmp/prolog_test.txt
  fi
fi


Sample user job script:
#!/bin/bash
#SBATCH -N 4
#SBATCH -p ldms_NOLDMS
export PROLOG_TEST=1
srun hostname

In this example I see the partition name in the output
but not the nodelist which would need the variable from
the user's job submission to be set.

Comment 3 Marshall Garey 2019-03-05 17:27:54 MST

(In reply to Jason Repik from comment #0)
> For example:
> 
> Sample prolog script:
> 
> #!/bin/bash
> if [[ ${SLURM_JOB_PARTITION} == *"_NOLDMS" ]]
> then
>   echo $SLURM_JOB_PARTITION >> /tmp/prolog_test.txt
>   if [ ${PROLOG_TEST} -eq 1 ]
>   then 
>     echo $SLURM_JOB_NODELIST >> /tmp/prolog_test.txt
>   fi
> fi
> 
> 
> Sample user job script:
> #!/bin/bash
> #SBATCH -N 4
> #SBATCH -p ldms_NOLDMS
> export PROLOG_TEST=1
> srun hostname
> 
> In this example I see the partition name in the output
> but not the nodelist which would need the variable from
> the user's job submission to be set.

The nodelist is actually available in the environment variable SLURM_NODELIST in the prolog script. Unfortunately, the environment variable SLURM_JOB_NODELIST listed in the documentation is incorrect. Both SLURM_NODELIST and SLURM_JOB_NODELIST are available in the job's environment.

$ srun env|grep NODELIST
SLURM_NODELIST=v1
SLURM_JOB_NODELIST=v1
SLURM_STEP_NODELIST=v1

I'll fix this documentation error.


> Is it possible to get access to user environment variables from the prolog
> script or does it only have access to the variables listed on the Prolog and
> Epilog Guide?

Natively, the job prolog only has access to the variables listed in the Prolog and Epilog guide. However, you can write a SPANK plugin to get the user's environment variables and send them to the job prolog script. You can find all the details that you want to know about SPANK (and more) at our SPANK web page. There's a great example at the bottom, which I followed to create my own SPANK plugin.

https://slurm.schedmd.com/spank.html

A few notes:

* Whichever functions you define in your plugin will be called by Slurm. In this case, you want to define the function slurm_spank_init_post_opt().
* I read the user environment from the global variable environ, parse each entry into a name/value pair, and then call spank_job_control_setenv() to set the environment variables:

http://man7.org/linux/man-pages/man7/environ.7.html

* In the prolog script, all these environment variables will be prefixed with the string "SPANK_". You will still have access to the environment variables defined by Slurm. My prolog outputs the environment to a file. Here are a few of the environment variables that appear in the log file:

SPANK_LOGNAME=marshall
SPANK_LESSCLOSE=/usr/bin/lesspipe %s %s
SLURM_STEP_ID=0
...
SLURM_CLUSTER_NAME=voyager
SLURM_NODELIST=v1
...

* Compile the SPANK plugin into a shared library and list its path in a plugstack.conf in the same etc/ directory as your slurm.conf.

$ cat etc/plugstack.conf 
# plugstack.conf
# format:
# required/optional  </path/to/plugin.so>  [args]
required /home/marshall/slurm/18.08/voyager/spank_prolog_env.so
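
For reference, a prolog that writes its environment to a file (as mentioned in the notes above) could look roughly like the sketch below. This is not the exact prolog used here: the function name dump_prolog_env, the PROLOG_ENV_DIR override, and the /tmp default are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: write the prolog's environment, sorted, to a per-job file so you
# can see which SLURM_* and SPANK_* variables actually arrive.
# PROLOG_ENV_DIR is a hypothetical override; /tmp is an assumed default.
dump_prolog_env() {
    local out="${PROLOG_ENV_DIR:-/tmp}/prolog_env.${SLURM_JOB_ID:-unknown}"
    env | sort > "$out"
    # Print the file name so a caller can find the dump.
    printf '%s\n' "$out"
}

dump_prolog_env > /dev/null
```

Grepping the resulting file for "SPANK_" is a quick way to confirm that the plugin is loaded and exporting variables.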


Here's my SPANK plugin. This works for me, but I can't/won't officially bless this code, so use it only as a guide and not as an example of "best practice."

/*
 * compile with
gcc -Wall -g -shared -fPIC spank_prolog_env.c -o spank_prolog_env.so \
-I/home/marshall/slurm/18.08/voyager/include \
-Wl,-rpath=/home/marshall/slurm/18.08/voyager/lib \
-L/home/marshall/slurm/18.08/voyager/lib \
-lslurm
 *
 * -g is for debug info
 * -shared indicates shared library
 * -fPIC generates position independent code, which a shared library needs
 *  (it must come before -o; -o takes the next argument as the output file)
 * the rest of the flags are for linking
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <slurm/spank.h>

/* All SPANK plugins must define this macro for the Slurm plugin loader. */
SPANK_PLUGIN(spank_prolog_env, 1);

extern char **environ;

#define STR_MAX 1024

static void _split_env_name_val(char name[STR_MAX], char **val)
{
        if (val == NULL) {
                printf("%s: You done messed up! val = %p\n",
                       __func__, val);
                return;
        }
        *val = strstr(name, "=");
        /*
         * Value is the string after the '='. If there wasn't an '=', something
         * went wrong.
         */
        if (*val) {
                /*
                 * Null-terminate name at the '=' character.
                 * val should point to the character after the '=' character.
                 */
                **val = '\0';
                *val = *val + 1;
        } else {
                printf("%s: ERROR!: '=' not in env string %s\n",
                       __func__, name);
        }
}

int slurm_spank_init_post_opt(spank_t spank, int ac, char *argv[])
{
        /*
        setenv("TEST", "TEST", 1);
        spank_setenv(spank, "TEST2", "TEST2", 0);
        spank_job_control_setenv(spank, "TEST3", "TEST3", 1);
        spank_job_control_setenv(spank, "TEST4", "TEST4", 1);
        */

        //printf("%s: hi!\n", __func__);

        /* Read all user environment */
        int i = 0;
        char name[STR_MAX], *val;
        while (environ[i]) {
                //printf("%s\n", environ[i]);
                /* Entries longer than STR_MAX - 1 bytes are truncated. */
                snprintf(name, STR_MAX, "%s", environ[i]);
                _split_env_name_val(name, &val);
                //printf("\tname: \"%s\", value: \"%s\"\n\n", name, val);
                /* val is NULL if no '=' was found; skip that entry. */
                if (val)
                        spank_job_control_setenv(spank, name, val, 1);
                i++;
        }
        return 0;
}
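
With a plugin like this loaded, the sample prolog from comment #0 would test SPANK_PROLOG_TEST instead of PROLOG_TEST. The following is only a sketch under that assumption: the function name prolog_main and the PROLOG_LOG override are made up for illustration, and it assumes PROLOG_TEST was set in the environment that the plugin read.

```shell
#!/bin/bash
# Sketch of a prolog that reads a user variable forwarded by a SPANK plugin.
# Assumption: the plugin exported the user's PROLOG_TEST with
# spank_job_control_setenv(), so it appears here as SPANK_PROLOG_TEST.
prolog_main() {
    # PROLOG_LOG is a hypothetical override; the default matches comment #0.
    local log="${PROLOG_LOG:-/tmp/prolog_test.txt}"

    if [[ "${SLURM_JOB_PARTITION:-}" == *"_NOLDMS" ]]; then
        echo "$SLURM_JOB_PARTITION" >> "$log"
        # Default to 0 so an unset variable does not break the comparison.
        if [[ "${SPANK_PROLOG_TEST:-0}" == "1" ]]; then
            echo "${SLURM_JOB_NODELIST:-}" >> "$log"
        fi
    fi
}

prolog_main
```

Quoting the variable and giving it a default avoids the "unary operator expected" error that `[ ${PROLOG_TEST} -eq 1 ]` produces when the variable is unset.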

Comment 4 Jason Repik 2019-03-06 08:09:57 MST

This is great information.  Thanks!

One thing I did notice is that the SLURM_NODELIST does not seem to be available
in the prolog script but the SLURM_JOB_NODELIST is available:

sdb:/etc/opt/slurm # cat slurm_prolog.sh
#!/bin/bash

if [[ ${SLURM_JOB_PARTITION} == *"_NOLDMS" ]]
then

  echo "==============================================================" >> /tmp/prolog_test.txt
  date >> /tmp/prolog_test.txt
  echo "SLURM_JOB_PARTITION="$SLURM_JOB_PARTITION >> /tmp/prolog_test.txt
  echo "SLURM_NODELIST="$SLURM_NODELIST >> /tmp/prolog_test.txt
  echo "SLURM_JOB_NODELIST="$SLURM_JOB_NODELIST >> /tmp/prolog_test.txt
   #  /opt/cray/pdsh/default/bin/pdsh -w $SLURM_JOB_NODELIST touch /tmp/jjr

fi

Output:
==============================================================
Wed Mar  6 08:08:18 MST 2019
SLURM_JOB_PARTITION=ldms_NOLDMS
SLURM_NODELIST=
SLURM_JOB_NODELIST=nid00160

Comment 5 Marshall Garey 2019-03-06 09:28:52 MST

(In reply to Jason Repik from comment #4)
> This is great information.  Thanks!
> 
> One thing I did notice is that the SLURM_NODELIST does not seem to be available
> in the prolog script but the SLURM_JOB_NODELIST is available:
> 
> sdb:/etc/opt/slurm # cat slurm_prolog.sh
> #!/bin/bash
> 
> if [[ ${SLURM_JOB_PARTITION} == *"_NOLDMS" ]]
> then
> 
>   echo "==============================================================" >> /tmp/prolog_test.txt
>   date >> /tmp/prolog_test.txt
>   echo "SLURM_JOB_PARTITION="$SLURM_JOB_PARTITION >> /tmp/prolog_test.txt
>   echo "SLURM_NODELIST="$SLURM_NODELIST >> /tmp/prolog_test.txt
>   echo "SLURM_JOB_NODELIST="$SLURM_JOB_NODELIST >> /tmp/prolog_test.txt
>    #  /opt/cray/pdsh/default/bin/pdsh -w $SLURM_JOB_NODELIST touch /tmp/jjr
> 
> fi
> 
> Output:
> ==============================================================
> Wed Mar  6 08:08:18 MST 2019
> SLURM_JOB_PARTITION=ldms_NOLDMS
> SLURM_NODELIST=
> SLURM_JOB_NODELIST=nid00160

Huh, weird. That's the opposite of what I see for some reason. Oh well - if it works for you, that's great.
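
Since the two of us see different variables populated, a prolog that has to run in both environments could accept either name. This is just a sketch; the helper name job_nodelist is made up for illustration.

```shell
#!/bin/bash
# Sketch: return the job's node list from whichever variable is set,
# preferring SLURM_JOB_NODELIST and falling back to SLURM_NODELIST.
job_nodelist() {
    local nodes="${SLURM_JOB_NODELIST:-${SLURM_NODELIST:-}}"
    if [[ -z "$nodes" ]]; then
        echo "prolog: no node list variable is set" >&2
        return 1
    fi
    printf '%s\n' "$nodes"
}
```

A prolog could then use `nodes=$(job_nodelist) || exit 0` before passing "$nodes" to a tool such as pdsh.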

I'm closing this ticket for now. If you have more questions about this, feel free to respond and re-open the ticket.