Ticket 16673

Summary: Srun --bcast option causes incorrect result from spank_get_item with S_JOB_ARGV
Product: Slurm Reporter: Andrew D'Angelo <andrew.dangelo>
Component: User Commands Assignee: Benjamin Witham <benjamin.witham>
Status: RESOLVED FIXED QA Contact:
Severity: 3 - Medium Impact    
Priority: ---    
Version: 23.02.1   
Hardware: Cray Shasta   
OS: Linux   
Site: CRAY
Cray Sites: Cray Internal
Version Fixed: 23.02.4

Description Andrew D'Angelo 2023-05-08 15:48:03 MDT
We found this problem through our ATP Slurm plugin, which calls `spank_get_item` with `S_JOB_ARGV` to find the name of the job binary being launched. When the `--bcast=<dir>` option is used to ship the job binary to a directory, the first entry in the returned `argv` array is the `--bcast` value instead of the name or path of the binary. Below is a small reproducer.

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#include <slurm/spank.h>

#define MAJOR_VER 1
#define MINOR_VER 0
#define FULL_VERSION ((MAJOR_VER * 1000) + MINOR_VER)
SPANK_PLUGIN(launch/test, FULL_VERSION)

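/*
 * Called in srun (local context) once option processing is done.
 * Prints the job argv reported by SPANK so it can be compared with
 * the srun command line.
 */
int slurm_spank_local_user_init(spank_t sHandle, int argc, char **argv)
{
        int job_argc = 0;
        char **job_argv = NULL;  /* S_JOB_ARGV fills (int *, char ***) */
        int spank_rc = spank_get_item(sHandle, S_JOB_ARGV, &job_argc, &job_argv);
        if (spank_rc != ESPANK_SUCCESS) {
                /* slurm_info() appends its own newline */
                slurm_info("%s", spank_strerror(spank_rc));
        } else {
                /* job_argv[0] should be the executable; with --bcast=<dir>
                 * on 23.02.1 it is the --bcast directory instead. */
                fprintf(stderr, "Job argv: ");
                for (int i = 0; i < job_argc; i++) {
                        fprintf(stderr, "%s ", job_argv[i]);
                }
                fprintf(stderr, "\n");
        }

        return 0;
}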

int slurm_spank_exit(spank_t sHandle, int argc, char **argv)
{
        return 0;
}

$ cc -g -O0 -shared `pkg-config --libs --cflags slurm` test_argv.c -o test_argv.so

# plugstack.conf line
optional /home/users/adangelo/test_argv.so

$ srun -n2 --bcast=/home/users/adangelo/ ./a.out argv1 argv2
Job argv: /home/users/adangelo/ argv1 argv2
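On releases without the fix, a plugin that reads `S_JOB_ARGV` might want to detect the symptom shown above, where the first entry is the `--bcast` directory rather than the executable. The following is a minimal sketch, assuming a heuristic that treats an `argv[0]` ending in `/` as the misreported `--bcast` directory. The function name `job_argv0_is_bcast_dir` is hypothetical and not part of the SPANK API, and the check only covers the directory form used in this reproducer.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical heuristic, not part of the SPANK API.
 * On affected releases, `srun --bcast=<dir>/` reports the destination
 * directory as argv[0]. A path ending in '/' names a directory and cannot
 * be the executed file, so a trailing slash is treated as the symptom.
 */
bool job_argv0_is_bcast_dir(int argc, char *const argv[])
{
        if (argc < 1 || argv == NULL || argv[0] == NULL)
                return false;

        size_t len = strlen(argv[0]);
        return len > 0 && argv[0][len - 1] == '/';
}
```

With the argv printed above (`/home/users/adangelo/ argv1 argv2`) the check returns true, while `./a.out argv1 argv2` returns false. The workaround is deliberately conservative: it only flags the symptom and does not try to guess the real executable path.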
Comment 1 Ben Roberts 2023-05-08 16:02:22 MDT
Hi Andrew,

I have a quick question about this.  Was this working at one time and stopped, or is this something you found as you were working on a new spank plugin?
Comment 2 Andrew D'Angelo 2023-05-09 08:20:36 MDT
We have been using `spank_get_item` for a while now in the Slurm plugin for one of our debugger products, but had not tried the `--bcast` option until now.
Comment 19 Benjamin Witham 2023-06-26 09:03:36 MDT
Hello,

This issue has been fixed in commit a9fc6420be. It will be applied to the 23.02.4 release. I'll close this ticket now, but feel free to reopen this if you have any questions about the patch.
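A plugin that must also run on releases older than 23.02.4 could limit such a workaround to versions without the fix. The following is a minimal sketch, assuming the version string has the `major.minor.micro` form seen in this ticket (for example `23.02.1`) and that every release from 23.02.4 onward contains the fix. The helper name is hypothetical. Inside a SPANK plugin, the string might be obtained with `spank_get_item()` and `S_SLURM_VERSION`; treat that as an assumption to check against the installed `spank.h`.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true when `version` ("major.minor.micro")
 * predates 23.02.4, the release this ticket names for the fix.
 * Assumes all later releases include it. Unparseable strings are
 * treated as needing the workaround.
 */
bool needs_bcast_argv_workaround(const char *version)
{
        int major, minor, micro;

        /* %d parses "02" as decimal 2 (unlike %i, which would use octal) */
        if (version == NULL ||
            sscanf(version, "%d.%d.%d", &major, &minor, &micro) != 3)
                return true;

        /* Lexicographic comparison of (major, minor, micro) with 23.02.4 */
        if (major != 23)
                return major < 23;
        if (minor != 2)
                return minor < 2;
        return micro < 4;
}
```

For the version reported in this ticket, `needs_bcast_argv_workaround("23.02.1")` returns true, and for `"23.02.4"` it returns false.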
Comment 20 Andrew D'Angelo 2024-02-14 16:14:59 MST
*** Ticket 18988 has been marked as a duplicate of this ticket. ***