Ticket 16673 - Srun --bcast option causes incorrect result from spank_get_item with S_JOB_ARGV
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 23.02.1
Hardware: Cray Shasta Linux
Severity: 3 - Medium Impact
Assignee: Benjamin Witham
Duplicates: 18988
Reported: 2023-05-08 15:48 MDT by Andrew D'Angelo
Modified: 2024-02-14 16:14 MST
Site: CRAY
Cray Sites: Cray Internal
Version Fixed: 23.02.4


Description Andrew D'Angelo 2023-05-08 15:48:03 MDT
We found this problem through our ATP Slurm plugin, which uses `spank_get_item` with `S_JOB_ARGV` to find the name of the job binary being launched. When the `--bcast=<dir>` option is used to ship the job binary to a directory, the first entry in the returned `argv` array is the value passed to `--bcast` instead of the name or path of the binary. Below is a small reproducer.

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

#include <slurm/spank.h>

#define MAJOR_VER 1
#define MINOR_VER 0
#define FULL_VERSION ((MAJOR_VER * 1000) + MINOR_VER)
SPANK_PLUGIN(launch/test, FULL_VERSION)

int slurm_spank_local_user_init(spank_t sHandle, int argc, char **argv)
{
        int job_argc = 0;
        char const** job_argv = NULL;
        int spank_rc = spank_get_item(sHandle, S_JOB_ARGV, &job_argc, &job_argv);
        if (spank_rc != ESPANK_SUCCESS) {
                slurm_info("%s\n", spank_strerror(spank_rc));
        } else {
                fprintf(stderr, "Job argv: ");
                for (int i = 0; i < job_argc; i++) {
                        fprintf(stderr, "%s ", job_argv[i]);
                }
                fprintf(stderr, "\n");
        }

        return 0;
}

int slurm_spank_exit(spank_t sHandle, int argc, char **argv)
{
        return 0;
}

$ cc -g -O0 -shared `pkg-config --libs --cflags slurm` test_argv.c -o test_argv.so

# Plugstack line
optional /home/users/adangelo/test_argv.so

$ srun -n2 --bcast=/home/users/adangelo/ ./a.out argv1 argv2
Job argv: /home/users/adangelo/ argv1 argv2
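
A plugin that has to run on an affected release could guard against this symptom before trusting the first `S_JOB_ARGV` entry. The following is only a minimal sketch under stated assumptions, not part of the reproducer and not the upstream fix: the helper name `looks_like_bcast_dir` is hypothetical, and it only detects the `--bcast=<dir>` case shown above by checking whether the string names an existing directory on the node where the check runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of the SPANK API).
 * Returns true when argv0 names an existing directory, which is what
 * the first S_JOB_ARGV entry looks like in the output above when
 * --bcast=<dir> is used on an affected release.
 */
static bool looks_like_bcast_dir(const char *argv0)
{
        struct stat st;

        /* A missing or empty entry cannot be a directory path. */
        if (argv0 == NULL || argv0[0] == '\0')
                return false;

        /* stat() failure (e.g. path does not exist) means "not a directory". */
        if (stat(argv0, &st) != 0)
                return false;

        return S_ISDIR(st.st_mode);
}
```

A plugin could call this on `job_argv[0]` after a successful `spank_get_item` call and skip or log the entry when it returns true. The check is a heuristic: it would not catch a `--bcast` value that names a destination file rather than a directory.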
Comment 1 Ben Roberts 2023-05-08 16:02:22 MDT
Hi Andrew,

I have a quick question about this.  Was this working at one time and stopped, or is this something you found as you were working on a new spank plugin?
Comment 2 Andrew D'Angelo 2023-05-09 08:20:36 MDT
We have been using `spank_get_item` for a while now in one of our debugger products' Slurm plugin, but haven't tried using the `--bcast` option until now.
Comment 19 Benjamin Witham 2023-06-26 09:03:36 MDT
Hello,

This issue has been fixed in commit a9fc6420be. The fix will be included in the 23.02.4 release. I'll close this ticket now, but feel free to reopen it if you have any questions about the patch.
Comment 20 Andrew D'Angelo 2024-02-14 16:14:59 MST
*** Ticket 18988 has been marked as a duplicate of this ticket. ***