We found this problem through our ATP Slurm plugin, which uses `spank_get_item` to find the name of the job binary that is being launched. When the `--bcast=<dir>` option is used to ship the job binary to a directory, the first entry in the job `argv` array becomes the value passed to `--bcast` instead of the name or path of the binary. Below is a small reproducer.

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <slurm/spank.h>

#define MAJOR_VER 1
#define MINOR_VER 0
#define FULL_VERSION ((MAJOR_VER * 1000) + MINOR_VER)

SPANK_PLUGIN(launch/test, FULL_VERSION)

int slurm_spank_local_user_init(spank_t sHandle, int argc, char **argv)
{
    int job_argc = 0;
    char const** job_argv = NULL;

    int spank_rc = spank_get_item(sHandle, S_JOB_ARGV, &job_argc, &job_argv);
    if (spank_rc != ESPANK_SUCCESS) {
        slurm_info("%s\n", spank_strerror(spank_rc));
    } else {
        fprintf(stderr, "Job argv: ");
        for (int i = 0; i < job_argc; i++) {
            fprintf(stderr, "%s ", job_argv[i]);
        }
        fprintf(stderr, "\n");
    }

    return 0;
}

int slurm_spank_exit(spank_t sHandle, int argc, char **argv)
{
    return 0;
}

$ cc -g -O0 -shared `pkg-config --libs --cflags slurm` test_argv.c -o test_argv.so

# Plugstack line
optional /home/users/adangelo/test_argv.so

$ srun -n2 --bcast=/home/users/adangelo/ ./a.out argv1 argv2
Job argv: /home/users/adangelo/ argv1 argv2
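As a rough illustration of the impact, a plugin that derives the binary name from the first job `argv` entry might do something like the sketch below. The helper `job_binary_name` is hypothetical: it is not part of the SPANK API and is not taken from the ATP plugin. It only assumes the caller already holds the `job_argc` / `job_argv` pair returned for `S_JOB_ARGV`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not a SPANK call):
 * return the final path component of the job's argv[0], or NULL when
 * there is no argv[0] to inspect.
 */
static const char *job_binary_name(int job_argc, const char **job_argv)
{
    if (job_argc < 1 || job_argv == NULL || job_argv[0] == NULL)
        return NULL;

    /* Everything after the last '/' is treated as the binary name. */
    const char *slash = strrchr(job_argv[0], '/');
    return slash ? slash + 1 : job_argv[0];
}
```

With the expected argv (`./a.out argv1 argv2`) this helper yields `a.out`. With the argv observed in the reproducer (`/home/users/adangelo/ argv1 argv2`) it yields an empty string, because the first entry is the `--bcast` directory and ends in a slash.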
Hi Andrew, I have a quick question about this. Was this working at one time and stopped, or is this something you found as you were working on a new spank plugin?
We have been using `spank_get_item` for a while now in the Slurm plugin for one of our debugger products, but we hadn't tried the `--bcast` option until now.
Hello, This issue has been fixed in commit a9fc6420be. The fix will be included in the 23.02.4 release. I'll close this ticket now, but feel free to reopen it if you have any questions about the patch.
*** Ticket 18988 has been marked as a duplicate of this ticket. ***