| Summary: | Srun --bcast option causes incorrect result from spank_get_item with S_JOB_ARGV | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Andrew D'Angelo <andrew.dangelo> |
| Component: | User Commands | Assignee: | Benjamin Witham <benjamin.witham> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 23.02.1 | ||
| Hardware: | Cray Shasta | ||
| OS: | Linux | ||
| Site: | CRAY | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | Cray Internal |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 23.02.4 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hi Andrew, I have a quick question about this. Was this working at one time and stopped, or is this something you found as you were working on a new spank plugin?

We have been using `spank_get_item` for a while now in one of our debugger products' Slurm plugin, but haven't tried using the `--bcast` option until now.

Hello, this issue has been fixed in commit a9fc6420be. It will be applied to the 23.02.4 release. I'll close this ticket now, but feel free to reopen this if you have any questions about the patch.

*** Ticket 18988 has been marked as a duplicate of this ticket. ***
We found this problem from our ATP Slurm plugin, which uses `spank_get_item` to find the name of the job binary that is being launched. When the `--bcast=<dir>` option is used to ship the job binary to a directory, this causes the first entry in the `argv` array to be the setting of `--bcast` instead of the name or path of the binary. Below is a small reproducer.

    #define _GNU_SOURCE

    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>

    #include <slurm/spank.h>

    #define MAJOR_VER 1
    #define MINOR_VER 0
    #define FULL_VERSION ((MAJOR_VER * 1000) + MINOR_VER)

    SPANK_PLUGIN(launch/test, FULL_VERSION)

    int slurm_spank_local_user_init(spank_t sHandle, int argc, char **argv)
    {
        int job_argc = 0;
        char const** job_argv = NULL;

        int spank_rc = spank_get_item(sHandle, S_JOB_ARGV, &job_argc, &job_argv);
        if (spank_rc != ESPANK_SUCCESS) {
            slurm_info("%s\n", spank_strerror(spank_rc));
        } else {
            fprintf(stderr, "Job argv: ");
            for (int i = 0; i < job_argc; i++) {
                fprintf(stderr, "%s ", job_argv[i]);
            }
            fprintf(stderr, "\n");
        }

        return 0;
    }

    int slurm_spank_exit(spank_t sHandle, int argc, char **argv)
    {
        return 0;
    }

Build the plugin:

    $ cc -g -O0 -shared `pkg-config --libs --cflags slurm` test_argv.c -o test_argv.so

Plugstack line:

    optional /home/users/adangelo/test_argv.so

Run with `--bcast`:

    $ srun -n2 --bcast=/home/users/adangelo/ ./a.out argv1 argv2
    Job argv: /home/users/adangelo/ argv1 argv2
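For plugins that still have to run against a release without the fix, one possible guard is to check whether the first `S_JOB_ARGV` entry looks like a directory before treating it as the binary. The sketch below is only a heuristic under stated assumptions: `looks_like_bcast_dir` is a hypothetical helper name, not part of the SPANK API, and it only catches a `--bcast` destination that ends in `/` or that exists as a directory on the node where the plugin runs. A `--bcast` destination given as a plain file path would not be detected.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of the SPANK API): returns true when arg0
 * looks like a --bcast destination directory rather than an executable.
 * Heuristic only; a destination given as a file path is not detected.
 */
static bool looks_like_bcast_dir(const char *arg0)
{
    if (arg0 == NULL || arg0[0] == '\0')
        return false;

    /* A path with a trailing '/' can only name a directory. */
    size_t len = strlen(arg0);
    if (arg0[len - 1] == '/')
        return true;

    /* Otherwise ask the local filesystem; a failed stat() counts as "no". */
    struct stat st;
    return stat(arg0, &st) == 0 && S_ISDIR(st.st_mode);
}
```

In the reproducer, such a check could sit right after the successful `spank_get_item` call, for example `if (job_argc > 0 && looks_like_bcast_dir(job_argv[0]))`, so the plugin can log a warning instead of using the entry as the binary name.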