In the SPANK plugin pyxis (https://github.com/NVIDIA/pyxis) we use the spank_option_register function of the SPANK API to register our options callbacks (of type https://github.com/SchedMD/slurm/blob/slurm-20-11-1-1/slurm/spank.h#L209-L216) For example, we check if one argument is not passed an empty string here: https://github.com/NVIDIA/pyxis/blob/c8f732fe77b0da39a1fe1a44d87897e265872837/args.c#L116-L119 And it aborst the job launch as expected in Slurm 20.02: > $ srun -A admin -p admin -N1 --container-image= hostname ; echo $? > srun: error: pyxis: --container-image: argument required > srun: error: Invalid --container-image argument: > 255 After upgrading to 20.11 on a test system, we noticed that we can't filter out invalid arguments, the job proceeds and executes normally: > $ srun -A admin -p admin -N1 --container-image= hostname ; echo $? > srun: error: pyxis: --container-image: argument required > srun: error: Invalid --container-image argument: > sc-sdgx-735 > 0 After a git bisect: > $ git bisect good > e9bb9ce288c6284a801b2bd4422d519da6751783 is the first bad commit > commit e9bb9ce288c6284a801b2bd4422d519da6751783 > Author: Tim Wickberg <tim@schedmd.com> > Date: Fri Oct 23 02:07:43 2020 -0600 > > Quietly error out when an option is unknown. > Let the rest of the argument handling figure this out. > src/common/plugstack.c | 2 +- > src/common/slurm_opt.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) It's unclear to me what is meant by "Let the rest of the argument handling figure this out" since spank_process_option is only called here in the Slurm code and the return value of slurm_process_option is ignored in most places. Given that this is not mentioned in the release notes and that the commit message mentions "... when an option is unknown", it looks like this new behavior was not intended. Here the option is not unknown, it's a valid SPANK option rejected by our callback.
Raising this to sev2. This is a blocker for us - we can't upgrade to 20.11 until this is fixed.
The patch is under review. Do you want to give it a try being aware that it's not yet approved/merged? Cheera, Marcin
No that's ok, we can wait. I just wanted to bump this to make sure it didn't get lost in the noise. Thanks for looking into it!
Do you expect the fix be merged before 20.11.3?
Luke, The patch passed our review and got merged ahead of Slurm 20.11.3[1] cheers, Marcin [1]https://github.com/SchedMD/slurm/commit/30ff31b24eaac54a643abf3c3c40c8a7d174bca8
That's good news, thanks! We'll give it a try as soon as 20.11.3 is released.
Verified as fixed in 20.11.3.