Created attachment 11541 [details]
Patch file to comment out re-definition of select_plugin_type variable.

Overview:
In the gres_plugin_job_revalidate function located in src/common/gres.c, the variable 'select_plugin_type' is set again, this time to NO_VAL, after having already been set elsewhere in the code. This new value is then immediately tested against SELECT_TYPE_CONS_TRES, a comparison that always fails because the variable has just been set to NO_VAL.

Steps to reproduce:
1) Submit a job using an option supported only by the cons_tres plugin (such as requesting a GPU using -G 1).
2) Restart slurmctld (or run 'scontrol reconfig').

Actual results:
Jobs using the cons_tres options are stopped upon a slurmctld restart or 'scontrol reconfig', even if the select plugin has not been changed from "select/cons_tres" in slurm.conf. The error reported in slurmctld's log is "error: Aborting JobId=<jobID> due to use of unsupported GRES options".

Expected results:
Jobs using the cons_tres options continue to run (assuming that whatever change was made to slurm.conf wouldn't otherwise cause the job to be stopped).

Attached is a patch file which comments out the line in question. With this line removed, the function works as expected and only stops jobs if the select plugin has been changed from select/cons_tres to one that doesn't support the new TRES options, such as select/cons_res.
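For readers without the source at hand, here is a minimal sketch of the pattern described in the overview. It is not the actual Slurm code: the function names revalidate_buggy and revalidate_fixed, their parameters, and the numeric values given to the constants are hypothetical stand-ins chosen only to show why the comparison can never succeed.

```c
#include <stdint.h>

/* Placeholder values for illustration only; not copied from Slurm headers. */
#define NO_VAL                0xfffffffeu
#define SELECT_TYPE_CONS_RES  1u
#define SELECT_TYPE_CONS_TRES 2u

/*
 * Hypothetical stand-in for the revalidation check.
 * Returns 1 if the job would be aborted, 0 if it is left alone.
 */
int revalidate_buggy(uint32_t configured_type, int job_uses_tres_options)
{
    uint32_t select_plugin_type = configured_type;

    /* The redundant reset: the known plugin type is thrown away here. */
    select_plugin_type = NO_VAL;

    /* Always false, because select_plugin_type was just set to NO_VAL. */
    if (select_plugin_type == SELECT_TYPE_CONS_TRES)
        return 0;

    /* Falls through even when cons_tres is configured. */
    return job_uses_tres_options ? 1 : 0;
}

/* Same check with the reset removed, as the attached patch proposes. */
int revalidate_fixed(uint32_t configured_type, int job_uses_tres_options)
{
    uint32_t select_plugin_type = configured_type;

    /* cons_tres supports the new options, so nothing needs aborting. */
    if (select_plugin_type == SELECT_TYPE_CONS_TRES)
        return 0;

    /* Any other plugin: abort only jobs that rely on cons_tres options. */
    return job_uses_tres_options ? 1 : 0;
}
```

In this sketch, a job that uses cons_tres options under a cons_tres configuration is aborted by revalidate_buggy but left running by revalidate_fixed, which matches the actual and expected results reported above.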
FYI, in a related bug we filed, I had traced the problem down to the same line of code and was about to propose the same fix before seeing this ticket.
Comment on attachment 11541 [details]
Patch file to comment out re-definition of select_plugin_type variable.

Thanks Robert. A commit that removes this line completely is in 19.05.3: commit 2abd2a3d8d6bdc.
Fixed, thanks for bringing it to our attention! Please reopen if you find something else on this.