Ticket 7727 - Remove unnecessary re-definition of select_plugin_type in src/common/gres.c
Summary: Remove unnecessary re-definition of select_plugin_type in src/common/gres.c
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 19.05.2
Hardware: Linux Linux
Severity: C - Contributions
Assignee: Danny Auble
Reported: 2019-09-11 10:17 MDT by Robert Tweedy
Modified: 2019-09-16 11:42 MDT

Site: -Other-
Version Fixed: 19.05.3 20.02.0-pre1


Attachments
Patch file to comment out re-definition of select_plugin_type variable. (381 bytes, patch)
2019-09-11 10:17 MDT, Robert Tweedy

Description Robert Tweedy 2019-09-11 10:17:01 MDT
Created attachment 11541
Patch file to comment out re-definition of select_plugin_type variable.

Overview:
In the gres_plugin_job_revalidate function in src/common/gres.c, the variable 'select_plugin_type' is declared again and initialized to NO_VAL, even though a variable of that name has already been defined elsewhere in the code. This re-defined variable is then immediately tested against SELECT_TYPE_CONS_TRES, a comparison that always fails because its value is NO_VAL.

Steps to reproduce:
1) Submit a job using an option supported only by the cons_tres plugin (such as requesting a GPU using -G 1).
2) Restart slurmctld (or run 'scontrol reconfig').

Actual results:
Jobs using cons_tres-only options are aborted when slurmctld is restarted or 'scontrol reconfig' is run, even if the select plugin in slurm.conf is still "select/cons_tres". slurmctld logs the error "error: Aborting JobId=<jobID> due to use of unsupported GRES options".

Expected results:
Jobs using cons_tres options continue to run (assuming no other change made to slurm.conf would cause them to be stopped).

Attached is a patch file that comments out the line in question. With this line removed, the function works as expected and only aborts jobs if the select plugin has been changed from select/cons_tres to one that does not support the new TRES options, such as select/cons_res.
Comment 1 Marc 2019-09-11 11:27:04 MDT
FYI, in a related bug we filed, I had traced the problem down to the same line of code and was about to propose the same fix before seeing this ticket.
Comment 3 Danny Auble 2019-09-16 11:41:00 MDT
Comment on attachment 11541
Patch file to comment out re-definition of select_plugin_type variable.

Thanks Robert, a commit that removes this line completely is in 19.05.3.  Commit 2abd2a3d8d6bdc.
Comment 4 Danny Auble 2019-09-16 11:42:00 MDT
Fixed, thanks for bringing it to our attention!

Please reopen if you find something else on this.