Ticket 11789

Summary: 'scancel --wckey=test' segfaults
Product: Slurm Reporter: Bruno Mundim <bmundim>
Component: User Commands Assignee: Scott Hilton <scott>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Version: 20.02.7
Hardware: Linux
OS: Linux
Site: SciNet
Version Fixed: 20.11.8 21.08.0pre1

Description Bruno Mundim 2021-06-08 14:41:01 MDT
We are considering the configuration for wckeys and noticed that when we run the following command:

scancel --wckey=test

scancel segfaults. Note that we have not configured wckeys yet; we were just testing how the Slurm commands handle that option.

I have a core file if it would be useful for debugging.

Thanks,
Bruno.
Comment 1 Bruno Mundim 2021-06-08 14:56:35 MDT
Backtrace from running scancel under gdb with the --wckey=test option:

(gdb) run --wckey=test
Starting program: /opt/slurm/bin/scancel --wckey=test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
scancel: Linear node selection plugin loaded with argument 16
scancel: Cray/Aries node selection plugin loaded
scancel: select/cons_tres loaded with argument 16
scancel: select/cons_res loaded with argument 16

Program received signal SIGSEGV, Segmentation fault.
_filter_job_records () at scancel.c:390
390     in scancel.c
(gdb) bt
#0  _filter_job_records () at scancel.c:390
#1  _proc_cluster () at scancel.c:165
#2  0x00000000004047f8 in main (argc=2, argv=0x7fffffffcea8) at scancel.c:121

Thanks,
Bruno.
Comment 3 Scott Hilton 2021-06-08 17:06:23 MDT
Bruno, 

Thanks for pointing this out. I was able to quickly find the issue. It happens when a job doesn't have a wckey and scancel tries to read from a NULL string. 

I'll send the fix over to be reviewed.

-Scott
Comment 8 Scott Hilton 2021-06-11 11:05:07 MDT
Bruno,

The patch is on GitHub as commit 4c953b8998 and should be in 20.11.8.

-Scott
Comment 9 Bruno Mundim 2021-06-11 21:27:19 MDT
Thanks!