Ticket 2890

Summary: Fix for bug in sstat command (batch steps)
Product: Slurm Reporter: Jacek Budzowski <j.budzowski>
Component: Contributions Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 15.08.4   
Hardware: Linux   
OS: Linux   
Site: -Other-
Version Fixed: 15.08.13 16.05.3 Target Release: ---
Attachments: Patch for src/sstat/options.c

Description Jacek Budzowski 2016-07-12 01:07:50 MDT
Created attachment 3295
Patch for src/sstat/options.c

This patch fixes support for 'batch' steps in the sstat command. The bug was introduced in commit 74a7c5c71e1a843a518a865c1f991991f8691e38, which added support for the 'extern' step, and it affects all versions since that commit.

The bug can cause segfaults when using sstat to get 'batch' step statistics for a large number of jobs:

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaaedcbd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00002aaaaaedcbd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x0000000000449528 in list_iterator_create (l=0x0) at list.c:720
#2  0x0000000000427ef6 in _do_stat (jobid=2041584, stepid=4294967295, nodelist=0x80ecf0 " ", req_cpufreq_min=4294967294, req_cpufreq_max=4294967294, req_cpufreq_gov=4294967294) at sstat.c:147
#3  0x000000000042862c in main (argc=3, argv=0x7fffffff7448) at sstat.c:311
(gdb) select-frame 2
(gdb) info args
jobid = 2041584
stepid = 4294967295
nodelist = 0x80ecf0 " "
req_cpufreq_min = 4294967294
req_cpufreq_max = 4294967294
req_cpufreq_gov = 4294967294
(gdb) info locals
step_stat_response = 0x947aa0
rc = 0
itr = 0x0
temp_stats = {act_cpufreq = 0, cpu_ave = 0, consumed_energy = 0, cpu_min = 4294967294, cpu_min_nodeid = 0, cpu_min_taskid = 0, disk_read_ave = 0, disk_read_max = 0, disk_read_max_nodeid = 0, disk_read_max_taskid = 0, disk_write_ave = 0, 
  disk_write_max = 0, disk_write_max_nodeid = 0, disk_write_max_taskid = 0, pages_ave = 0, pages_max = 0, pages_max_nodeid = 0, pages_max_taskid = 0, rss_ave = 0, rss_max = 0, rss_max_nodeid = 0, rss_max_taskid = 0, vsize_ave = 0, 
  vsize_max = 0, vsize_max_nodeid = 0, vsize_max_taskid = 0}
step_stat = 0x0
ntasks = 0
tot_tasks = 0
hl = 0x962a10
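
For readers following the trace, here is a minimal sketch of the two things it shows. First, stepid = 4294967295 is 0xFFFFFFFF and the req_cpufreq_* values 4294967294 are 0xFFFFFFFE, so they look like reserved sentinel values rather than real step numbers or frequencies (which Slurm macro each one maps to is not stated in this ticket). Second, frame #1 shows an iterator being created on a NULL list (l=0x0). The types and function names below are hypothetical stand-ins for illustration, not Slurm's list API and not the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not Slurm's real list types. */
struct demo_list {
	int count;
};

struct demo_iter {
	struct demo_list *list;
	int pos;
};

/*
 * Distance of a 32-bit value from UINT32_MAX.
 * A result of 0 or 1 matches the ids seen in the trace
 * (4294967295 and 4294967294 respectively).
 */
uint32_t distance_from_max(uint32_t v)
{
	return UINT32_MAX - v;
}

/*
 * Initialise an iterator over a list.
 * Returns 0 on success, or -1 if the list is NULL, so the caller can
 * skip the entry instead of dereferencing a NULL pointer.
 */
int demo_iter_init(struct demo_iter *it, struct demo_list *l)
{
	assert(it != NULL);	/* the iterator storage itself must exist */
	if (l == NULL)
		return -1;
	it->list = l;
	it->pos = 0;
	return 0;
}
```

A caller that checks the return value of demo_iter_init() before iterating would report "no statistics" for such a step instead of crashing. This only illustrates the failure mode; the actual fix is in the option parsing in src/sstat/options.c.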


Best regards,
Jacek
Comment 1 Tim Wickberg 2016-07-12 08:38:16 MDT
Yep, that was definitely a mistake; your patch is added in as commit 59ae8600dd.

Thanks!