| Summary: | Fix for bug in sstat command (batch steps) | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Jacek Budzowski <j.budzowski> |
| Component: | Contributions | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 15.08.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 15.08.13 16.05.3 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | Patch for src/sstat/options.c | ||
Yep, that was definitely a mistake; your patch is added in as commit 59ae8600dd. Thanks!
Created attachment 3295: Patch for src/sstat/options.c

This patch fixes support for 'batch' steps in the sstat command. The bug was introduced by commit 74a7c5c71e1a843a518a865c1f991991f8691e38, which added support for the 'extern' step, and it affects all versions since that commit. It can cause segfaults when sstat is used to get 'batch' step statistics for a large number of jobs:

    Program received signal SIGSEGV, Segmentation fault.
    0x00002aaaaaedcbd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
    (gdb) bt
    #0 0x00002aaaaaedcbd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #1 0x0000000000449528 in list_iterator_create (l=0x0) at list.c:720
    #2 0x0000000000427ef6 in _do_stat (jobid=2041584, stepid=4294967295, nodelist=0x80ecf0 " ", req_cpufreq_min=4294967294, req_cpufreq_max=4294967294, req_cpufreq_gov=4294967294) at sstat.c:147
    #3 0x000000000042862c in main (argc=3, argv=0x7fffffff7448) at sstat.c:311
    (gdb) select-frame 2
    (gdb) info args
    jobid = 2041584
    stepid = 4294967295
    nodelist = 0x80ecf0 " "
    req_cpufreq_min = 4294967294
    req_cpufreq_max = 4294967294
    req_cpufreq_gov = 4294967294
    (gdb) info locals
    step_stat_response = 0x947aa0
    rc = 0
    itr = 0x0
    temp_stats = {act_cpufreq = 0, cpu_ave = 0, consumed_energy = 0, cpu_min = 4294967294, cpu_min_nodeid = 0, cpu_min_taskid = 0, disk_read_ave = 0, disk_read_max = 0, disk_read_max_nodeid = 0, disk_read_max_taskid = 0, disk_write_ave = 0, disk_write_max = 0, disk_write_max_nodeid = 0, disk_write_max_taskid = 0, pages_ave = 0, pages_max = 0, pages_max_nodeid = 0, pages_max_taskid = 0, rss_ave = 0, rss_max = 0, rss_max_nodeid = 0, rss_max_taskid = 0, vsize_ave = 0, vsize_max = 0, vsize_max_nodeid = 0, vsize_max_taskid = 0}
    step_stat = 0x0
    ntasks = 0
    tot_tasks = 0
    hl = 0x962a10

Best regards,
Jacek
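The attached patch is not quoted in this report, so the following is only a minimal sketch of the two ideas the backtrace suggests, not the actual change to src/sstat/options.c. It shows how a step selector such as 'batch' or 'extern' could be mapped to distinct sentinel values, and how a missing statistics list (the `l=0x0` passed to `list_iterator_create`) could be treated as empty rather than dereferenced. All names (`ex_parse_step`, `ex_count_stats`, `struct ex_list`, `EX_STEP_BATCH`, `EX_STEP_EXTERN`) and the sentinel values are hypothetical and are not taken from the Slurm sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinels for illustration; not copied from Slurm headers. */
#define EX_STEP_BATCH  0xfffffffeu
#define EX_STEP_EXTERN 0xffffffffu

/* Parse the step part of "jobid.step".
 * Returns 0 on success and stores the step id in *out, -1 on bad input. */
static int ex_parse_step(const char *s, uint32_t *out)
{
    char *end = NULL;
    unsigned long v;

    assert(out != NULL);
    if (s == NULL || *s == '\0')
        return -1;
    /* Each keyword gets its own sentinel so the two cannot be confused. */
    if (strcmp(s, "batch") == 0) {
        *out = EX_STEP_BATCH;
        return 0;
    }
    if (strcmp(s, "extern") == 0) {
        *out = EX_STEP_EXTERN;
        return 0;
    }
    v = strtoul(s, &end, 10);
    /* Reject trailing junk and values that collide with the sentinels. */
    if (*end != '\0' || v >= EX_STEP_BATCH)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* Stand-in for a list type; only the element count matters here. */
struct ex_list {
    size_t count;
};

/* Treat a missing list as empty instead of dereferencing a NULL pointer. */
static size_t ex_count_stats(const struct ex_list *stats)
{
    if (stats == NULL)
        return 0;
    return stats->count;
}
```

With this shape, a caller would check the return value of `ex_parse_step` before issuing the request, and would skip iteration entirely when the response carries no statistics list.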