Ticket 4484 - slurmd segfault in stepd_completion/free_buf/slurm_xfree
Summary: slurmd segfault in stepd_completion/free_buf/slurm_xfree
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 17.11.0
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Felip Moll
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-12-07 11:36 MST by David Gloe
Modified: 2017-12-20 02:58 MST

See Also:
Site: CRAY
Cray Sites: Cray Internal
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.11.1
Target Release: ---
DevPrio: ---


Attachments
slurmd backtrace from gdb (4.11 KB, text/plain)
2017-12-07 11:36 MST, David Gloe
slurmd log (14.10 MB, text/x-log)
2017-12-07 11:36 MST, David Gloe

Description David Gloe 2017-12-07 11:36:28 MST
Created attachment 5693
slurmd backtrace from gdb

We've just experienced a slurmd segfault on 17.11.0 in stepd_completion. I'll attach the backtrace and slurmd log. This same segfault happened on two nodes.

Looks like the segfault for this node happened at 2017-12-07 12:17:21.
Comment 1 David Gloe 2017-12-07 11:36:55 MST
Created attachment 5694
slurmd log
Comment 2 Tim Wickberg 2017-12-07 12:02:08 MST
Just a friendly reminder to attach the backtrace when you get a minute... the logs are a good start but that'd help speed up the fix.
Comment 3 David Gloe 2017-12-07 12:06:28 MST
(In reply to Tim Wickberg from comment #2)
> Just a friendly reminder to attach the backtrace when you get a minute...
> the logs are a good start but that'd help speed up the fix.

The backtrace is already attached, at https://bugs.schedmd.com/attachment.cgi?id=5693
Comment 4 Tim Wickberg 2017-12-07 13:00:02 MST
Ah, sorry, my fault. Missed that on the first comment.

Felip - can you work through this on Friday?
Comment 11 Felip Moll 2017-12-19 02:06:22 MST
Hi David,

This is just a quick update: we have identified the problem and have a patch pending review and commit. It will be fixed officially as soon as possible.

Thanks
Felip M
Comment 13 Felip Moll 2017-12-20 02:58:35 MST
The fix for this issue is committed in 973ac2017280246ce0c7741c6d9e25b41d903c9f.

It will be available in 17.11.1 and up.

Thanks for reporting,
Felip M