| Summary: | slurmd segfault in stepd_completion/free_buf/slurm_xfree | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | David Gloe <david.gloe> |
| Component: | slurmd | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 17.11.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=4491 | ||
| Site: | CRAY | Cray Sites: | Cray Internal |
| Version Fixed: | 17.11.1 | ||
| Attachments: | slurmd backtrace from gdb, slurmd log | ||

Created attachment 5693: slurmd backtrace from gdb

We've just experienced a slurmd segfault on 17.11.0 in stepd_completion. I'll attach the backtrace and slurmd log. This same segfault happened on two nodes. Looks like the segfault for this node happened at 2017-12-07 12:17:21.

Created attachment 5694: slurmd log

Just a friendly reminder to attach the backtrace when you get a minute... the logs are a good start but that'd help speed up the fix.

(In reply to Tim Wickberg from comment #2)
> Just a friendly reminder to attach the backtrace when you get a minute...
> the logs are a good start but that'd help speed up the fix.

The backtrace is already attached, at https://bugs.schedmd.com/attachment.cgi?id=5693

Ah, sorry, my fault. Missed that on the first comment.

Felip - can you work through this on Friday?

Hi David,

This is just a quick update to inform you that we have already identified the problem and we have a patch pending for review and commit. Will be fixed officially asap.

Thanks,
Felip M

Fix for this issue is committed in 973ac2017280246ce0c7741c6d9e25b41d903c9f. It will be available in 17.11.1 and up.

Thanks for reporting,
Felip M