Ticket 5473 - slurmctld segfaulted in validate_jobs_on_node
Summary: slurmctld segfaulted in validate_jobs_on_node
Status: RESOLVED DUPLICATE of ticket 5457
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 17.11.7
Hardware: Linux Linux
Severity: 2 - High Impact
Assignee: Jason Booth
Reported: 2018-07-24 11:05 MDT by Steve Ford
Modified: 2018-07-30 17:12 MDT
Site: MSU

Attachments
gdb thread apply all bt (237.59 KB, text/plain)
2018-07-24 11:05 MDT, Steve Ford
Slurmctld log (1.35 MB, text/plain)
2018-07-24 11:05 MDT, Steve Ford

Description Steve Ford 2018-07-24 11:05:34 MDT
Created attachment 7390
gdb thread apply all bt

Our slurmctld daemon has segfaulted twice today. The backtrace shows a call to __assert_fail from within validate_jobs_on_node.
Comment 1 Steve Ford 2018-07-24 11:05:59 MDT
Created attachment 7391
Slurmctld log
Comment 3 Jason Booth 2018-07-30 17:12:39 MDT
Marking as a duplicate of ticket 5457.

*** This ticket has been marked as a duplicate of ticket 5457 ***