Ticket 7893 - slurmctld core dump with acct_gather_infiniband/ofed
Summary: slurmctld core dump with acct_gather_infiniband/ofed
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 19.05.2
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: Gavin D. Howard
QA Contact:
URL:
Duplicates: 7940
Depends on:
Blocks:
 
Reported: 2019-10-08 11:02 MDT by Mathis Clayer
Modified: 2019-10-25 16:23 MDT

See Also:
Site: Atos/Eviden Sites
Atos/Eviden Sites: Grenoble
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 19.05.4
Target Release: ---
DevPrio: ---


Attachments
slurm.conf (4.03 KB, text/plain)
2019-10-10 07:02 MDT, Mathis Clayer

Description Mathis Clayer 2019-10-08 11:02:03 MDT
Hello,
With the option AcctGatherInfinibandType=acct_gather_infiniband/ofed set and no acct_gather.conf file present, slurmctld dumps core:

slurmctld: debug2: No acct_gather.conf file (/usr/local/mathis/19.05/etc/acct_gather.conf)
slurmctld: debug2: before rc
slurmctld: debug2: after rc
slurmctld: debug2:  in s_p_pack_hashtbl
slurmctld: debug2:  in s_p_pack_hashtbl
slurmctld: error: parse_config.c:2190: s_p_pack_hashtbl(): Assertion (p) failed.
Aborted (core dumped)


If I create the file acct_gather.conf, I don't have a core dump anymore.
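
For sites still on an affected version, the condition described above can be checked before starting slurmctld. The following is only a minimal sketch, not part of Slurm: the function name needs_acct_gather_conf is hypothetical, and it assumes that acct_gather.conf is expected in the same directory as slurm.conf (as the debug2 message above suggests) and that a plugin value ending in "/none" means the plugin is disabled.

```python
import os
import re

# Hypothetical helper, not a Slurm API: flags the configuration reported in
# this ticket (InfiniBand gather plugin enabled, acct_gather.conf missing).
_OPTION = re.compile(r"AcctGatherInfinibandType\s*=\s*(\S+)", re.IGNORECASE)


def needs_acct_gather_conf(slurm_conf_path):
    """Return True if slurm.conf enables an acct_gather_infiniband plugin
    while no acct_gather.conf exists in the same directory."""
    # Assumption: acct_gather.conf lives next to slurm.conf.
    conf_dir = os.path.dirname(os.path.abspath(slurm_conf_path))
    plugin_enabled = False
    with open(slurm_conf_path) as fh:
        for raw in fh:
            # Drop trailing comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            match = _OPTION.match(line)
            if match:
                # Assumption: a value ending in "/none" disables the plugin.
                plugin_enabled = not match.group(1).lower().endswith("/none")
    gather_conf = os.path.join(conf_dir, "acct_gather.conf")
    return plugin_enabled and not os.path.exists(gather_conf)
```

Calling needs_acct_gather_conf("/usr/local/mathis/19.05/etc/slurm.conf") on the setup above would be expected to return True until acct_gather.conf is created.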
Comment 1 Gavin D. Howard 2019-10-08 13:48:00 MDT
Thank you for the report. I am looking into it.
Comment 2 Gavin D. Howard 2019-10-09 11:20:20 MDT
I am making progress, but I have a request: can you send me your slurm.conf?
Comment 3 Mathis Clayer 2019-10-10 07:02:34 MDT
Created attachment 11900
slurm.conf
Comment 5 Gavin D. Howard 2019-10-10 11:40:17 MDT
Good news! I have found the bug and have started the review process for a patch.
Comment 9 Gavin D. Howard 2019-10-16 12:21:02 MDT
*** Ticket 7940 has been marked as a duplicate of this ticket. ***
Comment 16 Gavin D. Howard 2019-10-25 16:23:20 MDT
This bug has been fixed, and the fix will be in 19.05.4.

I am closing this bug, but feel free to reopen if 19.05.4 has the same problem.