Ticket 9912

Summary: Running “sacctmgr show assoc” causes the slurmdbd service to go down
Product: Slurm Reporter: wang. pengfei <1150680039>
Component: slurmdbd Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 20.02.5   
Hardware: Linux   
OS: Linux   
Site: -Other-

Description wang. pengfei 2020-09-29 21:26:42 MDT
Frequently creating and deleting users, user groups, accounts, and relationships on Linux causes the slurmdbd service to hang. After the slurmdbd service is restarted, running the sacctmgr show assoc command makes it hang again. The crash left a core file, core.9623 (segmentation fault), under /var/logs/slurm.
Comment 1 wang. pengfei 2020-10-05 01:20:43 MDT
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/slurmdbd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe83aba5327 in ____strtoull_l_internal () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install slurm-slurmdbd-20.02.3-1.el7.x86_64
(gdb) bt
#0  0x00007fe83aba5327 in ____strtoull_l_internal () from /lib64/libc.so.6
#1  0x00007fe83a0ecdca in _cluster_get_assocs (mysql_conn=mysql_conn@entry=0x7fe8200096b0, user=user@entry=0x7fe8370ee930, assoc_cond=assoc_cond@entry=0x7fe82000c3d0, 
    cluster_name=0x7fe82c00cb80 "cluster_ce_jn", fields=<optimized out>, sent_extra=<optimized out>, is_admin=is_admin@entry=true, sent_list=sent_list@entry=0x7fe820008440)
    at as_mysql_assoc.c:2117
#2  0x00007fe83a0ee457 in as_mysql_get_assocs (mysql_conn=0x7fe8200096b0, uid=<optimized out>, assoc_cond=0x7fe82000c3d0) at as_mysql_assoc.c:3465
#3  0x00007fe83a0da15b in acct_storage_p_get_assocs (mysql_conn=<optimized out>, uid=<optimized out>, assoc_cond=<optimized out>) at accounting_storage_mysql.c:3134
#4  0x00007fe83b651f45 in acct_storage_g_get_assocs (db_conn=0x7fe8200096b0, uid=0, assoc_cond=0x7fe82000c3d0) at slurm_accounting_storage.c:707
#5  0x0000000000406ee6 in _get_assocs (slurmdbd_conn=slurmdbd_conn@entry=0x7fe830001070, msg=msg@entry=0x7fe8370eeec0, out_buffer=out_buffer@entry=0x7fe8370eeeb8, 
    uid=uid@entry=0x7fe8370eeea4) at proc_req.c:1277
#6  0x0000000000409730 in proc_req (conn=0x7fe830001070, msg=0x7fe8370eeec0, out_buffer=0x7fe8370eeeb8, uid=0x7fe8370eeea4) at proc_req.c:340
#7  0x00007fe83b66eac7 in _process_service_connection (arg=0x7fe830001070, persist_conn=0x7fe830000900) at slurm_persist_conn.c:275
#8  _service_connection (arg=0x7fe830001180) at slurm_persist_conn.c:340
#9  0x00007fe83af3edd5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fe83ac67ead in clone () from /lib64/libc.so.6
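
The backtrace above ends inside strtoull, called from _cluster_get_assocs. The ticket does not state the cause, but a segmentation fault at that point is commonly seen when the string argument is a NULL pointer, and the MySQL C API returns NULL pointers for columns that hold SQL NULL. The following is a minimal sketch of a NULL-safe parse under that assumption. The helper name parse_u64_column is hypothetical and is not part of Slurm.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of Slurm: parse an unsigned decimal value
 * from a database column string that may be NULL or empty.
 * Returns true and stores the value in *out on success. Returns false and
 * leaves *out untouched when the column is NULL, empty, out of range, or
 * not fully consumed by the parse.
 */
static bool parse_u64_column(const char *col, uint64_t *out)
{
    char *end = NULL;
    unsigned long long val;

    /* Assumption: a NULL pointer represents SQL NULL; never hand it to strtoull. */
    if (!col || col[0] == '\0')
        return false;

    errno = 0;
    val = strtoull(col, &end, 10);

    /* Reject overflow, input with no digits, and trailing characters. */
    if (errno != 0 || end == col || *end != '\0')
        return false;

    *out = (uint64_t)val;
    return true;
}
```

A caller would check the return value and either skip the row or fall back to a default, instead of dereferencing a NULL column inside strtoull.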