Ticket 17493

Summary: Munge Broken After Upgrade
Product: Slurm Reporter: Jesse James <jesse>
Component: nss_slurm Assignee: Jacob Jenson <jacob>
Status: OPEN --- QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 23.11.x   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Jesse James 2023-08-22 12:35:39 MDT
Hi guys,

I've tried all the resolutions online for this but had no luck.  I upgraded Slurm on all nodes, but only one is having this issue.  It's the most important node.  Here's the error:

 sinfo
sinfo: error: If munged is up, restart with --num-threads=10
sinfo: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
sinfo: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PARTITION_INFO has authentication error: Invalid authentication credential
slurm_load_partitions: Protocol authentication error

Any help would be appreciated.  

Thank you!
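
A first check for the error above is whether the socket named in the message exists on the affected node. The following is a minimal sketch, assuming a POSIX shell; the function name check_munge_socket is hypothetical, and the default path is simply the one shown in the sinfo output.

```shell
# Hypothetical helper: report whether a MUNGE socket exists at the given path.
# The default path is copied from the sinfo error output in this ticket.
check_munge_socket() {
    sock="${1:-/var/run/munge/munge.socket.2}"
    if [ -S "$sock" ]; then
        # -S is true only if the path exists and is a socket
        echo "present: $sock"
    else
        echo "missing: $sock"
        return 1
    fi
}

# Run the check against the default path and hint at the likely cause on failure.
check_munge_socket || echo "munged may not be running on this node"
```

If the socket is reported missing, the munged daemon is most likely not running on that node. Once it is up, `munge -n | unmunge` is the usual round-trip test of credential encoding and decoding.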