Ticket 14511

Summary: cannot run slurm commands
Product: Slurm Reporter: John Thompson <jthompson>
Component: User Commands Assignee: Jason Booth <jbooth>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 2 - High Impact    
Priority: ---    
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
Site: Albert Einstein Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description John Thompson 2022-07-11 16:48:25 MDT
(base) [johthompso@ajasper ~]$ srun -p normal --pty bash
srun: error: If munged is up, restart with --num-threads=10
srun: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
srun: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_RESOURCE_ALLOCATION has authentication error: Invalid authentication credential
srun: error: Srun communication socket apparently being written to by something other than Slurm

<repeated four more times>

(base) [johthompso@ajasper ~]$ sinfo
sinfo: error: If munged is up, restart with --num-threads=10
sinfo: error: Munge encode failed: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
sinfo: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PARTITION_INFO has authentication error: Invalid authentication credential
slurm_load_partitions: Protocol authentication error
(base) [johthompso@ajasper ~]$


This is breaking on a number of machines, but is not universal.
Comment 1 Jason Booth 2022-07-11 16:52:08 MDT
What is the status of munge on those systems?

> $ systemctl status munge
> $ ps aux | grep munge

Have you tried to restart munge?
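
A minimal sketch of the same checks as shell functions, in case it helps when looping over several machines. The function names are hypothetical, and the default socket path is simply the one shown in the error output in the description; adjust it if your munge build uses a different location.

```shell
# Hypothetical helper: report whether the munge socket exists.
# The default path is copied from the error message in the description.
munge_socket_state() {
    sock="${1:-/var/run/munge/munge.socket.2}"
    if [ -S "$sock" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Hypothetical helper: report whether a munged process is running.
munged_state() {
    if pgrep -x munged >/dev/null 2>&1; then
        echo "running"
    else
        echo "stopped"
    fi
}
```

If the socket is missing or munged is stopped, `systemctl restart munge` followed by `munge -n | unmunge` is a reasonable way to confirm that credentials can be encoded and decoded again on that node.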
Comment 2 John Thompson 2022-07-11 17:06:26 MDT
Jason,

Huh. Munge daemon wasn’t running. Got it working. Had an out-of-date
slurm.conf file as well.

Thanks for your assistance, case can be closed.

John
Comment 3 Jason Booth 2022-07-11 17:08:49 MDT
Resolving
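
For the out-of-date slurm.conf mentioned in Comment 2, one possible way to spot a stale copy is to compare the node's file against a known-good reference. This is only a sketch: the function name is hypothetical, and both paths are assumptions that depend on how the site distributes its configuration.

```shell
# Hypothetical helper: compare a node's slurm.conf against a reference copy.
# Both paths are assumptions; pass whichever locations your site uses.
slurm_conf_state() {
    local_conf="$1"
    reference_conf="$2"
    # cmp -s is silent and returns non-zero if the files differ
    # or if either file cannot be read.
    if cmp -s "$local_conf" "$reference_conf"; then
        echo "match"
    else
        echo "differ"
    fi
}
```

Example use, assuming the config lives in /etc/slurm and a reference copy has been fetched to /tmp: `slurm_conf_state /etc/slurm/slurm.conf /tmp/slurm.conf.reference`.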