Ticket 721

Summary: mismatch between mysql password and munge socket
Product: Slurm Reporter: vecsys.msf
Component: Accounting Assignee: David Bigagli <david>
Status: OPEN --- QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: tevend
Version: 2.6.7   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description vecsys.msf 2014-04-15 21:14:20 MDT
Hi
I set the password of the MySQL user slurm to PASSWORD.
When I run sbatch, Munge uses the word PASSWORD as the path of its communication socket!
To make things work, I set the MySQL slurm user's password to "/var/run/munge/munge.socket.2" and set AccountingStoragePass to the same value. Everything works now; MySQL and Munge are both happy.
Regards
Marc


config:
AccountingStorageType=accounting_storage/mysql
AccountingStorageHost=msfdev-msf.AAA.dmz
AccountingStorageEnforce=associations,limits,qos
AccountingStorageUser=slurm
AccountingStoragePass=PASSWORD

command:
sbatch -o /home/MSF/slurm/%j.log -e /home/MSF/slurm/%j.log -M msf -A transcription  /home/MSF/slurm/test.sh
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory (retrying ...)
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory (retrying ...)
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory
sbatch: error: authentication: Socket communication error
sbatch: error: Batch job submission failed: Protocol authentication error
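
The report above ties the path in the Munge error to the value of AccountingStoragePass. A minimal sketch of that comparison in Python, assuming the config is available as plain "Key=Value" text and the error lines have the form quoted above. The function names (parse_conf, munge_paths, password_used_as_socket) are hypothetical helpers for illustration, not part of Slurm or Munge:

```python
import re

# Matches the Munge error line quoted in this ticket and captures the path.
MUNGE_ERR = re.compile(r'Munge encode failed: Failed to access "([^"]*)"')


def parse_conf(text):
    """Parse 'Key=Value' lines into a dict with lowercased keys.

    Simplification: only whole-line '#' comments are skipped.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip().lower()] = value.strip()
    return conf


def munge_paths(log_text):
    """Return the distinct paths Munge reported it could not access."""
    return sorted(set(MUNGE_ERR.findall(log_text)))


def password_used_as_socket(conf_text, log_text):
    """True if a path Munge failed to access equals AccountingStoragePass."""
    password = parse_conf(conf_text).get("accountingstoragepass")
    return password is not None and password in munge_paths(log_text)
```

Feeding it the config and sbatch output from this description would be expected to flag the match, since both contain the literal string PASSWORD.
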
Comment 1 David Bigagli 2014-04-25 06:41:58 MDT
Cannot reproduce.

David
Comment 2 Steven DuChene 2021-07-26 22:26:53 MDT
I am able to reproduce this exact condition.
I have to set StoragePass=/var/run/munge/munge.socket.2 in slurmdbd.conf, AccountingStoragePass=/var/run/munge/munge.socket.2 in slurm.conf, and the user password in mariadb to the same value to get sacct to function.

If I set these two values in the conf files to an actual password that matches the user password in mariadb, I get the following error when I run sacct:

lockhart-login1:~ # sacct
sacct: error: If munged is up, restart with --num-threads=10
sacct: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory
sacct: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PERSIST_INIT has authentication error: Invalid authentication credential
sacct: error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
sacct: error: Sending PersistInit msg: Protocol authentication error
sacct: error: Problem talking to the database: Protocol authentication error
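
The workaround in Comment 2 relies on the two password settings both holding the Munge socket path. A rough Python sketch for checking whether that workaround is in place, assuming both files are plain "Key=Value" text. conf_value and workaround_applied are hypothetical names, and the check only mirrors what this comment describes; it does not claim this is how Slurm intends these settings to be used:

```python
# Socket path quoted in this ticket; adjust if the site uses another one.
MUNGE_SOCKET = "/var/run/munge/munge.socket.2"


def conf_value(text, key):
    """Return the value of a 'Key=Value' line (case-insensitive key), or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().lower() == key.lower():
            return value.strip()
    return None


def workaround_applied(slurm_conf_text, slurmdbd_conf_text, socket=MUNGE_SOCKET):
    """True if AccountingStoragePass and StoragePass both equal the socket path."""
    return (
        conf_value(slurm_conf_text, "AccountingStoragePass") == socket
        and conf_value(slurmdbd_conf_text, "StoragePass") == socket
    )
```

The database user's password is not covered here; it would have to be checked separately against the same value.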