Ticket 721 - mismatch between mysql password and munge socket
Summary: mismatch between mysql password and munge socket
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 2.6.7
Hardware: Linux Linux
: 6 - No support contract
Assignee: David Bigagli
Reported: 2014-04-15 21:14 MDT by vecsys.msf
Modified: 2021-07-27 05:32 MDT
Site: -Other-
Description vecsys.msf 2014-04-15 21:14:20 MDT
Hi,
I set the password of the MySQL user slurm to PASSWORD.
When I run sbatch, Munge uses the string PASSWORD as the path of its communication socket.
As a workaround, I set the MySQL slurm user's password to "/var/run/munge/munge.socket.2" and set AccountingStoragePass to the same value. Everything works now; MySQL and Munge are both happy.
Regards,
Marc


config:
AccountingStorageType=accounting_storage/mysql
AccountingStorageHost=msfdev-msf.AAA.dmz
AccountingStorageEnforce=associations,limits,qos
AccountingStorageUser=slurm
AccountingStoragePass=PASSWORD

command:
sbatch -o /home/MSF/slurm/%j.log -e /home/MSF/slurm/%j.log -M msf -A transcription  /home/MSF/slurm/test.sh
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory (retrying ...)
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory (retrying ...)
sbatch: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory
sbatch: error: authentication: Socket communication error
sbatch: error: Batch job submission failed: Protocol authentication error
Comment 1 David Bigagli 2014-04-25 06:41:58 MDT
Cannot reproduce.

David
Comment 2 Steven DuChene 2021-07-26 22:26:53 MDT
I am able to reproduce this exact condition.
To get sacct to function, I have to set StoragePass=/var/run/munge/munge.socket.2 in slurmdbd.conf and AccountingStoragePass=/var/run/munge/munge.socket.2 in slurm.conf, and set the user password in MariaDB to the same value.

If I set these two values in the conf files to an actual password that matches the user password in MariaDB, I get the following error when I run sacct:

lockhart-login1:~ # sacct
sacct: error: If munged is up, restart with --num-threads=10
sacct: error: Munge encode failed: Failed to access "PASSWORD": No such file or directory
sacct: error: slurm_send_node_msg: g_slurm_auth_create: REQUEST_PERSIST_INIT has authentication error: Invalid authentication credential
sacct: error: slurm_persist_conn_open: failed to send persistent connection init message to localhost:6819
sacct: error: Sending PersistInit msg: Protocol authentication error
sacct: error: Problem talking to the database: Protocol authentication error
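
In both reports the configured AccountingStoragePass value shows up verbatim in Munge's "Failed to access" error, which suggests the client side passes it to Munge as a socket path. The following is a minimal Python sketch for spotting that condition before submitting jobs. It assumes a slurm.conf-style file of KEY=VALUE lines with # comments; the helper names parse_conf and check_storage_pass are hypothetical and not part of Slurm, and the check only tests whether the value exists as a path on the local filesystem.

```python
import os


def parse_conf(text):
    """Parse KEY=VALUE lines from slurm.conf-style text.

    Assumption: '#' starts a comment, so a value containing '#'
    would be truncated. Keys are lowercased for lookup.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def check_storage_pass(text, exists=os.path.exists):
    """Return a warning if AccountingStoragePass is not an existing path.

    Returns None when the option is unset or points at an existing path.
    The 'exists' parameter is injectable so the check can be tested
    without touching the filesystem.
    """
    value = parse_conf(text).get("accountingstoragepass")
    if not value or exists(value):
        return None
    return (
        'AccountingStoragePass="%s" is not an existing path; '
        "Munge may report that it failed to access it" % value
    )


if __name__ == "__main__":
    sample = "AccountingStorageUser=slurm\nAccountingStoragePass=PASSWORD\n"
    print(check_storage_pass(sample))
```

Run against the configuration from the description, the sketch flags AccountingStoragePass=PASSWORD unless a file named PASSWORD happens to exist in the working directory. It does not verify that the path is actually a Munge socket.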