Ticket 959 - slurmdbd cannot start after Slurm upgrade (2.5.4 → 2.6.5) on a Cray internal machine.
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 2.6.5
Hardware: Linux Linux
: 3 - Medium Impact
Assignee: David Bigagli
 
Reported: 2014-07-13 21:14 MDT by toru matsuoka
Modified: 2014-07-24 05:49 MDT

Site: CRAY
Machine Name: CRAY CS300


Attachments
slurmdb.conf file and slurm.conf (4.39 KB, application/octet-stream)
2014-07-13 21:14 MDT, toru matsuoka
slurmdbd.conf file (667 bytes, application/octet-stream)
2014-07-14 21:26 MDT, toru matsuoka
attachment-10287-0.html (6.07 KB, text/html)
2014-07-15 01:02 MDT, Danny Auble
attachment-10554-0.html (3.40 KB, text/html)
2014-07-16 00:38 MDT, Danny Auble

Description toru matsuoka 2014-07-13 21:14:14 MDT
Created attachment 1054
slurmdb.conf file and slurm.conf

We upgraded Slurm from 2.5.4 to 2.6.5 on an internal machine at Cray.

However, there is now a problem involving the Slurm database (slurmdbd) and MUNGE.

Please let us know what could be causing the problem described below.

■ We can no longer use sacct, sacctmgr, or sreport.

[root@spm2 ~]# sacct
sacct: error: Problem talking to the database: Connection refused

[root@spm2 ~]# sacctmgr
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 ~]# sreport
sreport: error: Problem talking to the database: Connection refused

■ slurmdbd cannot start.

[root@spm2 ~]# /etc/init.d/slurmdbd restart
stopping slurmdbd:                                         [failure]
slurmdbd (pid 22251) is running...
slurmdbd (pid 22251) is running...
slurmdbd (pid 22251) is running...
slurmdbd (pid 22251) is running...
starting slurmdbd:                                         [failure]

■ sinfo, sbatch, srun, and scancel work normally.


[root@spm2 ~]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  down* hedwig
debug*       up   infinite      1   idle spm2


[root@spm2 ~]# squeue -l
Mon Jul 14 15:11:31 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)

[root@spm2 ~]# ssh -l tmatsuoka 10.81.1.35
tmatsuoka@10.81.1.35's password:
Last login: Mon Jul 14 10:55:08 2014 from spm2

-bash-4.1$ sbatch test.sh
Submitted batch job 13

-bash-4.1$ sbatch test.sh
Submitted batch job 14

-bash-4.1$ sbatch test.sh
Submitted batch job 15

-bash-4.1$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  down* hedwig
debug*       up   infinite      1  alloc spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:30 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                14     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                13     debug  test.sh tmatsuok  RUNNING       0:20 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:36 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                14     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                13     debug  test.sh tmatsuok  RUNNING       0:26 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:41 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:01 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:47 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:07 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:49 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:09 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:52 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:12 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:54 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:14 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:12:57 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:17 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:01 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:21 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:08 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:28 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:10 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  PENDING       0:00 UNLIMITED      1 (Resources)
                14     debug  test.sh tmatsuok  RUNNING       0:30 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:19 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  RUNNING       0:09 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:23 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  RUNNING       0:13 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:31 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
                15     debug  test.sh tmatsuok  RUNNING       0:21 UNLIMITED      1 spm2

-bash-4.1$ squeue -l
Mon Jul 14 15:13:43 2014
             JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)



■ We created the accounting database and the munge.key file as follows.

    [root@spm2 etc]# mysql 
     Welcome to the MySQL monitor.  Commands end with ; or \g. 
     Your MySQL connection id is 3 
     Server version: 5.1.71 Source distribution 

     Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. 

     Oracle is a registered trademark of Oracle Corporation and/or its 
     affiliates. Other names may be trademarks of their respective 
     owners. 

     Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 

     mysql> CREATE DATABASE slurm_acct_db; 
     Query OK, 1 row affected (0.02 sec) 

     mysql> grant all on slurm_acct_db.* TO 'root@spm2'; 
     Query OK, 0 rows affected (0.00 sec) 

     mysql> 

・Create the munge.key file:

dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key 

     
 # ls -l
 total 8
 -r-------- 1 munge root 1024 Jul 14 14:10 munge.key
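One detail in the MySQL session above may be worth checking: MySQL quotes the user name and host name of an account separately ('user'@'host'). Written as the single string 'root@spm2', the GRANT targets an account literally named root@spm2 (host %), not root connecting from spm2. The sketch below only builds the statement text; build_grant is a hypothetical helper, and the database, user, and host values are taken from this ticket.

```shell
#!/bin/sh
# Hypothetical helper: print a GRANT statement for the Slurm accounting
# database, quoting the user and host separately as 'user'@'host'.
build_grant() {
    db=$1
    user=$2
    host=$3
    printf "GRANT ALL ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
}

# Values taken from this ticket (database slurm_acct_db, user root, host spm2).
build_grant slurm_acct_db root spm2
```

The output could be piped into the client, for example: build_grant slurm_acct_db root spm2 | mysql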


Best regards,
Toru Matsuoka

Cray Japan Inc.
Comment 1 Moe Jette 2014-07-14 02:57:02 MDT
You need to upgrade slurmdbd before any of the other Slurm commands or daemons. Did you upgrade slurmdbd to version 2.6.5 before upgrading Slurm on your cluster? If not, please do that.

If that is not the problem, we would need to see the slurmdbd log file (see configured value of LogFile in the slurmdbd.conf file).

1. Set DebugLevel=debug2 in slurmdbd.conf (revert this once it is working).
2. Restart slurmdbd.
3. Try running sacctmgr in verbose mode on the same node where slurmdbd runs, adding "-vvvv" to the command line like this:
$ sacctmgr -vvvv
sacctmgr: debug3: Trying to load plugin /home/jette/SLURM/install_smd/lib/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug3: Trying to load plugin /home/jette/SLURM/install_smd/lib/slurm/auth_munge.so
sacctmgr: debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
sacctmgr: debug3: Success.
sacctmgr: debug:  slurmdbd: Sent DbdInit msg
sacctmgr: exit
sacctmgr: debug:  slurmdbd: Sent fini msg

4. If that works, try running sacctmgr on your login node
5. Send your sacctmgr output and slurmdbd log files as attachments.
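Steps 1 and 2 could be scripted roughly as follows. This is a sketch only: set_dbd_debug_level is a hypothetical helper, and it keeps a .bak copy of the file so the original DebugLevel can be restored once the problem is fixed.

```shell
#!/bin/sh
# Hypothetical helper: set DebugLevel in a slurmdbd.conf file.
# A copy of the original is kept as <file>.bak so the change can be reverted.
set_dbd_debug_level() {
    conf=$1
    level=$2
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^DebugLevel=' "$conf.bak"; then
        # Replace the existing DebugLevel line(s), writing back into the original file.
        sed "s/^DebugLevel=.*/DebugLevel=$level/" "$conf.bak" > "$conf"
    else
        # No DebugLevel line yet: append one.
        echo "DebugLevel=$level" >> "$conf"
    fi
}
```

On this site that would be set_dbd_debug_level /etc/slurm/slurmdbd.conf debug2, followed by /etc/init.d/slurmdbd restart.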
Comment 2 toru matsuoka 2014-07-14 21:26:33 MDT
Created attachment 1062
slurmdbd.conf file
Comment 3 toru matsuoka 2014-07-14 21:34:37 MDT
Hello,

Thanks for the quick response.

First, I upgraded slurmdbd to 2.6.5.

However, the problem was not resolved.

If you know how to resolve the "Connection refused" error, please let me know.

The slurmdbd.conf file and slurmdbd.log are shown below.

[root@spm2 slurm]# sacctmgr -vvvv
sacctmgr: debug:  Reading slurm.conf file: /etc/slurm/slurm.conf
sacctmgr: debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.35:6819: Connection refused
sacctmgr: debug:  slurmdbd: slurm_open_msg_conn to spm2:6819: Connection refused
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.14:6819: Connection refused
sacctmgr: debug:  slurmdbd: slurm_open_msg_conn to hedwig:6819: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sacctmgr
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sacct
sacct: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sreport
sreport: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# tail -30 /etc/slurm/var/log/slurmdbd.log
[2014-07-14T10:02:08.004] error: cannot create auth context for auth/munge
[2014-07-14T10:02:08.004] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T11:09:04.554] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T11:09:04.554] error: cannot find auth plugin for auth/munge
[2014-07-14T11:09:04.555] error: cannot create auth context for auth/munge
[2014-07-14T11:09:04.555] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T11:26:05.580] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T11:26:05.580] error: cannot find auth plugin for auth/munge
[2014-07-14T11:26:05.580] error: cannot create auth context for auth/munge
[2014-07-14T11:26:05.580] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T12:50:45.820] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T12:50:45.820] error: cannot find auth plugin for auth/munge
[2014-07-14T12:50:45.820] error: cannot create auth context for auth/munge
[2014-07-14T12:50:45.820] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T12:51:46.108] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T12:51:46.108] error: cannot find auth plugin for auth/munge
[2014-07-14T12:51:46.108] error: cannot create auth context for auth/munge
[2014-07-14T12:51:46.108] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T15:27:25.363] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T15:27:25.363] error: cannot find auth plugin for auth/munge
[2014-07-14T15:27:25.363] error: cannot create auth context for auth/munge
[2014-07-14T15:27:25.363] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-15T10:17:28.292] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-15T10:17:28.292] error: cannot find auth plugin for auth/munge
[2014-07-15T10:17:28.292] error: cannot create auth context for auth/munge
[2014-07-15T10:17:28.292] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-15T11:43:54.669] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-15T11:43:54.669] error: cannot find auth plugin for auth/munge
[2014-07-15T11:43:54.669] error: cannot create auth context for auth/munge
[2014-07-15T11:43:54.669] fatal: Unable to initialize auth/munge authentication plugin

[root@spm2 slurm]# cd /etc/slurm
[root@spm2 slurm]# cat slurmdbd.conf
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
#ArchiveResv=yes
ArchiveSteps=no
ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=localhost
DebugLevel=2
###PurgeEventAfter=1month
###PurgeJobAfter=12month
###PurgeResvAfter=1month
###PurgeStepAfter=1month
###PurgeSuspendAfter=1month
LogFile=/etc/slurm/var/log/slurmdbd.log
PidFile=/etc/slurm/var/log/slurmdbd.pid
PluginDir=/etc/slurm/default/lib/slurm
SlurmUser=root
#StoragePass=
StorageType=accounting_storage/mysql
#StorageUser=slurm
#StoragePass=initial0

Best regards,
Toru Matsuoka
Comment 4 Danny Auble 2014-07-15 01:02:19 MDT
It appears the MUNGE development libraries are missing on the slurmdbd node. Install them and recompile. Is this a new node, or were you just upgrading? It seems to be a new node.
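The log line "Couldn't find the specified plugin name for auth/munge" indicates that slurmdbd did not find (or could not load) a MUNGE plugin in its plugin directory. Note that slurmdbd.conf sets PluginDir=/etc/slurm/default/lib/slurm, while the sacctmgr output shows plugins being loaded from /usr/lib64/slurm, so it may be worth checking both. A minimal sketch; check_munge_plugin is a hypothetical helper, and the file name auth_munge.so is taken from the sacctmgr -vvvv example in Comment 1.

```shell
#!/bin/sh
# Hypothetical helper: report whether auth_munge.so exists in a plugin directory.
# Returns 0 if found, 1 otherwise.
check_munge_plugin() {
    dir=$1
    if [ -e "$dir/auth_munge.so" ]; then
        echo "found:   $dir/auth_munge.so"
        return 0
    fi
    echo "missing: $dir/auth_munge.so"
    return 1
}

# Directories taken from this ticket: PluginDir in slurmdbd.conf, and the
# directory sacctmgr loaded its plugins from.
for d in /etc/slurm/default/lib/slurm /usr/lib64/slurm; do
    check_munge_plugin "$d" || true
done
```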

Comment 5 Danny Auble 2014-07-15 01:02:27 MDT
Created attachment 1063
attachment-10287-0.html
Comment 6 toru matsuoka 2014-07-15 20:47:08 MDT
I installed the MUNGE packages as follows:

1. # rpmbuild -tb --clean munge-0.5.11.tar.bz2

2. # rpm -ivh munge-debuginfo-0.5.11-1.el6.x86_64.rpm slurm-munge-2.6.5-1.el6.x86_64.rpm munge-0.5.11-1.el6.x86_64.rpm munge-libs-0.5.11-1.el6.x86_64.rpm munge-devel-0.5.11-1.el6.x86_64.rpm

3. # rpm -qa | grep munge

munge-libs-0.5.11-1.el6.x86_64
munge-devel-0.5.11-1.el6.x86_64
slurm-munge-2.6.5-1.el6.x86_64
munge-0.5.11-1.el6.x86_64
munge-debuginfo-0.5.11-1.el6.x86_64

4. # chkconfig munge on

   # chkconfig --list | grep munge
     munge           0:off   1:off   2:on    3:on    4:on    5:on    6:off

5. # /etc/init.d/munge restart

   # ps -ef | grep munge

     munge    19176     1  0 15:46 ?        00:00:00 /usr/sbin/munged
     root     19891 17319  0 17:22 pts/0    00:00:00 grep munge

6. # dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key  
 
7. # chown daemon.root /etc/munge/munge.key

   # ls -l 
      total 8 
     -r-------- 1 daemon root 1024 Jun 16 17:40 munge.key 

8. # /etc/init.d/munge start 

     Starting MUNGE: munged                                     [  OK  ] 
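Two MUNGE details may be worth confirming after these steps. The ps output in step 5 shows munged running as the munge user, while step 7 changes the key owner to daemon; MUNGE expects the key to be owned by the account munged runs as and not to be readable by other users. Also, because step 6 generates a new key, that same key has to be copied to every node (here spm2 and hedwig). A sketch for the first check, assuming GNU stat as found on EL6 systems; key_owner_mode is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "<owner> <octal mode>" for a file such as a MUNGE key.
# Relies on the GNU stat -c format option.
key_owner_mode() {
    stat -c '%U %a' "$1"
}
```

For example, key_owner_mode /etc/munge/munge.key should print the account munged runs as and a mode such as 400. For the second check, comparing the output of cksum /etc/munge/munge.key on each node is one way to confirm the keys match.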


・However, the problem was not resolved.

[root@spm2 ~]# sacctmgr -vvvv
sacctmgr: debug:  Reading slurm.conf file: /etc/slurm/slurm.conf
sacctmgr: debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.35:6819: Connection refused
sacctmgr: debug:  slurmdbd: slurm_open_msg_conn to spm2:6819: Connection refused
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.14:6819: Connection refused
sacctmgr: debug:  slurmdbd: slurm_open_msg_conn to hedwig:6819: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 ~]# sreport
sreport: error: Problem talking to the database: Connection refused

Is this procedure correct?

Best regards,
Toru Matsuoka
Comment 7 Danny Auble 2014-07-16 00:38:34 MDT
Now you need to re-run configure on Slurm and then rebuild it.
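Comments 4 and 7 suggest that the MUNGE auth plugin is only built when Slurm's configure finds the MUNGE development files. Before re-running configure, a check along these lines may help. This is a sketch: munge_devel_present is a hypothetical helper, and the /usr prefix assumes munge-devel was installed from the RPMs listed in Comment 6.

```shell
#!/bin/sh
# Hypothetical helper: check for the MUNGE header and library under a prefix.
# Returns 0 if both munge.h and libmunge.so are present, 1 otherwise.
munge_devel_present() {
    prefix=$1
    if [ ! -f "$prefix/include/munge.h" ]; then
        echo "missing $prefix/include/munge.h"
        return 1
    fi
    # EL6 x86_64 packages use lib64; check lib as a fallback.
    for lib in "$prefix/lib64/libmunge.so" "$prefix/lib/libmunge.so"; do
        if [ -e "$lib" ]; then
            echo "ok: $prefix/include/munge.h and $lib"
            return 0
        fi
    done
    echo "missing libmunge.so under $prefix/lib64 or $prefix/lib"
    return 1
}

munge_devel_present /usr || true
```

If both files are reported, re-run configure with the same options used for the original build, then run make and make install, and check that auth_munge.so ends up in the directory that PluginDir points to.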

Comment 8 Danny Auble 2014-07-16 00:38:41 MDT
Created attachment 1064
attachment-10554-0.html
Comment 9 toru matsuoka 2014-07-16 21:38:39 MDT
Thanks for the support.

Could you please send a general sample slurmdbd.conf?

Also, is there a way to clear the contents of the Slurm database (slurmdb) and start over? If so, please let me know.
Comment 10 Moe Jette 2014-07-24 05:49:57 MDT
(In reply to toru matsuoka from comment #9)
> Thanks for the support.
> 
> Could you please send a general sample slurmdbd.conf?

See "man slurmdbd.conf"
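For reference, a minimal slurmdbd.conf might look like the sample written by the script below. Every value is a placeholder assumption (paths, user names, password), not a recommendation from this ticket; see man slurmdbd.conf for the meaning and default of each option. The script only writes the sample to ./slurmdbd.conf.sample.

```shell
#!/bin/sh
# Sketch: write a minimal sample slurmdbd.conf. All values are placeholders.
out=./slurmdbd.conf.sample
cat > "$out" <<'EOF'
# Authentication plugin (auth/munge, as on this site)
AuthType=auth/munge
# Host and port where slurmdbd listens
DbdHost=localhost
DbdPort=6819
# Account the slurmdbd daemon runs as
SlurmUser=slurm
DebugLevel=info
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
# MySQL back end
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=CHANGE_ME
StorageLoc=slurm_acct_db
EOF
echo "wrote $out"
```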

> Also, is there a way to clear the contents of the Slurm database (slurmdb) and start over?
> If so, please let me know.

We would strongly recommend that you only use the sacctmgr command to alter the contents of slurmdb.