| Summary: | slurmdbd cannot start after Slurm upgrade (2.5.4 → 2.6.5) on a CRAY internal machine | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | toru matsuoka <tmatsuoka> |
| Component: | Accounting | Assignee: | David Bigagli <david> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | da |
| Version: | 2.6.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | CRAY | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CRAY CS300 | CLE Version: | |
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: |
slurmdb.conf file and slurm.conf
slurmdbd.conf file attachment-10287-0.html attachment-10554-0.html |
||
|
Description
toru matsuoka
2014-07-13 21:14:14 MDT
You need to upgrade slurmdbd before any of the other Slurm commands or daemons. Did you upgrade slurmdbd to version 2.6.5 before upgrading Slurm on your cluster? If not, please do that.

If that is not the problem, we would need to see the slurmdbd log file (see the configured value of LogFile in the slurmdbd.conf file).

1. Set DebugLevel=debug2 in slurmdbd.conf (clear this after it is working).
2. Restart slurmdbd.
3. Try running sacctmgr on the same node that slurmdbd runs on, in verbose mode (with "-vvvv" on the command line, like this):

$ sacctmgr -vvvv
sacctmgr: debug3: Trying to load plugin /home/jette/SLURM/install_smd/lib/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug3: Trying to load plugin /home/jette/SLURM/install_smd/lib/slurm/auth_munge.so
sacctmgr: debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded
sacctmgr: debug3: Success.
sacctmgr: debug: slurmdbd: Sent DbdInit msg
sacctmgr: exit
sacctmgr: debug: slurmdbd: Sent fini msg

4. If that works, try running sacctmgr on your login node.
5. Send your sacctmgr output and slurmdbd log files as attachments.

Created attachment 1062 [details]
slurmdbd.conf file
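Step 1 above can be sketched as a small shell helper. This is only an illustration: `set_debug_level` is a hypothetical function, not part of Slurm, and it assumes the file uses plain `Key=Value` lines as in the slurmdbd.conf posted in this thread.

```shell
#!/bin/sh
# Sketch only: set_debug_level is a hypothetical helper, not a Slurm tool.
# It rewrites (or appends) the DebugLevel line in a slurmdbd.conf-style file.
set_debug_level() {
    conf=$1
    level=$2
    tmp=$(mktemp) || return 1
    if grep -q '^DebugLevel=' "$conf"; then
        # An uncommented DebugLevel line exists: replace its value.
        sed "s/^DebugLevel=.*/DebugLevel=$level/" "$conf" > "$tmp"
    else
        # No DebugLevel line yet: copy the file and append one.
        cat "$conf" > "$tmp"
        printf 'DebugLevel=%s\n' "$level" >> "$tmp"
    fi
    # Write back through the original path so ownership and mode are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

A possible use is `set_debug_level /path/to/slurmdbd.conf debug2` (the path is a placeholder), followed by restarting slurmdbd in whatever way the site normally does and then running `sacctmgr -vvvv` as in step 3. The same helper can restore the previous level once things work.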
Hello,

Thanks for the quick response.

First, I upgraded slurmdbd to version 2.6.5. The problem was not resolved.

If there is a way to resolve the "Connection refused" error, please let me know.

slurmdbd.conf and slurmdbd.log are shown below.

[root@spm2 slurm]# sacctmgr -vvvv
sacctmgr: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
sacctmgr: debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.35:6819: Connection refused
sacctmgr: debug: slurmdbd: slurm_open_msg_conn to spm2:6819: Connection refused
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.14:6819: Connection refused
sacctmgr: debug: slurmdbd: slurm_open_msg_conn to hedwig:6819: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sacctmgr
sacctmgr: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sacct
sacct: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# sreport
sreport: error: Problem talking to the database: Connection refused

[root@spm2 slurm]# tail -30 /etc/slurm/var/log/slurmdbd.log
[2014-07-14T10:02:08.004] error: cannot create auth context for auth/munge
[2014-07-14T10:02:08.004] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T11:09:04.554] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T11:09:04.554] error: cannot find auth plugin for auth/munge
[2014-07-14T11:09:04.555] error: cannot create auth context for auth/munge
[2014-07-14T11:09:04.555] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T11:26:05.580] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T11:26:05.580] error: cannot find auth plugin for auth/munge
[2014-07-14T11:26:05.580] error: cannot create auth context for auth/munge
[2014-07-14T11:26:05.580] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T12:50:45.820] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T12:50:45.820] error: cannot find auth plugin for auth/munge
[2014-07-14T12:50:45.820] error: cannot create auth context for auth/munge
[2014-07-14T12:50:45.820] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T12:51:46.108] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T12:51:46.108] error: cannot find auth plugin for auth/munge
[2014-07-14T12:51:46.108] error: cannot create auth context for auth/munge
[2014-07-14T12:51:46.108] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-14T15:27:25.363] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-14T15:27:25.363] error: cannot find auth plugin for auth/munge
[2014-07-14T15:27:25.363] error: cannot create auth context for auth/munge
[2014-07-14T15:27:25.363] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-15T10:17:28.292] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-15T10:17:28.292] error: cannot find auth plugin for auth/munge
[2014-07-15T10:17:28.292] error: cannot create auth context for auth/munge
[2014-07-15T10:17:28.292] fatal: Unable to initialize auth/munge authentication plugin
[2014-07-15T11:43:54.669] error: Couldn't find the specified plugin name for auth/munge looking at all files
[2014-07-15T11:43:54.669] error: cannot find auth plugin for auth/munge
[2014-07-15T11:43:54.669] error: cannot create auth context for auth/munge
[2014-07-15T11:43:54.669] fatal: Unable to initialize auth/munge authentication plugin

[root@spm2 slurm]# cd /etc/slurm
[root@spm2 slurm]# cat slurmdbd.conf
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
#ArchiveResv=yes
ArchiveSteps=no
ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=localhost
DebugLevel=2
###PurgeEventAfter=1month
###PurgeJobAfter=12month
###PurgeResvAfter=1month
###PurgeStepAfter=1month
###PurgeSuspendAfter=1month
LogFile=/etc/slurm/var/log/slurmdbd.log
PidFile=/etc/slurm/var/log/slurmdbd.pid
PluginDir=/etc/slurm/default/lib/slurm
SlurmUser=root
#StoragePass=
StorageType=accounting_storage/mysql
#StorageUser=slurm
#StoragePass=initial0

Best Regards..
Toru Matsuoka

It appears you are missing the munge development libraries on the slurmdbd node. Install them and recompile. Is this a new node, or were you just upgrading? It seems to be a new node.

Created attachment 1063 [details]
attachment-10287-0.html
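The slurmdbd log above says the auth/munge plugin could not be found. One way to look into that message is to check whether the plugin file exists in the directory that slurmdbd.conf names as PluginDir. The sketch below is not a confirmed diagnosis for this bug. Both function names are hypothetical, the file name `auth_munge.so` is taken from the sample sacctmgr output earlier in this thread, and PluginDir is assumed to hold a single directory.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not Slurm tools.

# Print the PluginDir value from a slurmdbd.conf-style file.
# Commented-out lines are ignored; if the key repeats, the last one is used.
plugin_dir_from_conf() {
    sed -n 's/^PluginDir=//p' "$1" | tail -n 1
}

# Report whether auth_munge.so is present in the configured PluginDir.
# Assumes PluginDir names one directory, not a list.
# Returns 0 if found, 1 if missing, 2 if PluginDir is not set.
check_auth_plugin() {
    dir=$(plugin_dir_from_conf "$1")
    if [ -z "$dir" ]; then
        echo "no PluginDir set in $1"
        return 2
    fi
    if [ -f "$dir/auth_munge.so" ]; then
        echo "found $dir/auth_munge.so"
        return 0
    fi
    echo "missing $dir/auth_munge.so"
    return 1
}
```

With the configuration shown in this thread, `check_auth_plugin /etc/slurm/slurmdbd.conf` would look in /etc/slurm/default/lib/slurm. If the file is missing there, that would be consistent with the advice to install the munge development libraries and rebuild Slurm, although other causes are possible.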
I installed the munge packages as follows.
1. # rpmbuild -tb --clean munge-0.5.11.tar.bz2
2. # rpm -ivh munge-debuginfo-0.5.11-1.el6.x86_64.rpm slurm-munge-2.6.5-1.el6.x86_64.rpm munge-0.5.11-1.el6.x86_64.rpm munge-libs-0.5.11-1.el6.x86_64.rpm munge-devel-0.5.11-1.el6.x86_64.rpm
3. # rpm -qa | grep munge
munge-libs-0.5.11-1.el6.x86_64
munge-devel-0.5.11-1.el6.x86_64
slurm-munge-2.6.5-1.el6.x86_64
munge-0.5.11-1.el6.x86_64
munge-debuginfo-0.5.11-1.el6.x86_64
4. # chkconfig munge on
# chkconfig --list | grep munge
munge 0:off 1:off 2:on 3:on 4:on 5:on 6:off
5. # /etc/init.d/munge restart
# ps -ef | grep munge
munge 19176 1 0 15:46 ? 00:00:00 /usr/sbin/munged
root 19891 17319 0 17:22 pts/0 00:00:00 grep munge
6. # dd if=/dev/random bs=1 count=1024 > /etc/munge/munge.key
7. # chown daemon.root /etc/munge/munge.key
# ls -l
total 8
-r-------- 1 daemon root 1024 Jun 16 17:40 munge.key
8. # /etc/init.d/munge start
Starting MUNGE: munged [ OK ]
However, the problem was not resolved.
[root@spm2 ~]# sacctmgr -vvvv
sacctmgr: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
sacctmgr: debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_slurmdbd.so
sacctmgr: Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
sacctmgr: debug3: Success.
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.35:6819: Connection refused
sacctmgr: debug: slurmdbd: slurm_open_msg_conn to spm2:6819: Connection refused
sacctmgr: debug2: _slurm_connect failed: Connection refused
sacctmgr: debug2: Error connecting slurm stream socket at 10.81.1.14:6819: Connection refused
sacctmgr: debug: slurmdbd: slurm_open_msg_conn to hedwig:6819: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused
[root@spm2 ~]# sreport
sreport: error: Problem talking to the database: Connection refused
Is it acceptable for you?
Best Regards..
Toru Matsuoka
Now you need to rebuild Slurm after you run configure on it.
Created attachment 1064 [details]
attachment-10554-0.html
Thanks for the support.
Sorry, could you please point me to a general sample of slurmdbd.conf?
Also, if there is a way to clear the contents of slurmdb, please let me know.

(In reply to toru matsuoka from comment #9)
> Thanks for the support.
>
> Sorry, could you please point me to a general sample of slurmdbd.conf?

See "man slurmdbd.conf"

> Also, if there is a way to clear the contents of slurmdb,
> please let me know.

We would strongly recommend that you only use the sacctmgr command to alter the contents of slurmdb. |