Ticket 5282 - slurmctld.service fails to start on CentOS 7.4
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 17.11.7
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
Reported: 2018-06-08 00:42 MDT by Dheeraj
Modified: 2018-06-10 22:52 MDT
Site: -Other-
Version Fixed: 17.11.7

Description Dheeraj 2018-06-08 00:42:29 MDT
Hi

I built RPMs of SLURM-17.10 and installed them on the head and compute nodes. When I start the slurmctld service, I get the error below. The slurmd daemon starts without any issue.
 
[root@master ~]# systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: resources) since Fri 2018-06-08 12:04:07 IST; 6min ago
  Process: 3802 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1235 (code=exited, status=1/FAILURE)

Jun 08 12:04:07 master.hcilab.local systemd[1]: Starting Slurm controller daemon...
Jun 08 12:04:07 master.hcilab.local systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
Jun 08 12:04:07 master.hcilab.local systemd[1]: slurmctld.service never wrote its PID file. Failing.
Jun 08 12:04:07 master.hcilab.local systemd[1]: Failed to start Slurm controller daemon.
Jun 08 12:04:07 master.hcilab.local systemd[1]: Unit slurmctld.service entered failed state.
Jun 08 12:04:07 master.hcilab.local systemd[1]: slurmctld.service failed.
Comment 1 Dheeraj 2018-06-10 22:52:31 MDT
It was my mistake. It seems to be a permissions issue. I am closing the ticket.
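
For anyone who sees the same "never wrote its PID file" message: systemd waits for the file named by PIDFile= in the unit, while slurmctld writes the file named by SlurmctldPidFile in slurm.conf, running as SlurmUser. If the two paths differ, or SlurmUser cannot create files in that directory, the start fails as in the log above. The following is a minimal sketch of a check, not the reporter's actual fix. It assumes slurm.conf is at /etc/slurm/slurm.conf (the ticket does not say), and get_setting and check_pidfile are hypothetical helper names.

```shell
#!/bin/sh
# Unit path is taken from the status output above; the slurm.conf path is an
# assumption (common for RPM installs) and may differ on a given system.
UNIT_FILE="${UNIT_FILE:-/usr/lib/systemd/system/slurmctld.service}"
SLURM_CONF="${SLURM_CONF:-/etc/slurm/slurm.conf}"

# get_setting KEY FILE  (hypothetical helper)
# Prints the value of the last "KEY=value" line in FILE. KEY is matched
# case-insensitively; comment lines and trailing comments are ignored.
get_setting() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }
        !/=/       { next }
        {
            k = $1
            gsub(/[ \t]/, "", k)
            if (tolower(k) == tolower(key)) {
                v = substr($0, index($0, "=") + 1)
                sub(/[ \t]*#.*$/, "", v)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$2"
}

# check_pidfile  (hypothetical helper)
# Returns 1 if systemd and slurmctld disagree about the PID file path.
check_pidfile() {
    unit_pid=$(get_setting PIDFile "$UNIT_FILE")
    conf_pid=$(get_setting SlurmctldPidFile "$SLURM_CONF")
    slurm_user=$(get_setting SlurmUser "$SLURM_CONF")
    # Fall back to the defaults documented in slurm.conf(5) when unset.
    conf_pid=${conf_pid:-/var/run/slurmctld.pid}
    slurm_user=${slurm_user:-root}

    echo "systemd PIDFile:   ${unit_pid:-(not set)}"
    echo "SlurmctldPidFile:  $conf_pid"
    echo "SlurmUser:         $slurm_user"

    if [ "$unit_pid" != "$conf_pid" ]; then
        echo "MISMATCH: systemd waits for a file slurmctld will not write"
        return 1
    fi

    # Paths agree, so show who owns the directory; SlurmUser must be able
    # to create the PID file there.
    pid_dir=$(dirname "$conf_pid")
    ls -ld "$pid_dir" 2>/dev/null || echo "directory $pid_dir does not exist"
    return 0
}
```

After sourcing the file, run check_pidfile as root. If the paths match but the directory is owned by root while SlurmUser is an unprivileged account, that would be consistent with the permissions problem mentioned above.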