Ticket 5282

Summary: Slurmctld.service not getting started on centos-7.4
Product: Slurm Reporter: Dheeraj <dheeraj.kv>
Component: slurmctld Assignee: Jacob Jenson <jacob>
Status: RESOLVED FIXED QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 17.11.7   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: 17.11.7 Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Dheeraj 2018-06-08 00:42:29 MDT
Hi

I have built RPMs of SLURM-17.10 and installed them on the head and compute nodes. When I start the slurmctld service, I get the error below. The slurmd daemon starts without any issue.
 
[root@master ~]# systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: failed (Result: resources) since Fri 2018-06-08 12:04:07 IST; 6min ago
  Process: 3802 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1235 (code=exited, status=1/FAILURE)

Jun 08 12:04:07 master.hcilab.local systemd[1]: Starting Slurm controller daemon...
Jun 08 12:04:07 master.hcilab.local systemd[1]: PID file /var/run/slurmctld.pid not readable (yet?) after start.
Jun 08 12:04:07 master.hcilab.local systemd[1]: slurmctld.service never wrote its PID file. Failing.
Jun 08 12:04:07 master.hcilab.local systemd[1]: Failed to start Slurm controller daemon.
Jun 08 12:04:07 master.hcilab.local systemd[1]: Unit slurmctld.service entered failed state.
Jun 08 12:04:07 master.hcilab.local systemd[1]: slurmctld.service failed.

Comment 1 Dheeraj 2018-06-10 22:52:31 MDT
It was my mistake. It seems to be a permission issue. I am closing it.
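
For anyone hitting the same "never wrote its PID file" message: the ticket does not record which file or directory had the wrong permissions, so the following is only a rough sketch of one check that may help narrow it down. It compares the PID file path in slurm.conf with the PIDFile= path in the systemd unit. The helper names (conf_pidfile, unit_pidfile, pidfiles_match) are made up for illustration, and the parsing assumes the keys are written with exactly this capitalization.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Slurm or systemd.

# Print the SlurmctldPidFile value from a slurm.conf-style file.
# Assumes the key is spelled exactly "SlurmctldPidFile"; last match wins.
conf_pidfile() {
    sed -n 's/^[[:space:]]*SlurmctldPidFile[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | tail -n 1
}

# Print the PIDFile value from a systemd unit file.
unit_pidfile() {
    sed -n 's/^[[:space:]]*PIDFile[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}

# Succeed (exit status 0) only if slurm.conf sets a PID file and the
# unit file names the same path.
pidfiles_match() {
    [ -n "$(conf_pidfile "$1")" ] &&
        [ "$(conf_pidfile "$1")" = "$(unit_pidfile "$2")" ]
}
```

Possible usage, assuming slurm.conf lives at /etc/slurm/slurm.conf (an assumption; the ticket does not give its location) and the unit path shown in the status output above:

    pidfiles_match /etc/slurm/slurm.conf /usr/lib/systemd/system/slurmctld.service || echo "PID file paths differ or are unset"

If the paths agree, it may be worth checking with ls -ld that the directory holding the PID file, and the state and log directories named in slurm.conf, are writable by the user slurmctld runs as.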