Summary: | systemctl start/stop does not work on RHEL 7 | |
---|---|---|---
Product: | Slurm | Reporter: | Nancy <nancy.kritkausky>
Component: | slurmd | Assignee: | David Bigagli <david>
Status: | RESOLVED INFOGIVEN | QA Contact: |
Severity: | 3 - Medium Impact | |
Priority: | --- | CC: | adam.huffman, brandon.barker, brian, da, david.gloe, doug.parisek, nancy.kritkausky, Ole.H.Nielsen, sven.sternberger, yiannis.georgiou
Version: | 14.03.8 | |
Hardware: | Linux | |
OS: | Linux | |
Site: | Universitat Dresden (Germany) | Alineos Sites: | ---
Atos/Eviden Sites: | --- | Confidential Site: | ---
Coreweave sites: | --- | Cray Sites: | ---
DS9 clusters: | --- | HPCnow Sites: | ---
HPE Sites: | --- | IBM Sites: | ---
NOAA Site: | --- | NoveTech Sites: | ---
Nvidia HWinf-CS Sites: | --- | OCF Sites: | ---
Recursion Pharma Sites: | --- | SFW Sites: | ---
SNIC sites: | --- | Linux Distro: | ---
Machine Name: | | CLE Version: |
Version Fixed: | | Target Release: | ---
DevPrio: | --- | Emory-Cloud Sites: | ---
Attachments: | Fix for described problem | |
Description
Nancy
2014-10-17 04:26:27 MDT
This seems to be a systemctl problem. These commands hang:

    /bin/systemctl start slurm
    /bin/systemctl start slurm.service

If you strace it you will see it is waiting for some reply from somebody :-)

    recvmsg(3, 0x7fffc47c64c0, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=3, events=POLLIN}], 1, 4294967295) = 1 ([{fd=3, revents=POLLIN}])
    recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1q\0\0\0\241\5\0\0x\0\0\0\1\1o\0\31\0\0\0/org/fre"..., 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 249
    recvmsg(3, 0x7fffc47c68a0, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
    poll([{fd=3, events=POLLIN}], 1, 4294967295^C
    Process 5700 detached

The workaround is to cd to /etc/init.d and then run the script; in that case systemctl is out of the picture and everything works.

I will try to find out what's going on. I don't have any cgroup configured. Why do you think it is cgroup related?

David

If you don't run slurmd and slurmctld together on any nodes, a workaround I found is to set SlurmdPidFile=/var/run/slurmctld.pid in slurm.conf. My guess is that systemd just takes the last pidfile comment in the init script as the pid file of the service.

David

David and David, thank you for the information and for looking at the problem. The information I provided does show a problem with the pid file, but we had already corrected that part of the problem. The workaround of starting from the init.d script does work, but it is a difficult workaround when we have a large cluster.

Thanks for the info,
Nancy

David B., I took cgroups out of my configuration and you are right, it doesn't change anything. So I think we can take cgroups out of the equation.

Nancy

I am playing around with the unit files. I think we have to use those instead of the /etc/init.d scripts. I will let you know when I get it to work. You may want to have a look at this in the meantime: https://wiki.archlinux.org/index.php/systemd

David

Something simple like this works for me. I created two unit files:

    root@phobos /usr/lib/systemd/system>cat david.service
    [Unit]
    Description=David server
    After=syslog.target network.target auditd.service

    [Service]
    Type=forking
    ExecStart=/sbin/slurmctld -vvv
    Restart=on-failure
    RestartSec=42s

    [Install]
    WantedBy=multi-user.target

and

    root@phobos /usr/lib/systemd/system>cat zebra.service
    [Unit]
    Description=Zebra server
    After=syslog.target network.target auditd.service

    [Service]
    Type=forking
    ExecStart=/sbin/slurmd -vvv
    Restart=on-failure
    RestartSec=42s

    [Install]
    WantedBy=multi-user.target

one for the controller and another for slurmd. Then start them:

    root@phobos /usr/lib/systemd/system>systemctl start david.service
    root@phobos /usr/lib/systemd/system>systemctl start zebra.service
    root@phobos /usr/lib/systemd/system>ps -ef|grep slurm|grep -v grep
    david    14726     1  0 14:19 ?  00:00:00 /sbin/slurmctld -vvv
    root     14783     1  0 14:20 ?  00:00:00 /sbin/slurmd -vvv
    root@phobos /usr/lib/systemd/system>sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    marte*       up   infinite      1  down* deimos
    marte*       up   infinite      1   idle phobos

David
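As a side note (not from the original thread): when systemd runs the legacy /etc/init.d/slurm script it does so through an auto-generated unit, and it can help to inspect what systemd actually derived from the script. A minimal diagnostic sketch, assuming the stock systemd shipped with RHEL/CentOS 7; `systemctl cat` only exists on newer systemd builds, older ones can read the generated file directly:

    # show the unit systemd-sysv-generator built from /etc/init.d/slurm
    systemctl cat slurm.service                       # newer systemd
    cat /run/systemd/generator.late/slurm.service     # fallback on older systemd

    # check which single pid file systemd associated with the service
    systemctl show slurm.service -p PIDFile -p MainPID

If PIDFile points at only one of the two daemons' pid files, that matches the hang described above, since systemd waits on the wrong pid file after the script returns.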
We're Slurm newbies starting to set up Slurm 14.11.0-0rc2 on CentOS 7 nodes, which is going to be our next-generation cluster setup. I can fully confirm the above bug in a very simple Slurm setup. As a temporary workaround on a compute node I had to make this PidFile configuration in slurm.conf:

    SlurmdPidFile=/var/run/slurmctld.pid

and start the daemons the old way:

    cd /etc/init.d
    ./slurm start

I'd love to get a proper bug fix; what are the chances of that?

Thanks, Ole

Nancy, did you have a chance to try the service files I posted?

David

This is not a Slurm problem but rather a systemd one. I figured out that systemd/systemctl is very sensitive to the existence of the PIDFile: if this variable includes a directory that does not exist, systemctl hangs after it starts slurmd instead of returning an error. The solution is to specify a correct path or comment it out.

David

David, then what's wrong with this?

    SlurmctldPidFile=/var/run/slurmctld.pid
    SlurmdPidFile=/var/run/slurmd.pid

The /var/run/ directory surely exists already. Can you give an explicitly working example for slurm.conf?

I opened a systemd bug on this. I think systemd doesn't handle init scripts with multiple pid files listed. https://bugs.freedesktop.org/show_bug.cgi?id=85297

On CentOS 7 this works for me:

    root@phobos /usr/lib/systemd/system>cat slurmd.service
    [Unit]
    Description=Slurm node daemon
    After=network.target
    ConditionPathExists=/home/david/cluster/1411/linux/etc/slurm.conf

    [Service]
    Type=forking
    EnvironmentFile=/home/david/cluster/1411/linux/etc/defaults
    ExecStart=/home/david/cluster/1411/linux/sbin/slurmd $SLURMD_OPTIONS
    PIDFile=/var/slurm/pid/slurmd.pid

    [Install]
    WantedBy=multi-user.target

    root@phobos /usr/lib/systemd/system>cat slurmctld.service
    [Unit]
    Description=Slurm controller daemon
    After=network.target
    ConditionPathExists=/home/david/cluster/1411/linux/etc/slurm.conf

    [Service]
    Type=forking
    EnvironmentFile=/home/david/cluster/1411/linux/etc/defaults
    ExecStart=/home/david/cluster/1411/linux/sbin/slurmctld $SLURMCTLD_OPTIONS
    PIDFile=/var/slurm/pid/slurmctld.pid

    [Install]
    WantedBy=multi-user.target

These are the templates Slurm installs in the etc directory where the examples are. Then run 'systemctl enable slurmd' and 'systemctl enable slurmctld' and start the services. When the machine reboots, slurmd is started as well. I don't know why they invented systemd or why they think it is a good idea, but that's not a question for me. :-) I was doing fine with /etc/rc.local.

David

I'm having similar symptoms on CentOS 7 on my *controller* node (I haven't tried worker nodes yet). I'm new to both systemd and Slurm (administration), and was amused to see activity on this as recent as yesterday, which partly makes up for the sysadmin woes. Just to get on the same page, I'm curious what the specific contents of the EnvironmentFile you have specified are, and whether there is a default that you are using?

Thanks, Brandon

These files should be considered templates, just like the examples of slurm.conf and release_agent in the etc directory. In my testing I used a defaults file like this, setting the command-line options for the daemons:

    david@phobos ~/cluster/1411/linux/etc>cat defaults
    SLURMCTLD_OPTIONS=-vvv
    SLURMD_OPTIONS=-vvv

David

Please reopen this bug because it hasn't been resolved at all! I have confirmed that it's still present in Slurm 14.11. From the systemd bug: "Hmm, listing two pidile entries in the headers is an extension that is not supported by systemd, sorry, and it's unlikely to be supported. your daemon really shouldn't ship thing with an extension like that..." On systemd systems you really need to set up the service files or use my pidfile workaround for Slurm to work correctly.
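An earlier comment notes that systemctl hangs when the directory in PIDFile does not exist, and the templates above point PIDFile at /var/slurm/pid/. Not from the original thread, but one way to make sure such a directory is recreated after every boot is a tmpfiles.d entry; this is only a sketch, and the file name, path, owner and mode are assumptions that would need to match the local setup:

    # /etc/tmpfiles.d/slurm.conf  (hypothetical file name)
    # create the pid directory at boot; adjust owner/group to the user running the daemons
    d /var/slurm/pid 0755 root root -

    # apply the entry immediately without rebooting
    systemd-tmpfiles --create /etc/tmpfiles.d/slurm.conf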
How about removing the two pidfile and processname tag lines in the slurm init.d script?

    brian@compy:~/slurm/14.11/slurm/etc$ git diff
    diff --git a/etc/init.d.slurm.in b/etc/init.d.slurm.in
    index 5387a9e..4741ccb 100644
    --- a/etc/init.d.slurm.in
    +++ b/etc/init.d.slurm.in
    @@ -5,12 +5,6 @@
     # manages exclusive access to a set of compute \
     # resources and distributes work to those resources.
     #
    -# processname: @sbindir@/slurmd
    -# pidfile: /var/run/slurmd.pid
    -#
    -# processname: @sbindir@/slurmctld
    -# pidfile: /var/run/slurmctld.pid
    -#
     # config: /etc/sysconfig/slurm
     #
     ### BEGIN INIT INFO

This makes the init script work for me on CentOS 7 and CentOS 6. The init script can grab the corresponding pids independently of the pidfile tag lines: the status() function greps the Slurm[ctl]dPidFile setting out of slurm.conf and matches the pid in that file against the pid of the process with the given daemon name, and stop()/killproc() kills the daemon by using the daemon name to get the pid. It looks, and feels, safe to do. Can anyone see any adverse effects of removing the pidfile and processname tag lines?

Hello, the bug is still present in 14.11.8 and CentOS 7.1. As I read it, the systemd developer refuses to fix this. The workaround for now is to remove the pid-file line if the corresponding service won't be started, but this means that we have different config files for different nodes. It would be nice if the service could be split up into two unit files.

Sven

Hi, did you try the scripts in comment 12?

David

(In reply to David Bigagli from comment #19)
> Hi did you try the scripts in comment 12?
>
> David

I had missed that the files are already installed, and indeed this looks much better. The only small issues:

- it fails if the defaults file is not in place
- the location of the pid file is not synced with the definition in /etc/slurm/slurm.conf

but this is easily adjustable, so for me it is fixed. For the developers it would be nice if they could clean up the situation. Many thanks!

Sven

Hi, these issues are actually fixed as well. If you look in the source code in the etc directory you will find the .in files, e.g. slurmctld.service.in. During the configuration phase before the build these template files are filled in with the configure options. The examples in comment 12 are from after the software was configured.

David
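To illustrate what such a template looks like, here is a paraphrased sketch of a slurmd.service.in file; it is not a verbatim copy of the file shipped in the Slurm source, and the placeholder paths are assumptions. The .in files use autoconf-style placeholders (e.g. @sbindir@) that configure substitutes at build time. A leading '-' on EnvironmentFile, a standard systemd feature, would also address the first of the issues Sven lists above by making a missing defaults file non-fatal:

    # sketch of an etc/slurmd.service.in template (illustrative only)
    [Unit]
    Description=Slurm node daemon
    After=network.target
    ConditionPathExists=@sysconfdir@/slurm.conf

    [Service]
    Type=forking
    # the leading '-' makes systemd ignore the file if it does not exist
    EnvironmentFile=-@sysconfdir@/default/slurm
    ExecStart=@sbindir@/slurmd $SLURMD_OPTIONS
    PIDFile=/var/run/slurmd.pid

    [Install]
    WantedBy=multi-user.target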
Created attachment 2173 [details]
Fix for described problem
Comment on attachment 2173 [details] (Fix for described problem)

As I was asked to confirm this bug, I added my own screen captures in the attachment, which support comment #20.

Yes, this is a known issue: the location of the pid file must be a valid one. The default directory in slurm.conf is /var/run; if the path changes, then the startup file must be updated accordingly, otherwise systemd does not work properly.

David
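As a closing illustration (not part of the original thread), one way to keep the installed unit files untouched while pointing PIDFile at whatever SlurmdPidFile is set to in slurm.conf is a per-node drop-in override; this is only a sketch, and the drop-in file name and pid path are assumptions that must match the local slurm.conf:

    # /etc/systemd/system/slurmd.service.d/pidfile.conf  (hypothetical drop-in)
    [Service]
    # must match SlurmdPidFile in slurm.conf
    PIDFile=/var/run/slurmd.pid

    # reload systemd so the drop-in takes effect, then restart the daemon
    systemctl daemon-reload
    systemctl restart slurmd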