| Summary: | Systemd | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | NASA JSC Aerolab <JSC-DL-AEROLAB-ADMIN> |
| Component: | slurmctld | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | | |
| Priority: | --- | CC: | felip.moll |
| Version: | 17.11.1 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Johnson Space Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
NASA JSC Aerolab
2018-01-03 10:19:57 MST
I also tried replacing the ExecStart line in slurmctld.service with the following (just to rule out any issues with the /etc/sysconfig/slurmctld file):

ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf

That didn't work either.

[root@europa ~]# systemctl start slurmctld.service
Job for slurmctld.service failed because a timeout was exceeded. See "systemctl status slurmctld.service" and "journalctl -xe" for details.
[root@europa ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Wed 2018-01-03 11:23:47 CST; 9s ago
Process: 37657 ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf (code=exited, status=0/SUCCESS)
Jan 03 11:23:47 europa systemd[1]: slurmctld.service start operation timed out. Terminating.
Jan 03 11:23:47 europa systemd[1]: Failed to start Slurm controller daemon.
Jan 03 11:23:47 europa systemd[1]: Unit slurmctld.service entered failed state.
Jan 03 11:23:47 europa systemd[1]: slurmctld.service failed.
[root@europa ~]#

It takes about 30 seconds for the systemctl start command to finish. If I just execute this directly, it starts right away:

[root@europa ~]# /software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf

We're looking into this. Out of curiosity, why are you using an etc2 directory? Is /software being shared by L1 and Europa? Have you considered configuring the binaries to always look at the etc2 directory, e.g. ./configure --sysconfdir=/software/x86_64/slurm/$VER/etc2? This would prevent the need to define the SLURM_CONF environment variable everywhere.

Thanks,
Brian

And when it fails under systemd, do you see anything in the slurmctld logs? And does the PIDFile in your service script match what you have in your slurm.conf? This was an issue for me.
Before fixing this up, the start would hang, and running a status showed:
brian@lappy:/etc/systemd/system$ sudo systemctl status slurmctld1711.service
● slurmctld1711.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld1711.service; disabled; vendor preset: enabled)
Active: activating (start) since Wed 2018-01-03 11:21:11 MST; 15s ago
Tasks: 15
Memory: 3.3M
CPU: 55ms
CGroup: /system.slice/slurmctld1711.service
└─31745 /home/brian/slurm/17.11/lappy/sbin/slurmctld
Jan 03 11:21:11 lappy systemd[1]: Starting Slurm controller daemon...
Jan 03 11:21:11 lappy systemd[1]: slurmctld1711.service: PID file /var/run/slurmctld.pid not readable (yet?) after start: No such file or directory
I'm also able to start the slurmctld using the SLURM_CONF variable in the following ways:

1. Environment=SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS

2. EnvironmentFile=-/tmp/slurmctld.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS

brian@lappy:/etc/systemd/system$ cat /tmp/slurmctld.conf
SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf

3. ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld -f /home/brian/slurm/17.11/lappy/etc2/slurm.conf

e.g.

[Unit]
Description=Slurm controller daemon
After=network.target munge.service
ConditionPathExists=/home/brian/slurm/17.11/lappy/etc2/slurm.conf

[Service]
Type=forking
Environment=SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf
#EnvironmentFile=-/tmp/slurmctld.conf
#ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld -f /home/brian/slurm/17.11/lappy/etc2/slurm.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/home/brian/slurm/17.11/lappy/run/slurmctld.pid
TasksMax=infinity

[Install]
WantedBy=multi-user.target

Correct - /software is being shared by L1 and Europa. The end goal is to have both share the same Slurm binaries - just running different configurations, hence the etc and etc2 directories.
No, I wasn't seeing anything at all in the slurmctld log, so I don't think systemd was ever even starting the process.
My PID file was not consistent with slurm.conf. But fixing that did not help.
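For reference, the two settings that have to agree are PIDFile= in the unit file and SlurmctldPidFile in slurm.conf. A minimal sketch (the /var/run/slurm path is an example, not this site's actual config):

```ini
# slurm.conf -- where slurmctld writes its pid (example path)
SlurmctldPidFile=/var/run/slurm/slurmctld.pid

# slurmctld.service, [Service] section -- systemd must watch the same path,
# otherwise a Type=forking start hangs waiting for a pid file that never appears
PIDFile=/var/run/slurm/slurmctld.pid
```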
Next, I tried adding TasksMax=infinity, which was missing from my systemd file. Strangely, this seemed to help. I can now start and stop slurmctld via systemd. Awesome.
The Environment=SLURM_CONF method seems to be working well. Thanks for the info and help.
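The Environment=SLURM_CONF mechanism can also be exercised outside systemd. A minimal shell sketch, using the etc2 path from this thread (adjust for your site):

```shell
# Point Slurm clients and daemons at an alternate config via SLURM_CONF.
# The path is the shared etc2 config discussed in this ticket.
export SLURM_CONF=/software/x86_64/slurm/etc2/slurm.conf

# Anything started from this environment now reads that file instead of
# the compiled-in sysconfdir default, e.g.:
#   scontrol show config
echo "SLURM_CONF is set to: $SLURM_CONF"
```

This is the same variable the unit file sets with Environment=, so a shell test like this is a quick way to confirm a config path before wiring it into systemd.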
[root@europa slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2018-01-03 13:43:55 CST; 2s ago
Process: 2539 ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2542 (slurmctld)
CGroup: /system.slice/slurmctld.service
└─2542 /software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm...
[root@europa slurm]#
[root@europa slurm]#
[root@europa slurm]# cat /usr/lib/systemd/system/slurmctld.service
[Unit]
Description=Slurm controller daemon
After=network.target munge.service
ConditionPathExists=/software/x86_64/slurm/17.11.1/etc2/slurm.conf
[Service]
Type=forking
Environment=SLURM_CONF=/software/x86_64/slurm/etc2/slurm.conf
EnvironmentFile=-/etc/sysconfig/slurmctld
ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm/slurmctld.pid
TasksMax=infinity
[Install]
WantedBy=multi-user.target
[root@europa slurm]#
Glad it's working for you now. I don't see that issue if I remove the TasksMax option -- but I'm testing on Ubuntu, and it may behave differently on a different OS.

Another thought on the etc/etc2 setup: you could configure the sysconfdir to be something local on each system and then have that local path be a symlink to the correct etc directory for the system, just so you don't have to use the SLURM_CONF variable. One foreseeable issue with the SLURM_CONF variable is that if you end up submitting a job from Europa to L1 using the --cluster option, the SLURM_CONF environment variable will be preserved in the job's environment (unless told not to or unset, e.g. sbatch --export=none), and if the batch script then attempts to run a Slurm command, it will try to talk back to Europa because of the SLURM_CONF variable. Just some thoughts. Do you need any more help on this bug?

Good catch on the etc/etc2 issue. That would have definitely bitten us, as we intend to either use -M or even federate these two, if possible. I'll do something as you suggest - either different binaries with the config dir built in, or the symlink on a local file system. No, I think that's it. I was able to get slurmd up with systemd on the nodes following something similar. Thanks again.

Another suggestion, if you aren't doing something similar already, is to have a symlink that points to the version of Slurm currently in use, e.g.:

/software/x86_64/slurm/current -> /software/x86_64/slurm/17.11.1

This allows you to install new versions and then just change the symlink when you are ready, so you don't have to update your service files when you upgrade. Instead you would have:

ExecStart=/software/x86_64/slurm/current/sbin/slurmctld $SLURMCTLD_OPTIONS

Typically this is done by sharing /software within just one cluster -- each cluster can then be updated independently with its own symlink while following the same approach for the etc directories. I'll close the bug.
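The version-symlink scheme can be demonstrated in a few lines of shell. The /tmp paths below are throwaway stand-ins for /software/x86_64/slurm/&lt;version&gt;:

```shell
# Demo of the "current" symlink scheme using disposable /tmp paths.
base=/tmp/slurm-symlink-demo
mkdir -p "$base/17.11.1/sbin"

# Point "current" at the active release. -n avoids descending into an
# existing link target; -f lets the same command retarget it on upgrade.
ln -sfn "$base/17.11.1" "$base/current"

readlink "$base/current"
```

On upgrade, install the new tree alongside the old one and rerun the same `ln -sfn` with the new version; service files referencing .../current/sbin/slurmctld never need to change.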
Let us know if you have any other questions.

Thanks,
Brian

On CentOS 7, I discovered that /var/run is now a symlink to /run, which is a tmpfs. Therefore, for the default PID locations in /var/run/slurm/, I had to add this to my systemd files:

ExecStartPre=-/usr/bin/mkdir /var/run/slurm
ExecStartPre=/usr/bin/chown -R slurm /var/run/slurm/

You might consider making that change in your source to benefit others.

The default is actually just /var/run/slurm[ctl]d.pid and not /var/run/slurm/...

Right, but it's the same issue. Since slurmctld runs as the slurm user, you can't create a pid file in /var/run. Our workaround was to make a persistent directory (/var/run/slurm/) owned by the slurm user so you can consistently create and remove the pid file. I still think you need some ExecStartPre statements to set up the permissions properly, especially since /var/run is created fresh on every boot.

(In reply to NASA JSC Aerolab from comment #12)
> Right, but it's the same issue. Since slurmctld runs as the slurm user, you
> can't create a pid file in /var/run. Our workaround was to make a
> persistent directory (/var/run/slurm/) owned by the slurm user so you can
> consistently create and remove the pid file. I still think you need some
> ExecStartPre statements to set up the permissions properly, especially since
> /var/run is created fresh on every boot.

Just to add some comments here: on systems that use systemd and a tmpfs for /var/run, you should create an /etc/tmpfiles.d/slurm.conf file with the following contents:

d /var/run/slurm 0755 slurm slurm -

The recommended directory on these systems is /var/run/slurm. Once you reboot the machine, systemd will create this directory for you with the appropriate permissions. You can read more on this by searching for 'tmpfiles.d systemd'. If you are using systemd, you should absolutely avoid /etc/init.d scripts.
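The tmpfiles.d suggestion above looks like this in practice. The sketch writes the fragment to a /tmp path and only comments the real install location, since actually applying it needs root and an existing slurm user:

```shell
# Sketch: tmpfiles.d fragment that recreates /var/run/slurm on every boot.
# Written to /tmp for demonstration; the real location is
# /etc/tmpfiles.d/slurm.conf.
cat > /tmp/slurm-tmpfiles-demo.conf <<'EOF'
d /var/run/slurm 0755 slurm slurm -
EOF

# With the file installed for real, apply it immediately (no reboot needed):
#   systemd-tmpfiles --create /etc/tmpfiles.d/slurm.conf
cat /tmp/slurm-tmpfiles-demo.conf
```

This replaces the ExecStartPre mkdir/chown workaround: systemd recreates the directory with the right owner and mode on every boot, before any unit that needs it starts.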
Regarding the TasksMax parameter, please refer to bug 3526 for the implications. Other recommended variables for systemd could be:

LimitNOFILE=1048576
LimitNPROC=1541404
LimitMEMLOCK=infinity
LimitSTACK=infinity

For the slurmd service file, TasksMax=infinity and other limits could also be needed. For the environment, this works on all my systems (CentOS, SuSE, RHEL):

EnvironmentFile=-/etc/sysconfig/slurmd

Always remember to do systemctl daemon-reload and whatever else is necessary to reload unit files.

Thanks for the info - very helpful.
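Pulled together, the [Service] additions listed above would look like this in a slurmctld unit (the limit values are the ones from this thread; whether they suit a given site is a tuning question):

```ini
[Service]
# Raise kernel resource limits for the controller; size for your workload
LimitNOFILE=1048576
LimitNPROC=1541404
LimitMEMLOCK=infinity
LimitSTACK=infinity
# Do not cap the number of tasks in the service's cgroup (see bug 3526)
TasksMax=infinity
```

After editing any unit file, run systemctl daemon-reload before restarting the service so systemd picks up the changes.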