| Summary: | Systemd | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | NASA JSC Aerolab <JSC-DL-AEROLAB-ADMIN> |
| Component: | slurmctld | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | | |
| Priority: | --- | CC: | felip.moll |
| Version: | 17.11.1 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Johnson Space Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
NASA JSC Aerolab
2018-01-03 10:19:57 MST
I also tried replacing the ExecStart line in slurmctld.service with the following (just to rule out any issues with the /etc/sysconfig/slurmctld file):

ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf

That didn't work either.

[root@europa ~]# systemctl start slurmctld.service
Job for slurmctld.service failed because a timeout was exceeded. See "systemctl status slurmctld.service" and "journalctl -xe" for details.
[root@europa ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
Active: failed (Result: timeout) since Wed 2018-01-03 11:23:47 CST; 9s ago
Process: 37657 ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf (code=exited, status=0/SUCCESS)
Jan 03 11:23:47 europa systemd[1]: slurmctld.service start operation timed out. Terminating.
Jan 03 11:23:47 europa systemd[1]: Failed to start Slurm controller daemon.
Jan 03 11:23:47 europa systemd[1]: Unit slurmctld.service entered failed state.
Jan 03 11:23:47 europa systemd[1]: slurmctld.service failed.
[root@europa ~]#

It takes about 30 seconds for the systemctl start command to finish. If I just execute this directly, it starts right away:

[root@europa ~]# /software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm.conf

We're looking into this. Out of curiosity, why are you using an etc2 directory? Is /software being shared by L1 and Europa? Have you considered configuring the binaries to always look at the etc2 directory, e.g. ./configure --sysconfdir=/software/x86_64/slurm/$VER/etc2? This would prevent the need to define the SLURM_CONF environment variable everywhere.

Thanks,
Brian

And when it fails under systemd, do you see anything in the slurmctld logs? And does the PIDFile in your service script match what you have in your slurm.conf? This was an issue for me.
Before fixing this up, the start would hang, and running a status showed:
brian@lappy:/etc/systemd/system$ sudo systemctl status slurmctld1711.service
● slurmctld1711.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld1711.service; disabled; vendor preset: enabled)
Active: activating (start) since Wed 2018-01-03 11:21:11 MST; 15s ago
Tasks: 15
Memory: 3.3M
CPU: 55ms
CGroup: /system.slice/slurmctld1711.service
└─31745 /home/brian/slurm/17.11/lappy/sbin/slurmctld
Jan 03 11:21:11 lappy systemd[1]: Starting Slurm controller daemon...
Jan 03 11:21:11 lappy systemd[1]: slurmctld1711.service: PID file /var/run/slurmctld.pid not readable (yet?) after start: No such file or directory
I'm also able to start the slurmctld using the SLURM_CONF variable in the following ways:

1. Environment=SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS

2. EnvironmentFile=-/tmp/slurmctld.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS

brian@lappy:/etc/systemd/system$ cat /tmp/slurmctld.conf
SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf

3. ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld -f /home/brian/slurm/17.11/lappy/etc2/slurm.conf

e.g.

[Unit]
Description=Slurm controller daemon
After=network.target munge.service
ConditionPathExists=/home/brian/slurm/17.11/lappy/etc2/slurm.conf

[Service]
Type=forking
Environment=SLURM_CONF=/home/brian/slurm/17.11/lappy/etc2/slurm.conf
#EnvironmentFile=-/tmp/slurmctld.conf
#ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld -f /home/brian/slurm/17.11/lappy/etc2/slurm.conf
ExecStart=/home/brian/slurm/17.11/lappy/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/home/brian/slurm/17.11/lappy/run/slurmctld.pid
TasksMax=infinity

[Install]
WantedBy=multi-user.target

Correct - /software is being shared by L1 and Europa. The end goal is to have both share the same Slurm binaries - just running different configurations, hence the etc and etc2 directories.
No, I wasn't seeing anything at all in the slurmctld log, so I don't think systemd was ever even starting the process.
My PID file was not consistent with slurm.conf. But fixing that did not help.
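For reference, the two settings that have to agree are PIDFile= in the unit file and SlurmctldPidFile in slurm.conf. A minimal sketch (the /var/run/slurm path is an example, not this site's actual config):

```ini
# slurm.conf -- where slurmctld writes its pid (example path)
SlurmctldPidFile=/var/run/slurm/slurmctld.pid

# slurmctld.service, [Service] section -- systemd must watch the same path,
# otherwise a Type=forking start hangs waiting for a pid file that never appears
PIDFile=/var/run/slurm/slurmctld.pid
```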
Next, I tried adding TasksMax=infinity, which was missing from my systemd file. Strangely, this seemed to help. I can now start and stop slurmctld via systemd. Awesome.
The Environment=SLURM_CONF method seems to be working well. Thanks for the info and help.
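The Environment=SLURM_CONF mechanism can also be exercised outside systemd. A minimal shell sketch, using the etc2 path from this thread (adjust for your site):

```shell
# Point Slurm clients and daemons at an alternate config via SLURM_CONF.
# The path is the shared etc2 config discussed in this ticket.
export SLURM_CONF=/software/x86_64/slurm/etc2/slurm.conf

# Anything started from this environment now reads that file instead of
# the compiled-in sysconfdir default, e.g.:
#   scontrol show config
echo "SLURM_CONF is set to: $SLURM_CONF"
```

This is the same variable the unit file sets with Environment=, so a shell test like this is a quick way to confirm a config path before wiring it into systemd.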
[root@europa slurm]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2018-01-03 13:43:55 CST; 2s ago
Process: 2539 ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2542 (slurmctld)
CGroup: /system.slice/slurmctld.service
└─2542 /software/x86_64/slurm/17.11.1/sbin/slurmctld -f /software/x86_64/slurm/etc2/slurm...
[root@europa slurm]#
[root@europa slurm]#
[root@europa slurm]# cat /usr/lib/systemd/system/slurmctld.service
[Unit]
Description=Slurm controller daemon
After=network.target munge.service
ConditionPathExists=/software/x86_64/slurm/17.11.1/etc2/slurm.conf
[Service]
Type=forking
Environment=SLURM_CONF=/software/x86_64/slurm/etc2/slurm.conf
EnvironmentFile=-/etc/sysconfig/slurmctld
ExecStart=/software/x86_64/slurm/17.11.1/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm/slurmctld.pid
TasksMax=infinity
[Install]
WantedBy=multi-user.target
[root@europa slurm]#
Glad it's working for you now. I don't see that issue if I remove the TasksMax option -- but I'm testing on Ubuntu, and it may behave differently on a different OS.

Another thought on the etc/etc2 setup: you could configure the sysconfdir to be something local on each system and then have that local path be a symlink to the correct etc directory for the system, just so you don't have to use the SLURM_CONF variable. One foreseeable issue with the SLURM_CONF variable is that if you end up submitting a job from Europa to L1 using the --cluster option, the SLURM_CONF environment variable will be preserved in the job's environment (unless told not to or unset, e.g. sbatch --export=none), and if the batch script then attempts to run a Slurm command, it will try to talk back to Europa because of the SLURM_CONF variable. Just some thoughts. Do you need any more help on this bug?

Good catch on the etc/etc2 issue. That would have definitely bitten us, as we intend to either use -M or even federate these two, if possible. I'll do something as you suggest - either different binaries with the config dir built in, or the symlink on a local file system. No, I think that's it. I was able to get slurmd up with systemd on the nodes following something similar. Thanks again.

Another suggestion, if you aren't doing something similar already, is to have a symlink that points to the version of Slurm currently in use, e.g.:

/software/x86_64/slurm/current -> /software/x86_64/slurm/17.11.1

This allows you to install new versions and then just change the symlink when you are ready, so you don't have to update your service files when you upgrade. Instead you would have:

ExecStart=/software/x86_64/slurm/current/sbin/slurmctld $SLURMCTLD_OPTIONS

Typically this is done by sharing /software within just one cluster -- each cluster can then be updated independently with its own symlink while following the same approach for the etc directories. I'll close the bug.
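The version-symlink scheme can be demonstrated in a few lines of shell. The /tmp paths below are throwaway stand-ins for /software/x86_64/slurm/&lt;version&gt;:

```shell
# Demo of the "current" symlink scheme using disposable /tmp paths.
base=/tmp/slurm-symlink-demo
mkdir -p "$base/17.11.1/sbin"

# Point "current" at the active release. -n avoids descending into an
# existing link target; -f lets the same command retarget it on upgrade.
ln -sfn "$base/17.11.1" "$base/current"

readlink "$base/current"
```

On upgrade, install the new tree alongside the old one and rerun the same `ln -sfn` with the new version; service files referencing .../current/sbin/slurmctld never need to change.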
Let us know if you have any other questions.

Thanks,
Brian

On CentOS 7, I discovered that /var/run is now a symlink to /run, which is a tmpfs. Therefore, for the default PID locations in /var/run/slurm/, I had to add this to my systemd files:

ExecStartPre=-/usr/bin/mkdir /var/run/slurm
ExecStartPre=/usr/bin/chown -R slurm /var/run/slurm/

You might consider making that change in your source to benefit others.

The default is actually just /var/run/slurm[ctl]d.pid and not /var/run/slurm/...

Right, but it's the same issue. Since slurmctld runs as the slurm user, you can't create a pid file in /var/run. Our workaround was to make a persistent directory (/var/run/slurm/) owned by the slurm user so you can consistently create and remove the pid file. I still think you need some ExecStartPre statements to set up the permissions properly, especially since /var/run is created fresh on every boot.

(In reply to NASA JSC Aerolab from comment #12)
> Right, but it's the same issue. Since slurmctld runs as the slurm user, you
> can't create a pid file in /var/run. Our workaround was to make a
> persistent directory (/var/run/slurm/) owned by the slurm user so you can
> consistently create and remove the pid file. I still think you need some
> ExecStartPre statements to set up the permissions properly, especially since
> /var/run is created fresh on every boot.

Just to add some comments here: on systems that use systemd and a tmpfs for /var/run, you should create an /etc/tmpfiles.d/slurm.conf file with the following contents:

d /var/run/slurm 0755 slurm slurm -

The recommended directory on these systems is /var/run/slurm. Once you reboot the machine, systemd will create this directory for you with the appropriate permissions. You can read more on this by searching for 'tmpfiles.d systemd'. If you are using systemd, you should absolutely avoid /etc/init.d scripts.
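The tmpfiles.d suggestion above looks like this in practice. The sketch writes the fragment to a /tmp path and only comments the real install location, since actually applying it needs root and an existing slurm user:

```shell
# Sketch: tmpfiles.d fragment that recreates /var/run/slurm on every boot.
# Written to /tmp for demonstration; the real location is
# /etc/tmpfiles.d/slurm.conf.
cat > /tmp/slurm-tmpfiles-demo.conf <<'EOF'
d /var/run/slurm 0755 slurm slurm -
EOF

# With the file installed for real, apply it immediately (no reboot needed):
#   systemd-tmpfiles --create /etc/tmpfiles.d/slurm.conf
cat /tmp/slurm-tmpfiles-demo.conf
```

This replaces the ExecStartPre mkdir/chown workaround: systemd recreates the directory with the right owner and mode on every boot, before any unit that needs it starts.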
Regarding the TasksMax parameter, please refer to bug 3526 for the implications. Other recommended variables for systemd could be:

LimitNOFILE=1048576
LimitNPROC=1541404
LimitMEMLOCK=infinity
LimitSTACK=infinity

For the slurmd service file, TasksMax=infinity and other limits could also be needed. For the environment, this works on all my systems (CentOS, SuSE, RHEL):

EnvironmentFile=-/etc/sysconfig/slurmd

Always remember to do systemctl daemon-reload and whatever else is necessary to reload unit files.

Thanks for the info - very helpful.
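Pulled together, the [Service] additions listed above would look like this in a slurmctld unit (the limit values are the ones from this thread; whether they suit a given site is a tuning question):

```ini
[Service]
# Raise kernel resource limits for the controller; size for your workload
LimitNOFILE=1048576
LimitNPROC=1541404
LimitMEMLOCK=infinity
LimitSTACK=infinity
# Do not cap the number of tasks in the service's cgroup (see bug 3526)
TasksMax=infinity
```

After editing any unit file, run systemctl daemon-reload before restarting the service so systemd picks up the changes.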