This bug is actually two problems in one, as I can't figure out how to report them separately. All my tests are with Slurm 19.05.3 on CentOS 7.7.

The systemd service files do not specify enough ordering for the Slurm services. When mariadb, slurmdbd and slurmctld are running on the same host, systemd will happily attempt to start all three services simultaneously. This sort of works, but causes errors/warnings in the logs and unnecessary delays. This happens on every boot, but can be tested by running "systemctl restart mariadb slurmdbd slurmctld" and checking the logs.

SlurmDBD logs errors because it tries to connect to MariaDB before it is ready.

slurmdbd.log:
error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.

According to my tests this can be avoided completely by adding "After=mariadb.service" to slurmdbd.service.

Slurmctld also fails to connect to slurmdbd, as it is not running yet, and logs several errors. (This actually causes other problems, requiring a later slurmctld restart, but I will open a separate bug about that.)

Example output from slurmctld.log:
slurmctld version 19.05.3-2 started on cluster terry
...
error: slurm_persist_conn_open_without_init: failed to open persistent connection to terry-q:7031: Connection refused
error: slurmdbd: Sending PersistInit msg: Connection refused
error: Association database appears down, reading from state file.
error: slurmdbd: Sending PersistInit msg: Connection refused

Adding "Before=slurmctld.service" to slurmdbd.service, or "After=slurmdbd.service" to slurmctld.service, is unfortunately not enough to prevent this. The slurmdbd service finishes startup before the daemon is actually ready.
My current work-around for this is to use ExecStartPost in slurmdbd.service, which waits until the DbdPort is open (we use DbdPort=7031):

[Service]
ExecStartPost=-/bin/bash -c "while ! ss -nltH|awk '{print $$4}'|grep '*:7031'; do sleep 0.5;done"
TimeoutStartSec=0

For completeness, our current slurmdbd.service override looks like this.

/etc/systemd/system/slurmdbd.service.d/override.conf:
########################################
# Ensure services start in the proper order:
# mariadb -> slurmdbd -> slurmctld
[Unit]
After=mariadb.service
Before=slurmctld.service

[Service]
# Delay until the DbdPort is listening
ExecStartPost=-/bin/bash -c "while ! ss -nltH|awk '{print $$4}'|grep '*:7031'; do sleep 0.5;done"
# Disable start timeout. During Slurm major version upgrades starting
# SlurmDBD can take a long time because of database format
# changes. Killing the process during this would be bad.
TimeoutStartSec=0
########################################

It would be nice if the Before=/After= ordering could be added to the included service files. This should be safe for sites that use separate hosts for mariadb, slurmdbd and slurmctld, as these directives only specify ordering and are not requirements.

My ExecStartPost kludge should obviously not be included anywhere. But it would be good if the slurmdbd service startup could be fixed so that it does not finish before the daemon is ready.
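For readers who prefer a bounded wait to the open-ended shell loop above, the same idea can be sketched in Python. This is only an illustration, not part of Slurm or of the override above: the function name wait_for_port and its parameters are hypothetical, and unlike the ss-based check it makes a real TCP connection to the port, which the daemon on the other end may notice or log.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connect to (host, port) succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have elapsed. Hypothetical helper, not a Slurm API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is accepting on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: not listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Called as wait_for_port("127.0.0.1", 7031), it returns True as soon as something accepts connections on the DbdPort used in this report, and False if the timeout expires first.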
The slurmctld issue I referred to is now reported as bug 8067.
Hi Pär,

I have investigated your issue, and although it would seem correct to add an After= or Before= to the systemd unit files shipped with Slurm, it can also be problematic. Depending on the architecture one designs, services may or may not be on the same server, so making a service like slurmdbd *always* wait for mariadb has two problems. First, it won't work on installations where the database is on a different server than the one where slurmdbd resides. Second, not everybody uses mariadbd for the database; MySQL or maybe others could be used.

Furthermore, in a typical cluster installation services may be managed by third-party clustering software. It is not uncommon to use solutions like Pacemaker, which already take care of the ordering.

Therefore I think that not hardcoding the After= in our unit files, and leaving this entirely to the administrator, is the most correct solution.

As for the second part, regarding the ExecPre and slurmctld having to wait for slurmdbd: as you said, it cannot be included either, because it is a hackish solution. In any case I will investigate why systemd thinks slurmdbd is up even though it is still not listening on the port.

I also think the solution must come from bug 8067: slurmdbd/ctld should just not generate errors other than informative ones.

What do you think?
> I have investigated your issue, and although it would seem correct to add
> an After= or Before= to the systemd unit files shipped with Slurm, it can
> also be problematic. Depending on the architecture one designs, services
> may or may not be on the same server, so making a service like slurmdbd
> *always* wait for mariadb has two problems. First, it won't work on
> installations where the database is on a different server than the one
> where slurmdbd resides. Second, not everybody uses mariadbd for the
> database; MySQL or maybe others could be used.

No, I think you have not understood what Before=,After= does. I tried explaining briefly at the end why it will not cause the issues you mention. Please check out the systemd.unit documentation:
https://www.freedesktop.org/software/systemd/man/systemd.unit.html#

Before=,After= only control the ordering during service start-up and shut-down. They do not initiate any startup, nor add any requirement that other services actually be running. It is also still possible to start/stop each service individually. So this will not make slurmdbd startup wait for a not-enabled or even non-existing mariadb service.

This is very different from options like Wants= or Requires=, which would cause the problems you describe.

Non-existing services are ignored completely. So you could specify both mariadb and mysql if you prefer:
"After=mysql.service mariadb.service"

For unit files in RPM packages it would however make sense to only specify the one used during build.

> Furthermore, in a typical cluster installation services may be managed by
> third-party clustering software. It is not uncommon to use solutions like
> Pacemaker, which already take care of the ordering.

A Pacemaker setup should be completely unaffected by this change. That some sites might be using clustering software is no reason not to fix the startup ordering in the simple case where people just install the RPMs.
> Therefore I think that not hardcoding the After= in our unit files, and
> leaving this entirely to the administrator, is the most correct solution.

I think this was based on an incorrect understanding of what Before=,After= does; see the previous explanation.

> As for the second part, regarding the ExecPre and slurmctld having to wait
> for slurmdbd: as you said, it cannot be included either, because it is a
> hackish solution.

That was mostly included to show that "Before=" is currently not sufficient. Before= still makes sense (or an After=slurmdbd in the slurmctld unit). Also, this makes the hackish solution available for others who might find this bug.

> In any case I will investigate why systemd thinks slurmdbd is up even
> though it is still not listening on the port.

Please do!

I suspect that fixing this in a way that avoids ugly external hacks would require native systemd support in slurmdbd and calling sd_notify() during startup:
https://www.freedesktop.org/software/systemd/man/sd_notify.html#

This would be a great new feature.
Hi Pär,

First of all, I apologize for having responded from memory without checking the systemd documentation.

You are right: After= and Before= only apply to start jobs launched by systemd, essentially during the boot process (though I suppose there could be other situations where they apply). So yes, in the general case After= or Before= alone, without being combined with Requires= or Wants= (which is what I really had in mind), should make slurmdbd start after mariadb or whatever we put in there. That would probably only work, though, for services like MariaDB which implement sd_notify and so are Type=notify services, because with forking-type services we would probably end up in the same situation as we have between slurmdbd and slurmctld.

I will propose adding mariadb and/or mysql to the slurmdbd unit file. It seems it may be sufficient to just put mysql, since mariadb appears to add a link from mysql.service to mariadb.service. I still have to check whether that is standard or something specific to my RH packaging.

Is this OK with you?

> > In any case I will investigate why systemd thinks slurmdbd is up even
> > though it is still not listening on the port.
>
> Please do!
>
> I suspect that fixing this in a way that avoids ugly external
> hacks would require native systemd support in slurmdbd and calling
> sd_notify() during startup:
> https://www.freedesktop.org/software/systemd/man/sd_notify.html#
>
> This would be a great new feature.

That's easy. In slurmdbd we daemonize before opening any ports; it is the first thing we do. After we have daemonized, the PID tracked by systemd exits with a return code of 0, and therefore systemd thinks the service is up and running. In fact it is, though it is not fully initialized. Changing from a systemd forking-type service to a notify-type one requires linking with the systemd libraries. I am studying this possibility, but it would be a feature request.
In fact, the last thing we do as part of the init process is to open the port by creating the rpc_mgr thread, which obviously cannot be done earlier, as we need everything set up before we start receiving RPCs.

Does it make sense now? I apologize again for having responded too quickly and incorrectly the first time.
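The readiness notification discussed in this exchange can be illustrated with a small Python sketch. It is not Slurm code and not a proposed patch: start_daemon and its port handling are hypothetical, and the sd_notify function here is a hand-written stand-in that follows the notification protocol described in the sd_notify documentation (a datagram sent to the socket named in the NOTIFY_SOCKET environment variable) rather than calling the systemd library. The point it shows is the ordering: open the listening port first, and only then report READY=1.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to the service manager.

    Returns False if NOTIFY_SOCKET is unset (the process was not
    started by a manager expecting notifications), True once sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        addr = b"\0" + path[1:].encode()
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def start_daemon(port: int) -> socket.socket:
    """Hypothetical startup order: listen first, then report readiness."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    # The port is open, so it is now safe to tell the manager we are ready.
    sd_notify("READY=1")
    return listener
```

With Type=notify in the unit file, systemd would treat the service as started only once READY=1 arrives, so units ordered After= it would not be started earlier.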
Hi Pär,

I am just adding a message here to confirm that a fix is still pending review by our QA team. The fix concerns the order in which we start services in systemd.

Other errors are being fixed in bug 8067.

Thanks for your enormous patience and understanding.
(In reply to Felip Moll from comment #15)
> Hi Pär,
>
> I am just adding a message here to confirm that a fix is still pending
> review by our QA team. The fix concerns the order in which we start
> services in systemd.
>
> Other errors are being fixed in bug 8067.
>
> Thanks for your enormous patience and understanding.

First of all, sorry for the unusually long delay with this.

The fix has been committed to master and will be included starting with 21.08.

commit 53bcf76ed35b3603bf17bd9e7c1fbb86a3e711e6
Author: Felip Moll <felip.moll@schedmd.com>
AuthorDate: Wed Mar 3 14:52:00 2021 -0700

    slurmdbd.service - add "After" relationship for mariadb.

    Include all common names for MariaDB/MySQL services.

    Only use After= and not Requires= to avoid issues if the database is
    located on a different host, and to avoid needing to deduce the
    appropriate service name for MariaDB which will vary by distribution.

    Bug 8066.