Ticket 5903

Summary: How to configure a Backup controller
Product: Slurm Reporter: Hjalti Sveinsson <hjalti.sveinsson>
Component: Configuration    Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 17.11.4   
Hardware: Linux   
OS: Linux   
Site: deCODE Slinky Site: ---

Description Hjalti Sveinsson 2018-10-23 04:01:25 MDT
Hi,

We are in the process of setting up a new machine that will act as a backup controller to our current environment.

We have a single head node now that is writing all the state data to /var/lib/slurm/slurmctld/ directory. 

What happens when we change this to an NFS directory? Will we lose the state of jobs while switching over?

Can you send me the steps that we should take, in the correct order?

Best regards,
Hjalti Sveinsson
Comment 1 Michael Hinton 2018-10-23 15:31:59 MDT
Hi Hjalti,

This is the basic workflow on the controller node: shut down slurmctld, back up the StateSaveLocation directory, move the StateSaveLocation files to a directory exported over NFS, mount that NFS directory over the StateSaveLocation path shown in slurm.conf, and restart slurmctld. Starting up the backup controller works the same way.
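Assuming the state lives in /var/lib/slurm/slurmctld/ (as described above), and with the NFS server name and export path as placeholders, the steps might look roughly like this:

```shell
# Placeholder NFS export: nfsserver:/export/slurm_state
systemctl stop slurmctld

# Keep a local copy of the current state before touching anything.
tar czf /root/slurmctld-state-backup.tar.gz -C /var/lib/slurm slurmctld

# Stage the state files on the NFS export, then clear the local copies
# so a failed mount can never expose stale state.
mount -t nfs nfsserver:/export/slurm_state /mnt/slurm_state
cp -a /var/lib/slurm/slurmctld/. /mnt/slurm_state/
umount /mnt/slurm_state
rm -rf /var/lib/slurm/slurmctld/*

# Mount the export over the StateSaveLocation and restart.
mount -t nfs nfsserver:/export/slurm_state /var/lib/slurm/slurmctld
systemctl start slurmctld
```

Note that this keeps the local log directories untouched; only the StateSaveLocation moves onto NFS.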

Make sure that the log files for the primary and backup controller are still written locally, and not over NFS. Also make sure that you remove the local files under the StateSaveLocation mount point, so you don’t accidentally load an old state if NFS fails to mount.

As always, it’s recommended to try this out on a dummy cluster.

Running jobs should continue to run properly, even while the controller is down. However, my testing shows that if a job completes while the controller is down, it may still show as running after the controller comes back up. So be aware of that.

If you want to be extra cautious and do this when no jobs are running, you may want to consider creating reservations with the “maint” flag. See https://slurm.schedmd.com/reservations.html. 
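As a sketch, a maintenance reservation covering all nodes might be created like this; the reservation name, start time, duration, and user are placeholders to adjust for your site:

```shell
# Reserve all nodes for a two-hour maintenance window so no new jobs start.
scontrol create reservation reservationname=maint_window \
    starttime=now duration=120 users=root flags=maint nodes=ALL
```

Delete it afterwards with `scontrol delete reservationname=maint_window`.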

I hope that helps.
-Michael
Comment 2 Hjalti Sveinsson 2018-10-25 09:36:20 MDT
Thank you, this clears it up. 

Best Regards,
Hjalti Sveinsson
Comment 3 Michael Hinton 2018-10-25 10:14:06 MDT
You're welcome! Please reopen if you have any other questions.

-Michael
Comment 4 Hjalti Sveinsson 2018-10-29 07:35:36 MDT
Hi again,

there was one big thing I forgot to mention. We have an underlying slurmdbd running on our current head node with a MySQL database.

How do we make this redundant?

Install the Slurm packages on newdbhost.
Shut down the services (slurmctld/slurmdbd)?
Move the database (mysqldump?) to the separate DB host?
Change the config to match the new DB host?
##
StorageHost=localhost
StoragePort=3306
To
StorageHost="newdbhost"
StoragePort=3306
##
##
# slurmDBD info
DbdAddr=localhost
DbdHost=localhost
DbdPort=6819
To
# slurmDBD info
DbdAddr="newdbhost"
DbdHost="newdbhost"
DbdPort=6819
#

It would be good to get information on how we should go about doing this.

Best regards,
Hjalti Sveinsson
Comment 6 Michael Hinton 2018-10-29 10:21:36 MDT
Hjalti,

First of all, slurmctld itself backs up all cluster transactions when slurmdbd goes down, for a limited time. So in effect, slurmctld is a backup slurmdbd. Things like sacct won’t work while slurmdbd is down, but no data will be lost.

When slurmdbd is down, slurmctld writes job-related data to the local disk. So make sure to monitor the `DBD Agent queue size` with the sdiag command. Once that queue is full, you may start losing data, but it shouldn't fill up for a while unless you have a heavy workload. The sdiag man page explains how long slurmctld will cache information under "DBD Agent queue size": https://slurm.schedmd.com/sdiag.html.
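A minimal way to watch that queue (the exact label may vary slightly between Slurm versions):

```shell
# Print the number of accounting records slurmctld is currently
# buffering because slurmdbd is unreachable.
sdiag | grep -i "DBD Agent queue size"
```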

Second, I would not recommend duplicating the db like you describe. The two databases will get out of sync if the backup slurmdbd kicks in, and you will have to merge the DBs back together manually. Instead, you should look into DB replication.

How you do DB replication is outside the scope of what we support, but two prominent replication strategies are Master-Slave replication and Galera Cluster. Here are a few resources you could look into:
* https://mariadb.com/kb/en/library/replication-cluster-multi-master/
* https://mariadb.com/kb/en/library/what-is-mariadb-galera-cluster/
* https://mariadb.com/kb/en/library/setting-up-replication/
* https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql

Third, if you still want a backup *slurmdbd*, put it on a different node from the primary slurmdbd, but have both daemons use the same underlying replicated DB. To install the backup slurmdbd, you don't have to shut anything down; when you start it up, it will automatically go into backup mode. As for your slurmdbd.conf, I think you have the right idea. Just make sure the MySQL/MariaDB permissions are set up to accept incoming connections from hosts other than localhost.
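As a sketch, the relevant settings could look like the fragment below. The hostnames are placeholders; the backup-host parameters are AccountingStorageBackupHost (slurm.conf) and DbdBackupHost (slurmdbd.conf):

```
# slurm.conf on the controllers:
AccountingStorageHost=dbnode1
AccountingStorageBackupHost=dbnode2

# slurmdbd.conf on both slurmdbd nodes, pointing at the replicated DB:
DbdHost=dbnode1
DbdBackupHost=dbnode2
StorageHost=replicated-db-host
StoragePort=3306
```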

-Michael
Comment 7 Michael Hinton 2018-11-14 17:09:08 MST
An alternative to DB replication is to simply back up your database regularly.
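For example, a nightly dump could look like this; the database name (slurm_acct_db) is the Slurm default but may differ at your site:

```shell
# Dump the Slurm accounting database without locking InnoDB tables.
mysqldump --single-transaction slurm_acct_db \
    | gzip > /backup/slurm_acct_db-$(date +%F).sql.gz
```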

Closing ticket. Please reopen if you have any other questions or if something doesn't make sense.

Thanks,
Michael