| Summary: | Multiple Slurmctld on the same server | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Marco Passerini <marco.passerini> |
| Component: | slurmctld | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | brian, da |
| Version: | 14.03.10 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | CSC - IT Center for Science | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Marco Passerini
2014-11-30 21:22:41 MST
There is not a way today to do what you describe. Normally a cluster would have two slurmctld daemons in order to provide fault tolerance, and they would be on different servers. What are you trying to accomplish with two daemons on the same server?

So, we basically have 2 clusters, with different slurmctld daemons, sharing the same slurmdbd. The important fact here is that each of them has a different queuing policy (one for high throughput and one for a more fair scheduling). For convenience (we don't have free admin boxes right now) we thought of installing 2 Slurm controllers on the same admin node. Do you think we're the only ones who desire this kind of configuration?

(In reply to Marco Passerini from comment #2)
> So, we have basically 2 clusters, with different slurmctld, sharing the same
> slurmdbd. The important fact here is that each of them has a different
> queuing policy (one for high throughput and one for a more fair scheduling).
> For convenience (we don't have free admin boxes right now) we thought of
> installing 2 slurm controllers on the same admin node.
> Do you think we're the only ones who desire this kind of configuration?

If this is for two different clusters, you will just need to have each slurmctld daemon use a different configuration file, different ports, different log files, etc. It might also be a bit confusing for system administrators. Note that Slurm commands and daemons can be started with the environment variable SLURM_CONF set to the path of your slurm.conf file, which results in that configuration file being used. Note that all of us here at SchedMD use the same cluster for most of our testing; besides the necessary different ports, logs, etc., we install in our own home directories to avoid conflicts. The information below is from the slurmctld man page:

    ENVIRONMENT VARIABLES
           The following environment variables can be used to override
           settings compiled into slurmctld.

           SLURM_CONF  The location of the SLURM configuration file. This is
                       overridden by explicitly naming a configuration file
                       on the command line.

The details can be conveniently handled with the module software (http://modules.sourceforge.net/). This is an example of 2 module config files for Slurm 1403 and 1411:

```
david@prometeo /etc/modulefiles>less 1403
#%Module 1.0
#
#
setenv SLURM_ROOT /home/david/clusters/1403/linux
setenv SLURM_BUILD /home/david/clusters/1403/linux/build
prepend-path PATH /home/david/clusters/1403/linux/bin:/home/david/clusters/1403/linux/sbin
prepend-path LD_LIBRARY_PATH /home/david/clusters/1403/linux/lib:/home/david/clusters/1403/linux/lib/slurm
append-path MANPATH /home/david/clusters/1403/linux/share/man
setenv SLURM_CONF /home/david/clusters/1403/linux/etc/slurm.conf
setenv SLURM_ENVDIR /home/david/clusters/1403/linux/etc
setenv SLURM_SRCDIR /home/david/slurm/1403/slurm
setenv SLURM_SERVERDIR /home/david/clusters/1403/linux/sbin
```

```
david@prometeo /etc/modulefiles>less 1411
#%Module 1.0
#
# OpenMPI module for use with 'environment-modules' package:
#
setenv SLURM_ROOT /home/david/clusters/1411/linux
setenv SLURM_BUILD /home/david/clusters/1411/linux/build
prepend-path PATH /home/david/clusters/1411/linux/bin:/home/david/clusters/1411/linux/sbin
prepend-path LD_LIBRARY_PATH /home/david/clusters/1411/linux/lib:/home/david/clusters/1411/linux/lib/slurm
append-path MANPATH /home/david/clusters/1411/linux/share/man
setenv SLURM_CONF /home/david/clusters/1411/linux/etc/slurm.conf
setenv SLURM_ENVDIR /home/david/clusters/1411/linux/etc
setenv SLURM_SRCDIR /home/david/slurm/1411/slurm
setenv SLURM_SERVERDIR /home/david/clusters/1411/linux/sbin
```

I just type 'module 1403' or 'module 1411' to get the desired configuration.

David

Hi,
Thanks for the pointers. In fact, I managed to run 2 Slurm controllers on the same server, but the main issue is that I would like init scripts to manage them, since we're using Pacemaker for high availability, which is quite sensitive if they don't work correctly.
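The per-instance SLURM_CONF approach suggested above can be sketched as a small shell helper. The helper name `run_in_cluster` and the `/opt/slurm-a` layout are illustrative assumptions, not taken from this ticket:

```shell
# run_in_cluster: hypothetical helper that points any Slurm command at
# one instance's configuration tree by setting SLURM_CONF for that
# single invocation only; the caller's environment is left untouched.
run_in_cluster() {
    local root="$1"; shift
    SLURM_CONF="$root/etc/slurm.conf" "$@"
}

# Example (assumed instance roots): query each controller separately:
#   run_in_cluster /opt/slurm-a scontrol ping
#   run_in_cluster /opt/slurm-b scontrol ping
```

Because the variable is scoped to the single command, the two controllers (and any client commands aimed at them) cannot pick up each other's slurm.conf by accident.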
I tried to hack the provided init script in this way.
I configured the paths like this:
```
PREFIX=/opt/slurm-shell/default
BINDIR=$PREFIX/bin
CONFDIR=$PREFIX/etc
LIBDIR=$PREFIX/lib
SBINDIR=$PREFIX/sbin
```
In the following function:

```
slurmstatus() {
```

I had to comment out this, because otherwise the status was not returned correctly:

```
#if [ "$base" = "slurmctld" -a "$pid" != "" ] ; then
#    echo $"${base} (pid $pid) is running..."
#    continue
#fi
```
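A status check keyed to a per-instance pidfile avoids matching on the process name, so two slurmctld copies on one host cannot be confused with each other. A minimal sketch; the pidfile argument and the LSB-style return codes are assumptions, not from this ticket:

```shell
# Hypothetical status check: trust only the pid recorded for this
# instance, never the process name. Returns 0 if that exact pid is
# alive, 3 (LSB "program is not running") otherwise.
slurmstatus() {
    local pidfile="$1" pid
    pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "slurmctld (pid $pid) is running..."
        return 0
    fi
    echo "slurmctld is stopped"
    return 3
}
```

With one pidfile per instance, the commented-out name-based shortcut above becomes unnecessary.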
This was not enough though, because when I ran `service slurm-shell stop`, sometimes the wrong daemon was stopped. The reason is that the function does not check the pid of the program it kills:

```
stop() {
    echo -n "stopping $1: "
    killproc $1 -TERM
    rc_status -v
    echo
    rm -f /var/lock/subsys/slurm
}
```
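The name-matching behaviour of `killproc` is exactly what makes this unsafe with two daemons of the same name on one host. A sketch of a stop() that signals only the pid recorded for one instance; the pidfile convention and lock-file naming are assumptions, not from this ticket:

```shell
# Hypothetical per-instance stop: $1 is the instance name, $2 its
# pidfile. Only the recorded pid is signalled, so stopping one
# controller can never hit the other by name.
stop() {
    local name="$1" pidfile="$2" pid
    echo -n "stopping $name: "
    pid=$(cat "$pidfile" 2>/dev/null)
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        kill -TERM "$pid"
    fi
    echo
    rm -f "/var/lock/subsys/$name" "$pidfile"
}
```

Using a per-instance lock file (`/var/lock/subsys/$name` rather than a shared `slurm`) also keeps the two instances' subsys state independent.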
So then I started hacking the init script further, but eventually gave up because I thought a custom solution would be too annoying to maintain in the long run.
It would be nice if the script were written so that setting the config directories at the top was enough, and everything else would follow from them.
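That wish can be sketched by deriving every path from a single instance root at the top of the script, so pointing PREFIX at another tree selects the other controller. All paths below are assumptions modelled on the ones quoted earlier in this ticket:

```shell
# Hypothetical header for a reusable init script: everything derives
# from PREFIX, so the same script can drive either controller
# instance just by changing this one variable.
PREFIX=${PREFIX:-/opt/slurm-shell/default}   # per-instance root (assumed)
BINDIR=$PREFIX/bin
CONFDIR=$PREFIX/etc
LIBDIR=$PREFIX/lib
SBINDIR=$PREFIX/sbin
PIDFILE=$PREFIX/run/slurmctld.pid            # assumed pidfile location
export SLURM_CONF=$CONFDIR/slurm.conf        # this instance's config
```

Installed once per instance directory (or invoked with a different PREFIX), the start/stop/status functions can then work purely from these variables instead of hard-coded names.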