Ticket 1732

Summary: slurm.conf in shared location
Product: Slurm Reporter: Herbert Mehlhose <herbert_mehlhose>
Component: Configuration    Assignee: Brian Christiansen <brian>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 2 - High Impact    
Priority: --- CC: brian, da
Version: 14.11.7   
Hardware: Linux   
OS: Linux   
Site: PIK

Description Herbert Mehlhose 2015-06-10 07:34:55 MDT
Hi,
My name is Herbert Mehlhose and I'm working on the Slurm configuration at PIK Potsdam. I have already compiled Slurm (currently 14.11.7; I will compile 14.11.8 if you recommend it). As recommended, I also compiled munge and got this basically working, so submitting simple jobs into partitions with sbatch works.
The setup:
Cluster with 318 compute nodes SLES11 SP3 (stateless boot)
Filesystem GPFS
Provisioning is done via xCAT.
What I have implemented so far: xCAT installs all required slurm and munge rpms, and an xCAT postscript sets the correct permissions etc. on the required directories. In short: slurmd and slurmctld come up after a fresh installation or a stateless reboot.
I have been busy with many things during the setup, so my colleague Holger Holthoff already filed ticket #1731 - it contains many questions the customer asked me - and that one is quite urgent.

Here I want to file a new ticket for just one simple question. I think it is a good approach to ask simple questions in separate tickets, so that we have a short reference and can close them quickly.

My current setup uses xCAT syncfiles to distribute the munge key and the slurm.conf.
For the munge key this is fine, as it is completely static.
For the slurm.conf, I would like to put it on a shared filesystem (here: GPFS), so that configuration changes do not have to be synced to the nodes via xCAT (which uses rsync) - that could leave nodes without the updates for whatever reason. With a shared filesystem, a simple "service slurm reconfig" (or maybe restart) makes changes available everywhere. And keeping a consistent configuration across the cluster is vital to avoid problems - that is what I take from the documentation.

So I tried this today and found that there are several ways to change the path to the config:
a) at build time: DEFAULT_SLURM_CONF
b) at run time: by setting the SLURM_CONF environment variable
c) the -f option at daemon startup
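For reference, mechanisms b) and c) look roughly like this when calling the daemon directly (a minimal sketch; the /p/system/slurm path is our shared GPFS location, adjust as needed):

```shell
# Mechanism b): point the daemon at the shared config via the environment.
export SLURM_CONF=/p/system/slurm/slurm.conf
# /usr/sbin/slurmd                                # would now read $SLURM_CONF

# Mechanism c): pass the config file explicitly at daemon startup.
# /usr/sbin/slurmd -f /p/system/slurm/slurm.conf
```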

Option a) sounds interesting - I might test that with the next version build.
Option b) simply did not work - checking the /etc/init.d/slurm script shows me that it looks for the fixed /etc/slurm/ path.
Option c) also does not work with the init script.

I found that running /usr/sbin/slurmd directly will accept the environment variable and the -f option. But I want to use the init script, because it automatically determines from the configuration which daemon has to be started. At least in my setup, it starts slurmctld on the controller nodes and slurmd on the compute nodes - it must get this information from the configuration file, but if it cannot read that file because it is located somewhere else, the script is useless for my shared setup.

What I did for now: just create a symlink ("ln -s /p/system/slurm /etc/slurm") within my postscript. This works fine.

But: is this the recommended way to do it? Why does the init script not honor the environment variable? Calling /usr/sbin/slurmXX directly would require me to check manually which role the node has.
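For context, the role selection I mean can be sketched roughly like this - a simplified illustration, not the actual init script, and the ControlMachine value is made up:

```shell
#!/bin/sh
# Simplified sketch: decide which daemon to start by comparing the local
# hostname against ControlMachine in slurm.conf (illustrative config only).
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
ControlMachine=head01
EOF

control=$(sed -n 's/^ControlMachine=\(.*\)$/\1/p' "$CONF")
if [ "$(hostname -s)" = "$control" ]; then
    daemon=slurmctld        # this node is the controller
else
    daemon=slurmd           # ordinary compute node
fi
echo "would start: $daemon"
rm -f "$CONF"
```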

BTW: as a next step I want to prepare for slurmctld failover - this means I will put the "StateSaveLocation" into the shared GPFS as well. But as said - that is another story and I'll open a new ticket for it.

Thanks for any best-practice tips on how to start the daemons (init script vs. direct call etc.).

Best regards,
Herbert
Comment 1 Brian Christiansen 2015-06-10 09:00:45 MDT
As you stated, it's very important that the slurm.conf files stay in sync. Putting the slurm.conf in a shared location is a recommended way of doing this.

How are you building Slurm, rpm or source? You can make all the daemons and commands look for the slurm.conf in the shared location by configuring with the --sysconfdir option (e.g. ./configure --sysconfdir=/p/system/slurm).

Let us know if this works for you.

Thanks,
Brian
Comment 2 Herbert Mehlhose 2015-06-10 22:05:42 MDT
My original build used rpmbuild. I have now changed over to a source build using --prefix. This builds everything into a shared directory on GPFS, including my Slurm config.

./configure --prefix=/p/system/slurmtest --with-hdf5=/root/hdf5-1.8.15/hdf5/bin/h5cc --with-munge=/usr/lib64

I can start slurmctld now; I just need to create an etc/slurm.conf within my slurmtest directory. This way even my binaries are shared and do not consume any local space (especially good on stateless compute nodes).

I'm missing the nice /etc/init.d/slurm script, which automatically started the correct daemon (slurmctld on the controller, slurmd on non-controllers). What I did here: copy it from my rpmbuild installation and adjust BINDIR, CONFDIR, LIBDIR and SBINDIR to point into my GPFS paths. This script I can distribute easily via xCAT syncfiles.
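The adjusted variables look roughly like this (my paths; the variable names are the ones from the stock init script I copied, and lib vs. lib64 depends on the build):

```shell
# Overrides near the top of the copied /etc/init.d/slurm script,
# pointing at the shared --prefix=/p/system/slurmtest tree:
BINDIR=/p/system/slurmtest/bin
CONFDIR=/p/system/slurmtest/etc
LIBDIR=/p/system/slurmtest/lib
SBINDIR=/p/system/slurmtest/sbin
```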

I will test this setup with controllers and compute nodes and let you know the results.
Thank you,
best regards
Herbert
Comment 3 Herbert Mehlhose 2015-06-11 02:47:08 MDT
Hi Brian,

Up to now my tests look very promising. The adjusted init script works fine and slurmd is started on the compute nodes. So for now it looks to me that this ticket can be closed, if you do not see any issues with the approach I described.
Next steps I will do:
- a shared StateSaveLocation for 2 controllers
- the hdf5 stuff (an important issue for the customer) - I have now managed to build the library.
I will open separate tickets for these, if issues arise.

Thank you,
best regards
Herbert
Comment 4 Brian Christiansen 2015-06-11 03:53:31 MDT
Glad to hear that things are going well. You can also set the prefix when building rpms: create a .rpmmacros file in your home directory, put "%_prefix /p/system/slurmtest" in it, and then build the rpms. Look at the slurm.spec file for other options.
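A sketch of that workflow (the tarball name is only illustrative; check slurm.spec for other relocatable macros, as mentioned):

```shell
# Put the prefix macro into ~/.rpmmacros so rpmbuild picks it up:
printf '%%_prefix /p/system/slurmtest\n' >> "$HOME/.rpmmacros"

# Then build the rpms as usual, e.g.:
#   rpmbuild -ta slurm-14.11.7.tar.bz2
```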

Let us know if you have other questions.

Thanks,
Brian