| Summary: | slurm.conf in shared location | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Herbert Mehlhose <herbert_mehlhose> |
| Component: | Configuration | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | brian, da |
| Version: | 14.11.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | PIK | Alineos Sites: | --- |
|
Description
Herbert Mehlhose
2015-06-10 07:34:55 MDT
As you stated, it's very important that the slurm.conf files stay in sync. Putting slurm.conf in a shared location is a recommended way of doing this. How are you building Slurm: from RPMs or from source? You can make all the daemons and commands look for slurm.conf in the shared location by configuring with the --sysconfdir option (e.g. ./configure --sysconfdir=/p/system/slurm). Let us know if this works for you.

Thanks,
Brian

My original build was done with rpmbuild. I have now switched to a source build and use the --prefix option. This builds everything into a shared directory in GPFS, including my Slurm config:

./configure --prefix=/p/system/slurmtest --with-hdf5=/root/hdf5-1.8.15/hdf5/bin/h5cc --with-munge=/usr/lib64

I can start slurmctld now; I just need to create an etc/slurm.conf within my slurmtest directory. This way, even my binaries are shared and do not consume any local space (especially good on stateless compute nodes).

I'm missing the nice /etc/init.d/slurm script, which automatically starts the correct daemon (slurmctld on the controller, slurmd on non-controllers). What I did here: I copied the script from my rpmbuild installation and adjusted BINDIR, CONFDIR, LIBDIR and SBINDIR to point to my GPFS paths. I can distribute this script easily with xCAT syncfiles.

I will test this setup with controllers and compute nodes and let you know the results.

Thank you, best regards
Herbert

Hi Brian,

So far my tests look very promising. The adjusted init script works fine and slurmd is started on the compute nodes. So for now it looks to me like this ticket can be closed, unless you see any issues with the approach I described.

Next steps I will take:
- shared StateSaveLocation for 2 controllers
- hdf5 stuff (an important issue for the customer); I have now managed to build the library

I will open separate tickets for these if issues arise.

Thank you, best regards
Herbert

Glad to hear that things are going well. You can also set the prefix when building RPMs: create a .rpmmacros file in your home directory, put "%_prefix /p/system/slurmtest" in it, and then build the RPMs. You can look at the slurm.spec file for other options. Let us know if you have other questions.

Thanks,
Brian
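
For reference, the .rpmmacros step described above could be scripted roughly as follows. This is only a sketch, not something taken from the Slurm documentation: the helper name set_rpm_prefix is made up for illustration, the prefix is the path used in this ticket, and the tarball name in the final comment is an assumption based on the version reported here.

```shell
#!/bin/sh
# Sketch: record a shared install prefix in an rpm macros file.
# set_rpm_prefix is a hypothetical helper, not part of Slurm or rpm.
#
# Usage: set_rpm_prefix MACROS_FILE PREFIX
set_rpm_prefix() {
    macros_file="$1"
    prefix="$2"

    # Append the macro only if no %_prefix line exists yet,
    # so running the helper twice does not create duplicates.
    if ! grep -q '^%_prefix[[:space:]]' "$macros_file" 2>/dev/null; then
        printf '%%_prefix %s\n' "$prefix" >> "$macros_file"
    fi
}

# Typical call on the build host (path taken from this ticket):
#   set_rpm_prefix "$HOME/.rpmmacros" /p/system/slurmtest
#
# Afterwards build the RPMs from the release tarball, for example
# (file name is an assumption):
#   rpmbuild -ta slurm-14.11.7.tar.bz2
```

The helper leaves an existing %_prefix line untouched, so if the macros file already sets a different prefix it has to be edited by hand.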