| Summary: | Srun fails to load correct slurm.conf in multi-cluster setup | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | peter.georg |
| Component: | User Commands | Assignee: | Jacob Jenson <jacob> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | ||
| Version: | 20.02.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
|
Description
peter.georg
2020-12-02 03:42:32 MST
I just found another bug in the same group: `scontrol --clusters=dummy reboot`. Once again, the "generic" slurm.conf (typically the local /etc/slurm/slurm.conf) is loaded to verify that "RebootProgram" has been set. Instead, it should check the cluster-specific slurm.conf. Again, a workaround is to set RebootProgram to an arbitrary value there (the value is never used). This is easy to see in the source code: https://github.com/SchedMD/slurm/blob/23cbe39d98cfacd9434f10c19a415c6092e4c61c/src/scontrol/reboot_node.c#L70
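
A minimal sketch of that workaround, assuming the generic file is the local /etc/slurm/slurm.conf on the host where `scontrol` is run. The value `/bin/true` is only a placeholder picked for illustration; according to the report, any non-empty value should satisfy the check, and the value is not used for the remote cluster.

```conf
# /etc/slurm/slurm.conf on the submitting host (path assumed, see above)
# Placeholder only: present so that "scontrol --clusters=<name> reboot"
# passes its local RebootProgram check. Per the report, this value is
# not what the remote cluster uses.
RebootProgram=/bin/true
```

If the same file also configures a real local cluster, RebootProgram is the program run on its nodes for `scontrol reboot`, so a placeholder may not be appropriate there.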