| Summary: | A login / submit node is not using configless slurm | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | GSK-ONYX-SLURM <slurm-support> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 21.08.1 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | GSK | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
GSK-ONYX-SLURM
2022-01-26 01:55:38 MST
Radek,

The other way may be to configure a submit host as one of the cluster nodes. It doesn't have to be included in any partition, but running slurmd on it will result in a cached slurm.conf (/run/slurm/conf/slurm.conf) being stored on it and used by Slurm commands.

cheers,
Marcin

Hey Marcin,

many thanks for your quick response. The submit node is part of the cluster. The slurmd service is running on that node, and the cached config files are stored in /var/spool/slurmd/conf-cache. The point is that the submit node does not seem to use them. It looks up compute nodes in /etc/slurm/slurm.conf instead. If a specific compute node is not defined there, the command fails, even though the node does exist in the cached slurm.conf.

slurmd is configured properly from a configless point of view. The slurm.conf, along with the other files (e.g. gres.conf), is updated every time a new configuration is pushed from the control node with the scontrol reconfig command.

Any ideas?

Cheers,
Radek

Radek,
> The point is that the submit node does not seem to use them. It looks up compute nodes in /etc/slurm/slurm.conf instead. If a specific compute node is not defined there, the command fails, even though the node does exist in the cached slurm.conf.
I guess /etc/slurm/slurm.conf is the default built-in location of slurm.conf. In that case, it is used whenever the file exists. Why does the file exist if you want to run configless?
cheers,
Marcin
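Marcin's point can be turned into a quick host check: on a node that is meant to run configless, look for a leftover default slurm.conf sitting next to slurmd's cached copy. The following is a minimal Python sketch. `shadowing_warnings` is a hypothetical helper, not part of Slurm, and the paths are simply the ones mentioned in this thread. On other sites they are assumptions: the default location depends on how Slurm was built, and the cache location depends on the Slurm version and SlurmdSpoolDir.

```python
import os
from typing import Callable, Iterable, List

# Paths mentioned in this thread (assumptions for any other site): the default
# location depends on Slurm's build-time sysconfdir, and the cache location
# depends on the Slurm version and the SlurmdSpoolDir setting.
DEFAULT_CONF = "/etc/slurm/slurm.conf"
CACHE_CANDIDATES = (
    "/run/slurm/conf/slurm.conf",
    "/var/spool/slurmd/conf-cache/slurm.conf",
)


def shadowing_warnings(
    default_conf: str = DEFAULT_CONF,
    cache_candidates: Iterable[str] = CACHE_CANDIDATES,
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Hypothetical check: warn when a leftover default slurm.conf sits
    next to a configless cache, since the default file wins if present."""
    caches = [path for path in cache_candidates if exists(path)]
    if caches and exists(default_conf):
        return [
            f"{default_conf} exists and takes precedence over the "
            f"configless cache at {caches[0]}; remove it to run configless"
        ]
    return []


if __name__ == "__main__":
    for warning in shadowing_warnings():
        print("WARNING:", warning)
```

The `exists` parameter is injectable, so the logic can be exercised without touching the real filesystem.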
Hi Marcin,

the reason the slurm.conf file was left in its default location is that without this file it was not possible to read the server's physical configuration and put it into the config file later on. Obviously it should have been removed after installation. Removing everything from /etc/slurm seems to have helped, and the submit node now uses the cached config files, including gres.conf.

Please make sure that the SchedMD documentation (https://slurm.schedmd.com/configless_slurm.html) is updated with the information that config files located in /etc/slurm/ take precedence over the cached config files.

Many thanks for your help!

Cheers,
Radek

I believe it's documented on that page[1]:

> The order of precedence for determining what configuration source to use is as follows:
>
> 1. The slurmd --conf-server $host[:$port] option
> 2. The -f $config_file option
> 3. The SLURM_CONF environment variable (if set)
> 4. The default slurm config file (likely /etc/slurm.conf)
> 5. Any DNS SRV records (from lowest priority value to highest)

Does that look good to you?

cheers,
Marcin

[1] https://slurm.schedmd.com/configless_slurm.html#NOTES

You're right. I probably wasn't very careful while reading this, so it's a user issue ;-) It is stated in the opposite direction (from lowest priority value to highest), but it makes sense. Thanks again for your support. You can close the ticket.

Cheers,
Radek
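To make the quoted precedence order concrete, it can be modeled as a first-match lookup. This is a simplified Python sketch of that list only, not Slurm's actual lookup code. `resolve_config_source` and its parameters are hypothetical, the default path is the one used in this thread, and DNS SRV handling is reduced to picking the record with the lowest priority value. The key behaviour it shows is that a default file wins over any SRV record simply by existing.

```python
import os
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Default path as used in this thread; the real default depends on how Slurm
# was built (its sysconfdir), so treat it as an assumption.
DEFAULT_SLURM_CONF = "/etc/slurm/slurm.conf"


def resolve_config_source(
    conf_server: Optional[str] = None,      # slurmd --conf-server (slurmd only)
    config_file_opt: Optional[str] = None,  # -f $config_file
    env: Optional[Mapping[str, str]] = None,
    default_path: str = DEFAULT_SLURM_CONF,
    srv_records: Sequence[Tuple[int, str]] = (),  # (priority value, host:port)
    exists: Callable[[str], bool] = os.path.isfile,
) -> Tuple[str, str]:
    """Return (source kind, location) for the first applicable source,
    following the documented precedence list (hypothetical model)."""
    env = os.environ if env is None else env
    if conf_server:
        return ("conf-server", conf_server)
    if config_file_opt:
        return ("-f option", config_file_opt)
    if env.get("SLURM_CONF"):
        return ("SLURM_CONF", env["SLURM_CONF"])
    # A leftover default file beats configless DNS lookup just by existing.
    if exists(default_path):
        return ("default file", default_path)
    if srv_records:
        # Lower SRV priority values are preferred; only the most preferred
        # record is returned here, whereas real clients could fall back.
        _, target = min(srv_records, key=lambda rec: rec[0])
        return ("DNS SRV", target)
    raise LookupError("no Slurm configuration source found")


if __name__ == "__main__":
    srv = [(10, "ctld2:6817"), (0, "ctld1:6817")]
    # Stray default file present: it shadows the configless SRV lookup.
    print(resolve_config_source(env={}, srv_records=srv, exists=lambda p: True))
    # -> ('default file', '/etc/slurm/slurm.conf')
    print(resolve_config_source(env={}, srv_records=srv, exists=lambda p: False))
    # -> ('DNS SRV', 'ctld1:6817')
```

The two example calls mirror this ticket: with the default file present, configless lookup is never reached, and removing it lets the lookup fall through to the SRV records.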