| Summary: | ordering of hosts in slurm.conf with --enable-multiple-slurmd | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Gutteridge <mrg> |
| Component: | slurmctld | Assignee: | David Bigagli <david> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | brian, da |
| Version: | 15.08.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | FHCRC - Fred Hutchinson Cancer Research Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf with misordered nodes | ||
Description
Michael Gutteridge
2015-03-09 11:59:45 MDT
Hi, could you please attach your slurm.conf that shows the problem? Thanks, David

Created attachment 1701
slurm.conf with misordered nodes
OK, attached. When I start slurmctld with this configuration:

    mrg@slapshot[~]: sudo /usr/sbin/slurmctld -D
    slurmctld: slurmctld version 15.08.0-0pre2 started on cluster slapshot
    slurmctld: layouts: no layout to initialize
    slurmctld: error: Reconfiguration for node puck1, ignoring!
    slurmctld: _parse_part_spec: changing default partition from slapshot to campus
    slurmctld: layouts: loading entities/relations information
    slurmctld: error: find_node_record: lookup failure for puck1
    slurmctld: Recovered state of 406 nodes
    slurmctld: Recovered information about 0 jobs
    slurmctld: error: find_node_record: lookup failure for puck1
    slurmctld: error: node_name2bitmap: invalid node specified puck1
    slurmctld: fatal: Invalid node names in partition slapshot

Thanks,
M

Hi, this was fixed in commit ce32018a28d6b7a, which is available in 15.08.0pre3. If you update your code to the latest version, you will get the fix.
David

We discovered that the issue still exists in the latest code. Sorry for the confusion. The solution is to use NodeAddr instead of NodeHostName.
Thanks,
David
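As a hedged illustration of the suggested workaround, the slurm.conf fragment below sketches how several emulated nodes served by one machine might be declared with NodeAddr rather than NodeHostName on a build configured with --enable-multiple-slurmd. The node names, the host name `hosta`, the partition line, the CPU counts and the port numbers are all hypothetical placeholders, not taken from the attached file; check the parameters against the slurm.conf documentation for your Slurm version.

```
# Hypothetical example: four emulated nodes (puck1..puck4), each served by
# its own slurmd daemon, all running on one physical machine named "hosta".
# NodeAddr is the address slurmctld uses to reach the node; because the
# daemons share that address, each one listens on a distinct Port.
# All names and numbers here are illustrative only.
NodeName=puck1 NodeAddr=hosta Port=17001 CPUs=1
NodeName=puck2 NodeAddr=hosta Port=17002 CPUs=1
NodeName=puck3 NodeAddr=hosta Port=17003 CPUs=1
NodeName=puck4 NodeAddr=hosta Port=17004 CPUs=1
PartitionName=slapshot Nodes=puck[1-4] Default=YES State=UP
```

With a layout like this, each daemon would typically be started with the node name it represents, for example `slurmd -N puck1`. That option is available only on builds configured with --enable-multiple-slurmd.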