Ticket 14983 - Question about compute node hostname changes
Summary: Question about compute node hostname changes
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 22.05.3
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Marcin Stolarek
Reported: 2022-09-16 08:37 MDT by David Gloe
Modified: 2022-09-26 06:23 MDT
Site: CRAY
Cray Sites: Cray Internal


Description David Gloe 2022-09-16 08:37:36 MDT
We're looking into a new procedure to fill gaps in the compute node hostname assignments.
For example, if we have compute nodes nid000001, nid000002, nid000004, and nid000005, the procedure would rename nid000004 to nid000003 and nid000005 to nid000004.
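The renaming described above can be sketched in a few lines of Python. This is only an illustration, not part of Slurm or of the site's tooling: the function name `gap_filling_renames` is hypothetical, and it assumes every hostname is an alphabetic prefix followed by a zero-padded number, with numbering starting at the lowest number currently in use.

```python
import re


def gap_filling_renames(hostnames):
    """Return {old_name: new_name} for every node whose name must change.

    Assumption: names look like "nid000004" (letters, then zero-padded digits),
    and the compacted numbering starts at the lowest number currently in use.
    """
    entries = []
    for name in hostnames:
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
        if match is None:
            raise ValueError(f"unexpected hostname format: {name!r}")
        prefix, digits = match.groups()
        entries.append((int(digits), prefix, len(digits), name))

    entries.sort()  # order nodes by their numeric suffix
    if not entries:
        return {}

    first = entries[0][0]
    renames = {}
    for offset, (_, prefix, width, name) in enumerate(entries):
        # Assign consecutive numbers, keeping each name's prefix and padding.
        new_name = f"{prefix}{first + offset:0{width}d}"
        if new_name != name:
            renames[name] = new_name
    return renames


renames = gap_filling_renames(["nid000001", "nid000002", "nid000004", "nid000005"])
print(renames)  # {'nid000004': 'nid000003', 'nid000005': 'nid000004'}
```

Note that nid000004 appears both as an old name and as a new name, which is why the order of the Slurm-side changes matters.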

We're wondering what would need to be done in Slurm to handle this change.
Do you just need to update slurm.conf and restart slurmctld? Or is there state in the spool directory that needs to be cleared manually?
Comment 1 Marcin Stolarek 2022-09-19 23:02:25 MDT
David,

All slurmd daemons should definitely be restarted after the change as well. In general, I'd recommend following our two FAQ answers:
1) What process should I follow to remove nodes from Slurm?[1]
2) What process should I follow to add nodes to Slurm?[2]

First remove the nodes that are going to be renamed, then add them back under their new names.

What command do you use to start slurmctld? I'm interested in the command-line options used in the systemd unit file (or its equivalent).

cheers,
Marcin
[1]https://slurm.schedmd.com/faq.html#rem_nodes
[2]https://slurm.schedmd.com/faq.html#add_nodes
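As a rough illustration of the remove-then-add order recommended above, the following Python sketch computes which node names would be configured after each step. The helper `two_step_node_sets` is hypothetical, and the rename mapping is taken from the example in the description. The actual slurm.conf edits and daemon restarts for each step are the ones described in the FAQ entries and are not reproduced here.

```python
def two_step_node_sets(current_nodes, renames):
    """Return (nodes configured after the removal step, nodes after the add step).

    current_nodes: node names currently configured in Slurm.
    renames: {old_name: new_name} for the nodes being renamed.
    """
    # Step 1: drop every node that is about to be renamed.
    after_removal = sorted(set(current_nodes) - set(renames))
    # Step 2: add the same physical nodes back under their new names.
    after_add = sorted(set(after_removal) | set(renames.values()))
    return after_removal, after_add


current = ["nid000001", "nid000002", "nid000004", "nid000005"]
renames = {"nid000004": "nid000003", "nid000005": "nid000004"}

step1, step2 = two_step_node_sets(current, renames)
print("after step 1:", ",".join(step1))  # after step 1: nid000001,nid000002
print("after step 2:", ",".join(step2))  # after step 2: nid000001,nid000002,nid000003,nid000004
```

Splitting the change this way means a name such as nid000004 is never configured for its old and new hardware at the same time.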
Comment 2 David Gloe 2022-09-22 09:36:40 MDT
The compute nodes are all rebooted during this change, so the slurmds will be restarted.

We start slurmctld with /usr/sbin/slurmctld -D
Comment 3 Marcin Stolarek 2022-09-26 05:55:07 MDT
This should work just fine. However, I'd recommend doing it in two steps, as in our FAQ: first remove all the nodes that are going to be renamed, then add them back under their new names.

Let me know if the procedure is clear to you.

cheers,
Marcin
Comment 4 David Gloe 2022-09-26 06:19:25 MDT
Yes, that's clear, thank you.