We're looking into a new procedure to fill gaps in the compute node hostname assignments. For example, if we have compute nodes nid000001, nid000002, nid000004, and nid000005, the procedure would rename nid000004 to nid000003 and nid000005 to nid000004. We're wondering what would need to be done in Slurm to handle this change. Do you just need to update slurm.conf and restart slurmctld? Or is there state in the spool directory that needs to be cleared manually?
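To make the renaming rule concrete, here is a minimal sketch of the gap-filling mapping. It assumes every hostname is "nid" followed by a 6-digit zero-padded index, that numbering should start at 1, and that nodes keep their relative order. The function name gap_filling_renames is hypothetical and not part of Slurm or any existing tool.

```python
import re

# Assumed naming scheme: "nid" followed by a zero-padded 6-digit index.
NODE_RE = re.compile(r"^nid(\d{6})$")


def gap_filling_renames(hostnames):
    """Return {old_name: new_name} for every node whose name changes when
    the numbering is compacted into a contiguous range starting at 1.

    Relative order is preserved, so the k-th lowest-numbered node becomes
    index k. Nodes already in the right slot are left out of the result.
    """
    indices = []
    for name in hostnames:
        match = NODE_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected hostname: {name!r}")
        indices.append(int(match.group(1)))
    if len(set(indices)) != len(indices):
        raise ValueError("duplicate hostnames in input")

    renames = {}
    # Walk the nodes in ascending order; each one gets the next free slot.
    for new_index, old_index in enumerate(sorted(indices), start=1):
        if old_index != new_index:
            renames[f"nid{old_index:06d}"] = f"nid{new_index:06d}"
    return renames


if __name__ == "__main__":
    nodes = ["nid000001", "nid000002", "nid000004", "nid000005"]
    for old, new in gap_filling_renames(nodes).items():
        print(f"{old} -> {new}")
```

With the nodes from the example this prints nid000004 -> nid000003 followed by nid000005 -> nid000004. The mapping is produced in ascending order, which is also a safe order for renaming hosts one at a time: each target name is either a gap or was vacated by an earlier rename.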
David, All slurmd daemons should definitely be restarted after the change as well. In general, I'd recommend following two of our FAQ answers, "What process should I follow to remove nodes from Slurm?"[1] and "What process should I follow to add nodes to Slurm?"[2]. That is, first remove the nodes that are going to be renamed, then add those nodes back under their new names. What command do you use to start slurmctld? I'm interested in the command-line options used in the systemd unit file (or equivalent). cheers, Marcin [1]https://slurm.schedmd.com/faq.html#rem_nodes [2]https://slurm.schedmd.com/faq.html#add_nodes
The compute nodes are all rebooted during this change, so the slurmd daemons will be restarted. We start slurmctld with /usr/sbin/slurmctld -D.
This should work just fine. However, I'd recommend doing this in two steps, as described in our FAQ. In the first step, remove all nodes that are going to be renamed; in the second, add those nodes back under their new names. Let me know if the procedure is clear to you. cheers, Marcin
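The two-step procedure can be sketched as computing the node list that slurm.conf should contain after each step. This is a minimal illustration, assuming nodes are named "nid" plus a 6-digit index and are listed in slurm.conf as Slurm hostlist expressions (for example in a NodeName line and in a partition's Nodes= setting). The helper names two_step_node_sets and to_hostlist are hypothetical, and the rename map is hard-coded from the example earlier in this thread.

```python
def two_step_node_sets(current, renames):
    """Return (after_removal, after_addition) node-name sets for the
    remove-then-add procedure: step 1 drops every node being renamed,
    step 2 adds those nodes back under their new names."""
    current = set(current)
    missing = set(renames) - current
    if missing:
        raise ValueError(f"renamed nodes not in current list: {sorted(missing)}")
    after_removal = current - set(renames)
    # A new name must not collide with a node that keeps its old name.
    clash = after_removal & set(renames.values())
    if clash:
        raise ValueError(f"new names already in use: {sorted(clash)}")
    after_addition = after_removal | set(renames.values())
    return after_removal, after_addition


def to_hostlist(names, prefix="nid", width=6):
    """Compress names like nid000001..nid000004 into nid[000001-000004]."""
    if not names:
        return ""
    indices = []
    for name in names:
        if not name.startswith(prefix):
            raise ValueError(f"unexpected hostname: {name!r}")
        indices.append(int(name[len(prefix):]))
    ranges = []  # list of [start, end] runs of consecutive indices
    for i in sorted(indices):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i
        else:
            ranges.append([i, i])
    parts = [
        f"{a:0{width}d}" if a == b else f"{a:0{width}d}-{b:0{width}d}"
        for a, b in ranges
    ]
    return f"{prefix}[{','.join(parts)}]"


if __name__ == "__main__":
    nodes = ["nid000001", "nid000002", "nid000004", "nid000005"]
    renames = {"nid000004": "nid000003", "nid000005": "nid000004"}
    step1, step2 = two_step_node_sets(nodes, renames)
    print("step 1 (remove):", to_hostlist(step1))  # nid[000001-000002]
    print("step 2 (add):   ", to_hostlist(step2))  # nid[000001-000004]
```

Because the renamed nodes are absent from the configuration entirely between the two steps, no node ever has to carry an old and a new name at the same time, which is the point of splitting the change in two.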
Yes, that's clear, thank you.