Ticket 10409 - Adding/removing nodes - will it require restarting Slurm?
Summary: Adding/removing nodes - will it require restarting Slurm?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.x
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Jason Booth
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2020-12-09 10:18 MST by Erik Lönroth
Modified: 2021-01-05 10:55 MST

See Also:
Site: Scania
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
3ADD6F1CA11540488575802DF5F8D07D.png (136 bytes, image/png)
2021-01-04 03:06 MST, Erik Lönroth
Details

Description Erik Lönroth 2020-12-09 10:18:29 MST
So, whenever we want to add or remove nodes from a running Slurm cluster, we have seen that Slurm loses information about all running jobs.

We are not certain that adding or removing nodes should behave like this.

We might have done something wrong here, but adding and/or removing nodes shouldn't be a downtime issue.

Looking for some clarity here, as we are also trying to use this in a more cloud-like scenario where nodes will come and go more often than in a more static environment.

Thanx!
Comment 1 Jason Booth 2020-12-09 13:43:46 MST
https://slurm.schedmd.com/faq.html#add_nodes

What process should I follow to add nodes to Slurm?
The slurmctld daemon has a multitude of bitmaps to track state of nodes and cores in the system. Adding nodes to a running system would require the slurmctld daemon to rebuild all of those bitmaps, which the developers feel would be safer to do by restarting the daemon. Communications from the slurmd daemons on the compute nodes to the slurmctld daemon include a configuration file checksum, so you probably also want to maintain a common slurm.conf file on all nodes. The following procedure is recommended:

1. Stop the slurmctld daemon (e.g. "systemctl stop slurmctld" on the head node)
2. Update the slurm.conf file on all nodes in the cluster
3. Restart the slurmd daemons on all nodes (e.g. "systemctl restart slurmd" on all nodes)
4. Restart the slurmctld daemon (e.g. "systemctl start slurmctld" on the head node)
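
The four steps above could be scripted roughly as follows. This is only a sketch: the node list, config path, and use of ssh/scp are assumptions, not part of the FAQ.

```shell
#!/bin/sh
# Sketch of the FAQ's add-node procedure; hostnames and paths are assumptions.
NODES="node01 node02 node03"      # every compute node in the cluster

systemctl stop slurmctld          # 1. stop the controller (run on head node)

for n in $NODES; do               # 2. push the updated slurm.conf everywhere
    scp /etc/slurm/slurm.conf "$n":/etc/slurm/slurm.conf
done

for n in $NODES; do               # 3. restart slurmd on all compute nodes
    ssh "$n" systemctl restart slurmd
done

systemctl start slurmctld         # 4. start the controller again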

Cloud nodes operate a bit differently, using a CLOUD state.

https://slurm.schedmd.com/elastic_computing.html


It is also worth mentioning the FUTURE state. 

https://slurm.schedmd.com/slurm.conf.html#OPT_State

> FUTURE

Indicates the node is defined for future use and need not exist when the Slurm daemons are started. These nodes can be made available for use simply by updating the node state using the scontrol command rather than restarting the slurmctld daemon. After these nodes are made available, change their State in the slurm.conf file. Until these nodes are made available, they will not be seen by any Slurm commands, nor will any attempt be made to contact them.
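
Bringing a FUTURE node into service as described above might look like this; the node name n001 is a hypothetical example:

```shell
# slurm.conf already contains a FUTURE definition, e.g.:
#   NodeName=n001 CPUs=4 State=FUTURE

# Make the node available without restarting slurmctld:
scontrol update NodeName=n001 State=RESUME

# Then persist the change by editing the node's State in slurm.conf.
```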


> Dynamic Future Nodes

A slurmd started with -F[<feature>] will be associated with a FUTURE node that matches the same configuration (sockets, cores, threads) as reported by slurmd -C. The node's NodeAddr and NodeHostname will automatically be retrieved from the slurmd and will be cleared when set back to the FUTURE state. Dynamic FUTURE nodes retain non-FUTURE state on restart. Use scontrol to put node back into FUTURE state.

If the mapping of the NodeName to the slurmd HostName is not updated in DNS, Dynamic Future nodes won't know how to communicate with each other, because NodeAddr and NodeHostname are not defined in slurm.conf, and the fanout communications need to be disabled by setting TreeWidth to a high number (e.g. 65533). If the DNS mapping is made, then the cloud_dns SlurmctldParameter can be used.
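
A slurm.conf fragment for the Dynamic Future setup described above might look like the following; the node and partition names are hypothetical examples:

```
# Without DNS entries for the dynamic nodes, disable fanout:
TreeWidth=65533
# With DNS entries instead, cloud_dns can be used:
#   SlurmctldParameters=cloud_dns

# FUTURE placeholders that "slurmd -F" instances will claim:
NodeName=dyn[01-10] CPUs=4 State=FUTURE
PartitionName=cloud Nodes=dyn[01-10] State=UP
```

Each cloud instance then starts its slurmd with -F so it is matched to a FUTURE slot with the same socket/core/thread layout.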
Comment 2 Erik Lönroth 2020-12-11 06:44:06 MST
Thanx for the pointer, but I don't fully understand.

A) Is Slurm losing information about the queued and running jobs as part of a "systemctl restart slurmctld"?

B) Are running jobs terminated as part of a "systemctl restart slurmd"?

C) Is slurm[ctl,d] resilient to a "systemctl restart slurmdbd"?

D) Is slurmdbd resilient to a "systemctl restart mysql"?

E) If we add a node to slurm.conf - will that need a "systemctl restart slurmctld"?

F) If we add a node to slurm.conf - will that need a "systemctl restart slurmd"?


..... Since we add and remove nodes a lot (cloud instances), this matters a lot to us. We do not predefine nodes and have them on stand-by; rather, they need to be fully added (and removed) as they come and go.

If Slurm requires a restart for this activity, I think we might be interested in knowing what effort would be needed to add and remove hosts more transparently/dynamically (without Slurm restarts).
Comment 3 Erik Lönroth 2020-12-11 06:45:25 MST
(Duplicate of comment 2.)
Comment 4 Jason Booth 2020-12-11 11:38:03 MST
Slurm uses the node definitions in slurm.conf to build out its view of the network and where to send messages. This is stored in bitmaps internal to Slurm, so modifying the nodes requires a restart of the slurmctld and slurmd processes. You could use "Dynamic Future Nodes"; however, that would still require predefining nodes.

https://slurm.schedmd.com/slurm.conf.html#OPT_Dynamic-Future-Nodes


> A) Is Slurm losing information about the queued and running jobs as part of a "systemctl restart slurmctld"?
As long as the slurm.conf is the same cluster-wide and the restart happens in the order mentioned previously, then running jobs should be fine.


> B) Are running jobs terminated as part of a "systemctl restart slurmd"?

Jobs should not be canceled unless the procedure to add nodes was not followed.


> C) Is slurm[ctl,d] resilient to a "systemctl restart slurmdbd" ?
Yes

> D) Is slurmdbd resilient to a "systemctl restart mysql"?
Yes

> E) If we add a node to slurm.conf - will that need a "systemctl restart slurmctld"?
Yes, please see the procedure mentioned previously.

1. Stop the slurmctld daemon (e.g. "systemctl stop slurmctld" on the head node)
2. Update the slurm.conf file on all nodes in the cluster
3. Restart the slurmd daemons on all nodes (e.g. "systemctl restart slurmd" on all nodes)
4. Restart the slurmctld daemon (e.g. "systemctl start slurmctld" on the head node)

> F) If we add a node to slurm.conf - will that need a "systemctl restart slurmd"?

Yes, please see the response above and in comment #1.


>..... Since we add and remove nodes a lot (cloud instances), this matters a lot to us. We do not predefine nodes and have them on stand-by; rather, they need to be fully added (and removed) as they come and go.
> If Slurm requires a restart for this activity, I think we might be interested in knowing what effort would be needed to add and remove hosts more transparently/dynamically (without Slurm restarts).

I could have your account representative reach out to you about having your site sponsor development for this. Let me know and I will begin that process.
Comment 5 Jason Booth 2020-12-21 15:27:25 MST
It has been some time since I last heard from you. I am marking this ticket as infogiven since I replied to your questions in comment #4. If you have any follow-up questions, please feel free to re-open this issue.
Comment 6 Erik Lönroth 2021-01-04 03:06:55 MST
Created attachment 17317 [details]
3ADD6F1CA11540488575802DF5F8D07D.png

Hello!

This is a great clarification.

Some of this information seems unrelated to slurm.conf, from what I have seen.

Consider me changing the default partition via scontrol.

How would Slurm maintain the information about what the current default partition is?

Would I need to have "KeepPartState=True" for a restart to maintain any "online" changes?

Erik Lönroth

Tech Lead | IUBA - HPC | Scania CV AB
Open Source Officer
Phone: +46 8 553  527 21
Scania CV AB, 151 87 Södertälje, Sweden
erik.lonroth@scania.com

Comment 7 Jason Booth 2021-01-05 10:55:06 MST
> Some of this information seems unrelated to slurm.conf, from what I have seen.
>
> Consider me changing the default partition via scontrol.
>
> How would Slurm maintain the information about what the current default partition is?
>
> Would I need to have "KeepPartState=True" for a restart to maintain any "online" changes?

This is correct. Any changes made to the partition via scontrol would be lost on restart. You would have to use KeepPartState/KeepPartInfo to preserve your changes.
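
For illustration, the scontrol change discussed here might look like this; the partition name "cloud" is a hypothetical example:

```shell
# Make "cloud" the default partition at runtime:
scontrol update PartitionName=cloud Default=YES

# Inspect the result:
scontrol show partition cloud

# Without KeepPartState/KeepPartInfo, this runtime change is lost the
# next time slurmctld restarts and re-reads slurm.conf.
```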