Ticket 13598

Summary: scontrol
Product: Slurm    Reporter: Savita Thakur <thakurs>
Component: User Commands    Assignee: Jason Booth <jbooth>
Status: RESOLVED INFOGIVEN    QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 20.11.8   
Hardware: Linux   
OS: Linux   
Site: Miami University Oxford Ohio Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Savita Thakur 2022-03-10 06:28:25 MST
Hello,

I want to change the partition of a few nodes. I know one way of doing it is by modifying slurm.conf, but can we also use the scontrol command to make the changes?

The nodes I'm trying to move have running jobs on them. Should I put those jobs on hold and resume them after the change, or is there a better way to do this without affecting any jobs?

After making my changes, can I use "scontrol reconfigure" to apply the updated slurm.conf without restarting the slurmctld service, since a restart terminates the running jobs?

Thanks
Comment 1 Savita Thakur 2022-03-10 06:31:52 MST
*** Ticket 13597 has been marked as a duplicate of this ticket. ***
Comment 2 Jason Booth 2022-03-10 10:23:32 MST
Savita,


You can in fact change the node definition of a partition with running jobs, and those jobs will continue to run. This can be done either by editing slurm.conf and executing "scontrol reconfigure", or temporarily by updating the partition with scontrol.

For example:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      5   idle n[1-5]
defq*        up   infinite      5   idle n[1-5]
control      up   infinite      1   idle n1
debug        up   infinite      1   idle n1



$ scontrol update partitionname=defq nodes=n[3-5]

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      2    mix n[1-2]
highPrio     up    2:00:00      3   idle n[3-5]
defq*        up   infinite      3   idle n[3-5]
control      up   infinite      1    mix n1
debug        up   infinite      1    mix n1

$ scontrol update partitionname=control nodes=n[1-2]

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      2    mix n[1-2]
highPrio     up    2:00:00      3   idle n[3-5]
defq*        up   infinite      3   idle n[3-5]
control      up   infinite      2    mix n[1-2]
debug        up   infinite      1    mix n1



The only caution I would offer concerns adding nodes to or removing nodes from slurm.conf itself. If that is the case, then you should have a look at the following link.

https://slurm.schedmd.com/faq.html#add_nodes
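
As a rough sketch of what that FAQ entry involves (the node names, hardware values, and the use of pdsh here are hypothetical, not taken from your site's configuration):

```shell
# Sketch only: adding new nodes to slurm.conf, per the FAQ linked above.
# Unlike partition membership changes, adding nodes requires restarting
# the daemons; "scontrol reconfigure" alone is not sufficient.

# 1. Add the new node definition to slurm.conf on every machine, e.g.:
#      NodeName=n[6-7] CPUs=8 RealMemory=32000 State=UNKNOWN
#      PartitionName=defq Nodes=n[1-7] Default=YES MaxTime=INFINITE State=UP

# 2. Restart slurmctld on the controller:
systemctl restart slurmctld

# 3. Restart slurmd on the compute nodes (here via pdsh; use whatever
#    mechanism your site normally uses):
pdsh -w n[1-7] systemctl restart slurmd
```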
Comment 3 Savita Thakur 2022-03-10 10:47:48 MST
Hello Jason,

In our case we have 2 partitions (test1 and test2).
There are 4 nodes that I want to remove from the test1 partition, but jobs are running on the test1 partition using those nodes.
The nodes are already assigned to the test2 partition.
So all I want to do is remove them from the test1 partition by editing slurm.conf. On doing so, will I terminate the jobs running under the test1 partition on those nodes?

Do I have to restart slurmctld since I'm removing the nodes from one partition?
 
Thanks
Comment 4 Jason Booth 2022-03-10 11:01:36 MST
No, the jobs will continue to run. Slurm will not cancel the jobs even if they are no longer part of any partition.

You can edit the slurm.conf and issue a scontrol reconfigure. A restart is not needed.
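
As a concrete sketch of that edit (the node ranges and the remaining partition parameters are assumptions for illustration, not taken from your actual config):

```shell
# Hypothetical slurm.conf before the change: the four nodes n[1-4]
# appear in both partitions.
#   PartitionName=test1 Nodes=n[1-8] Default=YES State=UP
#   PartitionName=test2 Nodes=n[1-4] State=UP
#
# After editing, test1 no longer lists n[1-4]:
#   PartitionName=test1 Nodes=n[5-8] Default=YES State=UP
#   PartitionName=test2 Nodes=n[1-4] State=UP

# Push the change without restarting slurmctld; running jobs continue
# untouched:
scontrol reconfigure

# Verify the new partition layout:
sinfo
```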
Comment 5 Jason Booth 2022-03-14 12:14:26 MDT
I am resolving this issue. Please feel free to re-open if you have further questions regarding this case.