Hello,

I wanted to change the partition of a few nodes. I know one way of doing it is by modifying slurm.conf, but can we use the scontrol command to make the changes instead? The nodes I'm trying to move have running jobs on them, so should I put the jobs on hold and resume them after the change, or is there a better way to do it without affecting any jobs? Also, after making my changes, can I use "scontrol reconfigure" to apply the updated slurm.conf without restarting the slurmctld service, since a restart would terminate the running jobs?

Thanks
*** Ticket 13597 has been marked as a duplicate of this ticket. ***
Savita,

You can in fact change the node definition of a partition with running jobs, and those jobs will continue to run. This can be done permanently by editing slurm.conf and executing "scontrol reconfigure", or temporarily by updating the partition with scontrol. For example:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
highPrio  up     2:00:00    5      idle   n[1-5]
defq*     up     infinite   5      idle   n[1-5]
control   up     infinite   1      idle   n1
debug     up     infinite   1      idle   n1

$ scontrol update partitionname=defq nodes=n[3-5]

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
highPrio  up     2:00:00    2      mix    n[1-2]
highPrio  up     2:00:00    3      idle   n[3-5]
defq*     up     infinite   3      idle   n[3-5]
control   up     infinite   1      mix    n1
debug     up     infinite   1      mix    n1

$ scontrol update partitionname=control nodes=n[1-2]

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
highPrio  up     2:00:00    2      mix    n[1-2]
highPrio  up     2:00:00    3      idle   n[3-5]
defq*     up     infinite   3      idle   n[3-5]
control   up     infinite   2      mix    n[1-2]
debug     up     infinite   1      mix    n1

The only caution I would offer is if you are adding nodes to, or removing them from, slurm.conf itself. If that is the case, then you should have a look at the following link:
https://slurm.schedmd.com/faq.html#add_nodes
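For reference, the permanent equivalent of the temporary scontrol updates above would be slurm.conf edits along these lines. This is only a sketch: the partition options shown (Default, MaxTime, State) are assumptions, and your real partition lines will carry whatever options your site already uses.

```ini
# Before (matches the first sinfo output above):
PartitionName=defq    Nodes=n[1-5] Default=YES MaxTime=INFINITE State=UP
PartitionName=control Nodes=n1     MaxTime=INFINITE State=UP

# After (matches the final sinfo output above):
PartitionName=defq    Nodes=n[3-5] Default=YES MaxTime=INFINITE State=UP
PartitionName=control Nodes=n[1-2] MaxTime=INFINITE State=UP
```

After editing, run "scontrol reconfigure" so slurmctld picks up the change. Note that changes made only with "scontrol update" are temporary, which is why a lasting change should also be written into slurm.conf.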
Hello Jason,

In our case we have two partitions (test1 and test2). There are 4 nodes that I want to remove from test1, but jobs are currently running on those nodes in the test1 partition. The nodes are already assigned to test2 as well, so all I want to do is remove them from test1 by editing slurm.conf. Will doing so terminate the jobs running on those nodes under test1? And do I have to restart slurmctld since I'm removing the nodes from one partition?

Thanks
No, the jobs will continue to run. Slurm will not cancel the jobs even if they are no longer part of any partition. You can edit the slurm.conf and issue a scontrol reconfigure. A restart is not needed.
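To illustrate the edit itself, here is a minimal sketch of shrinking a partition's node list in a slurm.conf-style file. The file path, node names, and partition options are made up for the demo and are not from this ticket; on a real cluster you would edit the live slurm.conf (keeping it identical on all nodes) and then run "scontrol reconfigure".

```shell
#!/bin/sh
# Demo file standing in for slurm.conf (path is an assumption for this sketch).
CONF=/tmp/slurm.conf.demo

# A minimal fragment with two overlapping partitions, as in the ticket.
cat > "$CONF" <<'EOF'
PartitionName=test1 Nodes=n[1-8] Default=YES State=UP
PartitionName=test2 Nodes=n[5-8] State=UP
EOF

# Remove nodes n[5-8] from test1 by shrinking its Nodes= range.
sed -i 's/^PartitionName=test1 Nodes=n\[1-8\]/PartitionName=test1 Nodes=n[1-4]/' "$CONF"

grep '^PartitionName=test1' "$CONF"

# On the real cluster, push the change to slurmctld without a restart:
#   scontrol reconfigure
```

Running jobs on n[5-8] keep running throughout; only the partition membership used for scheduling new jobs changes.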
I am resolving this issue. Please feel free to re-open if you have further questions regarding this case.