| Summary: | scontrol | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Savita Thakur <thakurs> |
| Component: | User Commands | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 20.11.8 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Miami University Oxford Ohio | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Savita Thakur
2022-03-10 06:28:25 MST
*** Ticket 13597 has been marked as a duplicate of this ticket. ***

Jason Booth:

Savita,

You can in fact change the node definition of a partition that has running jobs, and those jobs will continue to run. This can be done permanently by editing slurm.conf and executing "scontrol reconfigure", or temporarily by updating the partition with scontrol. For example:

```
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      5   idle n[1-5]
defq*        up   infinite      5   idle n[1-5]
control      up   infinite      1   idle n1
debug        up   infinite      1   idle n1

$ scontrol update partitionname=defq nodes=n[3-5]
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      2    mix n[1-2]
highPrio     up    2:00:00      3   idle n[3-5]
defq*        up   infinite      3   idle n[3-5]
control      up   infinite      1    mix n1
debug        up   infinite      1    mix n1

$ scontrol update partitionname=control nodes=n[1-2]
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
highPrio     up    2:00:00      2    mix n[1-2]
highPrio     up    2:00:00      3   idle n[3-5]
defq*        up   infinite      3   idle n[3-5]
control      up   infinite      2    mix n[1-2]
debug        up   infinite      1    mix n1
```

The only caution I would offer is if you are adding nodes to, or removing nodes from, slurm.conf itself. If that is the case, have a look at the following link:
https://slurm.schedmd.com/faq.html#add_nodes

Savita Thakur:

Hello Jason,

In our case we have two partitions (test1 and test2). There are 4 nodes that I want to remove from the test1 partition, but jobs are running on test1 using those nodes. The nodes are already assigned to the test2 partition, so all I want to do is remove them from test1 by editing slurm.conf. Will doing so terminate the jobs running on those nodes under test1? Do I have to restart slurmctld since I am removing the nodes from one partition?

Thanks

Jason Booth:

No, the jobs will continue to run. Slurm will not cancel the jobs even if they are no longer part of any partition. You can edit slurm.conf and issue a "scontrol reconfigure"; a restart is not needed.

I am resolving this issue.
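As a concrete sketch of the slurm.conf edit discussed above: the fragment below uses hypothetical partition definitions and node names (the site's actual slurm.conf is not shown in this ticket), and demonstrates shrinking test1's node list so the shared nodes remain only in test2. Running jobs on those nodes would keep running after "scontrol reconfigure".

```shell
# Hypothetical slurm.conf fragment before the change; n[5-8] belong to
# both test1 and test2 (names are illustrative, not the site's real config).
cat > slurm.conf.example <<'EOF'
PartitionName=test1 Nodes=n[1-8] Default=YES MaxTime=INFINITE State=UP
PartitionName=test2 Nodes=n[5-8] MaxTime=INFINITE State=UP
EOF

# Shrink test1's node list so that n[5-8] remain only in test2.
sed -i 's/^PartitionName=test1 Nodes=n\[1-8\]/PartitionName=test1 Nodes=n[1-4]/' slurm.conf.example

cat slurm.conf.example

# On the real cluster, after editing the real slurm.conf, apply the change
# without restarting slurmctld (running jobs are not cancelled):
#   scontrol reconfigure
```

Note this only removes nodes from a partition's node list; physically adding or removing nodes from slurm.conf itself is the case that needs the FAQ procedure linked above.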
Please feel free to re-open if you have further questions regarding this case.