Hi SLURM Support,

This is not a bug report, more of an enquiry, and very low priority.

We are experimenting with the 'RebootProgram' parameter in our slurm.conf to try out the node reboot function.

Inside slurm.conf:
-----------------
RebootProgram = "/sbin/shutdown -r now"
-----------------

Is this correct? How can we test it? (scontrol node test-hostname reboot)
Are there any other factors we should watch out for?

Cheers
Damien
From the slurm.conf man page:

RebootProgram
    Program to be executed on each compute node to reboot it. Invoked on each node once it becomes idle after the command "scontrol reboot_nodes" is executed by an authorized user or a job is submitted with the "--reboot" option. After rebooting, the node is returned to normal use. NOTE: This configuration option does not apply to IBM BlueGene systems.

From the scontrol man page:

reboot_nodes [NodeList]
    Reboot all nodes in the system when they become idle using the RebootProgram as configured in Slurm's slurm.conf file. Accepts an optional list of nodes to reboot. By default all nodes are rebooted. NOTE: This command does not prevent additional jobs from being scheduled on these nodes, so many jobs can be executed on the nodes prior to them being rebooted. You can explicitly drain the nodes in order to reboot nodes as soon as possible, but the nodes must also explicitly be returned to service after being rebooted. You can alternately create an advanced reservation to prevent additional jobs from being initiated on nodes to be rebooted. NOTE: Nodes will be placed in a state of "MAINT" until rebooted and returned to service with a normal state. Alternately the node's state "MAINT" may be cleared by using the scontrol command to set the node state to "RESUME", which clears the "MAINT" flag.

One friendly reminder - SchedMD's Slurm support is offered as "level 3" support only; we expect sites to be comfortable looking through the documentation (web pages at http://slurm.schedmd.com and the various man pages) and testing out most operational changes before turning to us for help.

- Tim
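For readers who want to try this on a single node, the man page excerpts above suggest a sequence along the following lines. This is only a minimal sketch, not a SchedMD-provided procedure: the `NODE` and `DRY_RUN` variables and the helper function names are made up for illustration, the hostname is the one from the question, and the `reboot_nodes` subcommand name is taken from the quoted man page (other Slurm releases may spell it differently, so check the local `man scontrol`). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: NODE, DRY_RUN and the function names are illustrative assumptions.
NODE="${NODE:-test-hostname}"   # hostname from the question above
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands only, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask the controller to reboot the node once it becomes idle.
request_reboot() {
    run scontrol reboot_nodes "$NODE"
}

# Show the node record; per the man page it should report MAINT
# until the node has rebooted and returned to service.
show_state() {
    run scontrol show node "$NODE"
}

# Only needed if the MAINT flag has to be cleared by hand.
clear_maint() {
    run scontrol update NodeName="$NODE" State=RESUME
}

request_reboot
show_state
```

Running it with `DRY_RUN=0` on an idle test node would issue the real commands; `clear_maint` is deliberately not called automatically, since the man page describes it as an alternative way to clear "MAINT" rather than a required step.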
Hi Tim,

Thanks for the reminder, and I apologise for this. I come from the corporate IT world (I hope that explains it).

We will do more due diligence on our own before approaching SchedMD support. Thanks for helping us.

Cheers
Damien