| Summary: | Cloud Node Reset | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Brian Christiansen <brian> |
| Component: | Cloud | Assignee: | Broderick Gardner <broderick> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | fdm, nick, schedmd-contacts |
| Version: | 21.08.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | DS9 (PSLA) | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 21.08.0pre1 |
| Target Release: | 21.08 | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
Description
Brian Christiansen
2021-05-05 10:06:49 MDT
In 21.08:
-- Added the power_down_asap and power_down_force power down states for scontrol.
e.g. scontrol update nodename=<> state=power_down_asap
power_down - queue up the node to be powered down when the node is free. Jobs can
continue to land on the node until it powers down.
power_down_asap - queue up the node to be powered down and put the node in a drain
state. This makes it so no more jobs are scheduled on the node and
the node will power down after the currently running jobs are done.
power_down_force - cancel running jobs, requeue them if possible, and power down the node.
This state can also be used to cancel a powering up node and reset it back to powered down.
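For scripting, the three states above can be wrapped in a small helper. This is only a sketch: it builds the scontrol command line quoted above and does not run it. The helper name and the node name "cloud1" are made up for illustration.

```python
# Sketch only: builds (does not execute) the scontrol command shown above.
# The three state names come from the description; everything else is illustrative.
VALID_POWER_DOWN_STATES = ("power_down", "power_down_asap", "power_down_force")


def build_power_down_command(nodename, state="power_down"):
    """Return the argv list for `scontrol update nodename=<> state=<>`."""
    if state not in VALID_POWER_DOWN_STATES:
        raise ValueError(f"unsupported power down state: {state!r}")
    return ["scontrol", "update", f"nodename={nodename}", f"state={state}"]


if __name__ == "__main__":
    # "cloud1" is a placeholder node name, not taken from this ticket.
    print(" ".join(build_power_down_command("cloud1", "power_down_asap")))
```

The returned list could be handed to subprocess.run() on a host where scontrol is available; that step is left out here on purpose.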
-- Define and separate node power state transitions. Previously a powering
down node was in both states, POWERING_OFF and POWERED_OFF. These are now
separated.
e.g.
IDLE+POWERED_OFF (IDLE~)
-> IDLE+POWERING_UP (IDLE#) - Manual power up or allocation
-> IDLE
-> IDLE+POWER_DOWN (IDLE!) - Node waiting for power down
-> IDLE+POWERING_DOWN (IDLE%) - Node powering down
-> IDLE+POWERED_OFF (IDLE~) - Powered off
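The lifecycle above can be written down as a small table, which may help when parsing or checking node states in scripts. This is a rough model of the example only, not Slurm's internal representation; the function names are made up.

```python
# Rough model of the example lifecycle above. Each step is
# (base state, power flag or None, short suffix as shown in the example).
CYCLE = [
    ("IDLE", "POWERED_OFF", "~"),    # powered off
    ("IDLE", "POWERING_UP", "#"),    # manual power up or allocation
    ("IDLE", None, ""),              # up and idle
    ("IDLE", "POWER_DOWN", "!"),     # waiting for power down
    ("IDLE", "POWERING_DOWN", "%"),  # powering down
]


def long_name(step):
    """Long form, e.g. IDLE+POWERING_UP (or plain IDLE with no power flag)."""
    base, flag, _ = step
    return base if flag is None else f"{base}+{flag}"


def short_name(step):
    """Short form with the suffix character, e.g. IDLE#."""
    base, _, suffix = step
    return base + suffix


def next_step(index):
    """Index of the following step; powering down wraps back to powered off."""
    return (index + 1) % len(CYCLE)


if __name__ == "__main__":
    for step in CYCLE:
        print(f"{long_name(step)} ({short_name(step)})")
```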
-- Some node state flag names have changed. This would be noticeable, for
example, when using a state flag to filter nodes with sinfo.
e.g.
POWER_UP -> POWERING_UP
POWER_DOWN -> POWERED_DOWN
POWER_DOWN now represents a node pending power down
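If existing scripts filter on the old flag names, a lookup like the following could be used to update them. It is a sketch that encodes only the two renames listed above; the function name is made up. Note that it should only be applied to filters written before 21.08, since POWER_DOWN is still a valid name with a new meaning.

```python
# Old flag name -> 21.08 flag name, as listed above.
RENAMED_FLAGS = {
    "POWER_UP": "POWERING_UP",
    "POWER_DOWN": "POWERED_DOWN",
}


def translate_old_flag(old_flag):
    """Map a pre-21.08 flag name to its 21.08 name; other names pass through.

    Only use this on filters written before 21.08: in 21.08, POWER_DOWN
    means "pending power down" and must not be rewritten.
    """
    name = old_flag.upper()
    return RENAMED_FLAGS.get(name, name)


if __name__ == "__main__":
    for old in ("power_up", "power_down", "idle"):
        print(old, "->", translate_old_flag(old))
```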
Let us know if you have any questions.
Thanks,
Brian