Ticket 13974

Summary: sinfo -t does not report the good nodes states
Product: Slurm
Reporter: Regine Gaudin <regine.gaudin>
Component: User Commands
Assignee: Ben Roberts <ben>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: 20.11.8
Hardware: Linux
OS: Linux
Site: CEA

Description Regine Gaudin 2022-05-02 10:01:00 MDT
Since the upgrade to 20.11.8, sinfo -t idle no longer reports only idle nodes:

sinfo -t idle
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
optane       up   infinite      0    n/a 
v100-hdr     up   infinite      0    n/a 
rome         up   infinite      2   resv inti[6045-6046]
rome         up   infinite     30   idle inti[6007-6016,6018-6023,6027-6028,6030-6032,6034-6036,6038,6040-6042,6044,6047]
a100         up   infinite      8   idle inti[7701-7705,7707-7709]
mi100        up   infinite      1  drain inti7600
mi100        up   infinite      1   idle inti7601
a100-q80     up   infinite      1   idle inti7651
milan-bxi    up   infinite     32   idle inti[6203-6234]
rome-bxi     up   infinite     47   idle inti[6101-6147]
a100-bxi     up   infinite      2   idle inti[7801,7803]
mi250        up   infinite      2   idle inti[7660-7661]
Comment 1 Ben Roberts 2022-05-02 10:55:16 MDT
Hi Regine,

This is the expected behavior, though I can understand the confusion.  There is a base node state and there can be additional state flags that are added to a node.  Taking DRAIN as an example, when a node is fully drained (meaning all jobs have completed) then the node is in an IDLE state, but it also has the DRAIN flag to show that it shouldn't receive any more jobs.  So when you request nodes in the IDLE state with sinfo it will show all nodes that include the IDLE state, but they may have additional flags that affect how they are displayed.  

This is due to a change in 20.11 that allows you to filter by multiple states (bug 9723).

Here's an example that may help illustrate how that works.  If I request sinfo to show me the IDLE nodes I will get ones that appear as 'idle' and 'drain'.
$ sinfo -pdebug -tidle
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain node10
debug*       up   infinite     17   idle node[01-09,11-18]


I can request only the nodes that have both the IDLE state and the DRAIN flag, which in this example has the same effect as requesting just the nodes in the DRAIN state.
$ sinfo -pdebug -t'idle&drain'
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain node10


You can also use the scontrol show node output to see how the state appears on the node.
$ scontrol show nodes node10  | grep State
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
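
To make the base state and the flags in that State field easier to see, it can be split with a small shell function. This is only a minimal sketch: split_state is a hypothetical helper, not part of Slurm, and it assumes the State=BASE+FLAG layout shown in the scontrol output above.

```shell
# split_state: read "scontrol show node" text on stdin and print
# "<base state> <flags>" for every token that starts with "State=".
# Hypothetical helper; assumes the State=BASE+FLAG[+FLAG...] layout.
split_state() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^State=/) {
                value = substr($i, 7)       # text after "State="
                n = index(value, "+")       # position of the first "+"
                if (n == 0)
                    print value, "none"     # base state only, no flags
                else
                    print substr(value, 1, n - 1), substr(value, n + 1)
            }
        }
    }'
}

# Intended use on a cluster (assumes scontrol is in PATH):
#   scontrol show nodes node10 | split_state
```

For the State line shown above, this should print "IDLE DRAIN": IDLE is the base state and DRAIN is the additional flag.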


The multiple states were present before 20.11 as well, but sinfo could not filter on the state flags then, so the information it returned looked different.

There was also additional functionality added in 21.08 where you can request the sinfo output in json or yaml format.  This would allow you to script something that looks at the state information and returns just the nodes that are idle with no additional state flag.  It sounds like you just upgraded to 20.11 though, so going to 21.08 may not be something you can do immediately.  
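
Until then, a text-based filter may be enough as a stopgap on 20.11. The sketch below does not use the 21.08 JSON/YAML output. It assumes sinfo is run node-oriented (-N), without a header (-h), and with a custom format that prints the node name and the extended state (-o '%N %T'), and that a node with no additional flag is reported with the exact state string "idle". The function name only_idle is hypothetical.

```shell
# only_idle: read "<node> <state>" lines on stdin and print the names of
# nodes whose state is exactly "idle" (no extra flag, no state suffix).
# Hypothetical helper, not part of Slurm.
only_idle() {
    # sort -u collapses nodes that are listed once per partition
    awk '$2 == "idle" { print $1 }' | sort -u
}

# Intended use on a cluster (assumes sinfo is in PATH):
#   sinfo -h -N -t idle -o '%N %T' | only_idle
```

Because the comparison is an exact string match, nodes that sinfo reports with a different state string or with a suffix character after the state are dropped as well, which may or may not be what you want.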

Let me know if you have any additional questions about this.

Thanks,
Ben
Comment 2 Ben Roberts 2022-05-12 08:31:15 MDT
Hi Regine,

I wanted to follow up and make sure you don't have additional questions about this.  Let me know if there is anything else I can do to help.

Thanks,
Ben
Comment 3 Ben Roberts 2022-05-19 10:45:34 MDT
Hi Regine,

I believe the information I sent about the node states should have answered your questions.  I haven't heard any follow-up questions, so I'll go ahead and close this ticket.  If you do have follow-up questions, feel free to update the ticket and I'll respond.

Thanks,
Ben