Since upgrading to 20.11.8, sinfo -t idle no longer shows only idle nodes:

$ sinfo -t idle
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
optane       up   infinite      0    n/a
v100-hdr     up   infinite      0    n/a
rome         up   infinite      2   resv inti[6045-6046]
rome         up   infinite     30   idle inti[6007-6016,6018-6023,6027-6028,6030-6032,6034-6036,6038,6040-6042,6044,6047]
a100         up   infinite      8   idle inti[7701-7705,7707-7709]
mi100        up   infinite      1  drain inti7600
mi100        up   infinite      1   idle inti7601
a100-q80     up   infinite      1   idle inti7651
milan-bxi    up   infinite     32   idle inti[6203-6234]
rome-bxi     up   infinite     47   idle inti[6101-6147]
a100-bxi     up   infinite      2   idle inti[7801,7803]
mi250        up   infinite      2   idle inti[7660-7661]
Hi Regine,

This is the expected behavior, though I can understand the confusion. A node has a base state, and additional state flags can be added to it. Take DRAIN as an example. When a node is fully drained (meaning all of its jobs have completed), the node is in the IDLE state. It also has the DRAIN flag to show that it shouldn't receive any more jobs. So when you ask sinfo for nodes in the IDLE state, it shows every node whose state includes IDLE. Some of those nodes may have additional flags that change how they are displayed. This comes from a change in 20.11 that allows you to filter by multiple states (bug 9723).

Here's an example that may help illustrate how that works. If I ask sinfo to show me the IDLE nodes, I get nodes that appear as 'idle' and as 'drain':

$ sinfo -pdebug -tidle
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain node10
debug*       up   infinite     17   idle node[01-09,11-18]

I can request only the nodes that have both the IDLE state and the DRAIN flag. In this example, that gives the same result as requesting just the nodes in the DRAIN state:

$ sinfo -pdebug -t'idle&drain'
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain node10

You can also use the scontrol show node output to see how the state appears on the node:

$ scontrol show nodes node10 | grep State
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A

Nodes had multiple states before 20.11 as well. However, sinfo could not filter by state flag then, so it returned this information differently.

Version 21.08 also added the ability to request sinfo output in JSON or YAML format. With that, you could script something that reads the state information and returns just the nodes that are idle with no additional state flags. It sounds like you just upgraded to 20.11, though, so moving to 21.08 may not be something you can do right away.
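As a rough illustration of the "base state plus flags" model described above, here is a minimal Python sketch. It is not part of Slurm and not how sinfo is implemented. It does three things:

- It parses one-line-per-node text in the NodeName=... State=IDLE+DRAIN form shown above.
- It approximates the -t matching, where ',' means "any of" and '&' means "all of".
- It lists the nodes that are IDLE with no additional flags.

The function names and sample nodes are hypothetical. The sketch assumes that flags are always appended to the base state with '+'.

```python
"""Hypothetical sketch: model a Slurm node State as a base state plus flags."""
import re

# Field patterns for one-line-per-node text such as:
#   NodeName=node10 ... State=IDLE+DRAIN ThreadsPerCore=2 ...
# Requiring start-of-line or whitespace before the key avoids matching
# longer keys that merely end in "State=".
NODE_RE = re.compile(r"(?:^|\s)NodeName=(\S+)")
STATE_RE = re.compile(r"(?:^|\s)State=(\S+)")


def parse_nodes(text):
    """Return {node_name: (base_state, frozenset_of_flags)}.

    Assumes the base state comes first and any flags are joined with '+'.
    Suffixes such as '*' are not normalized (kept simple on purpose).
    """
    nodes = {}
    for line in text.splitlines():
        name = NODE_RE.search(line)
        state = STATE_RE.search(line)
        if name is None or state is None:
            continue  # skip lines that are not node records
        base, *flags = state.group(1).upper().split("+")
        nodes[name.group(1)] = (base, frozenset(flags))
    return nodes


def matches(base, flags, states_expr):
    """Approximate sinfo -t: 'a,b' matches any listed state, 'a&b' requires all."""
    have = {base} | flags
    wanted = {s.strip().upper() for s in re.split(r"[,&]", states_expr) if s.strip()}
    if "&" in states_expr:
        return wanted <= have  # every requested state/flag must be present
    return bool(wanted & have)  # at least one requested state/flag is present


def select(nodes, states_expr):
    """Sorted names of nodes that match states_expr."""
    return sorted(n for n, (base, flags) in nodes.items()
                  if matches(base, flags, states_expr))


def pure_idle(nodes):
    """Sorted names of nodes whose base state is IDLE and that carry no flags."""
    return sorted(n for n, (base, flags) in nodes.items()
                  if base == "IDLE" and not flags)


# Hypothetical sample records in the form shown by scontrol above.
SAMPLE = """\
NodeName=node09 Arch=x86_64 State=IDLE ThreadsPerCore=2
NodeName=node10 Arch=x86_64 State=IDLE+DRAIN ThreadsPerCore=2
NodeName=node11 Arch=x86_64 State=ALLOCATED ThreadsPerCore=2
"""

if __name__ == "__main__":
    nodes = parse_nodes(SAMPLE)
    print("-t idle       ->", select(nodes, "idle"))
    print("-t idle&drain ->", select(nodes, "idle&drain"))
    print("pure idle     ->", pure_idle(nodes))
```

On a live cluster, the input text could come from scontrol -o show nodes, since the -o (--oneliner) option prints one record per line. Because suffixes such as '*' are not normalized here, a node reported as IDLE* would be left out of the pure-idle list. It also would not match "idle" in this sketch, even though sinfo itself may list it.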
Let me know if you have any additional questions about this. Thanks, Ben
Hi Regine, I wanted to follow up and make sure you don't have additional questions about this. Let me know if there is anything else I can do to help. Thanks, Ben
Hi Regine, I believe the information I sent about the node states should have answered your questions. I haven't heard any follow-up questions, so I'll go ahead and close this ticket. If you do have any follow-up questions, feel free to update the ticket and I'll respond. Thanks, Ben