Ticket 21428

Summary: Make sinfo -t distinguish between node states idle (responding) and idle~ (powered_down)
Product: Slurm
Reporter: Ole.H.Nielsen <Ole.H.Nielsen>
Component: User Commands
Assignee: Marcin Stolarek <cinek>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
CC: brian, tim
Version: 24.05.4
Hardware: Linux
OS: Linux
See Also: https://support.schedmd.com/show_bug.cgi?id=22307
https://support.schedmd.com/show_bug.cgi?id=22585
Site: DTU Physics
Version Fixed: 25.05.0rc1
Ticket Depends on: 22307

Description Ole.H.Nielsen@fysik.dtu.dk 2024-11-15 02:15:34 MST
It would be useful if "sinfo -t <state>" could distinguish between node states idle (responding) and idle~ (powered_down).  The flag --responding might be interpreted as providing this distinction, but it doesn't work as expected.

For example, we currently have two nodes, x006 and s007, that are in a powered_down (idle~) state:

$ sinfo -t idle
PARTITION      AVAIL  TIMELIMIT  NODES  STATE NODELIST
xeon24el8_test    up      30:00      1  idle~ x006
...
sm3090el8         up 7-00:00:00      1  idle~ s007
sm3090el8         up 7-00:00:00      1  drain s005
...

However, the --responding flag doesn't exclude the idle~ nodes as I would expect:

$ sinfo -h -o "%N" -t idle
s[005,007],x006
$ sinfo -h -o "%N" --responding -t idle
s[005,007],x006

The reason why we'd like this to work is that we use ClusterShell[1] to run commands across nodes with a given state, for example:

$ clush -bw@slurmstate:idle uname -r
x006: ssh: connect to host x006 port 22: No route to host
s007: ssh: connect to host s007 port 22: No route to host
---------------
s005
---------------
4.18.0-553.27.1.el8_10.x86_64
clush: s007,x006 (2): exited with exit code 255

We would like to exclude the idle~ nodes from the "clush" command.

Question: Why does "sinfo --responding" include nodes that are powered_down?  Is this a bug that should be fixed?

Thanks,
Ole

[1] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_operations/#clustershell
Comment 1 Marcin Stolarek 2024-11-20 05:49:14 MST
Ole,

On the one hand, a powered-down node literally won't respond; on the other hand, "responding" and "powered down" are separate bit flags for nodes.
I understand your use case; however, I'm hesitant to change the behavior, since it isn't a clear bug.

I'll discuss how we can approach that with our CTO and let you know.

cheers,
Marcin
Comment 2 Ole.H.Nielsen@fysik.dtu.dk 2024-11-20 05:49:27 MST
I'm out of the office, back on November 25.

Best regards,
Ole Holm Nielsen
Comment 4 Ole.H.Nielsen@fysik.dtu.dk 2024-12-11 00:55:07 MST
Hi Marcin,

(In reply to Marcin Stolarek from comment #1)
> On the one hand, a powered-down node literally won't respond; on the other
> hand, "responding" and "powered down" are separate bit flags for nodes.
> I understand your use case; however, I'm hesitant to change the behavior,
> since it isn't a clear bug.
> 
> I'll discuss how we can approach that with our CTO and let you know.

Can I ask if you've had any progress in resolving this issue?  I think it would be good to fix the sinfo command as requested.

Thanks,
Ole
Comment 5 Marcin Stolarek 2024-12-11 05:39:53 MST
Ole,

I'll let you know when we have a decision on the approach here. I know it has been a while, but SC and the next Slurm release happened in the meantime, which usually means a busy time for our senior staff.

cheers,
Marcin
Comment 7 Ole.H.Nielsen@fysik.dtu.dk 2025-01-09 02:26:11 MST
Can I ask if you've had any progress in resolving this issue?  I think it would be good to fix the sinfo command as requested.

Thanks,
Ole
Comment 8 Jason Booth 2025-01-09 09:15:42 MST
Marcin is out of the office this week; however, he is investigating a few ways to approach the issue. I will have him update you first thing next week when he is back in the office.
Comment 10 Marcin Stolarek 2025-01-13 04:57:50 MST
Ole,

I'm sorry this is taking so much time. This is one of those cases where the actual implementation probably won't be very complicated, but we want to make sure we're moving in the right direction.

I'm in touch with our CTO on that; we have to discuss the approach to make sure we're not introducing a breaking change in behavior and that we're not solving only a specific case of a more general issue.

I'll keep you posted on that.

cheers,
Marcin
Comment 11 Ole.H.Nielsen@fysik.dtu.dk 2025-01-14 01:44:26 MST
Hi Marcin,

(In reply to Marcin Stolarek from comment #10)
> I'm sorry this is taking so much time. This is one of those cases where the
> actual implementation probably won't be very complicated, but we want to
> make sure we're moving in the right direction.
> 
> I'm in touch with our CTO on that; we have to discuss the approach to make
> sure we're not introducing a breaking change in behavior and that we're not
> solving only a specific case of a more general issue.

At SC24 I had a chat with Skyler and Tim Mullen about this issue, and the comment was that it may be an oversight in the power saving plugin.  This sounds quite plausible to me.

IHTH,
Ole
Comment 15 Marcin Stolarek 2025-01-24 01:24:52 MST
Ole,

I know it took us a while, but our conclusion is to add the ability to negate a state in the list given to `-t`, so your specific case can be handled by a command like:

$ sinfo -t "idle,~powered_down"

I'll take care of the development and will let you know when the improvement is merged.

cheers,
Marcin
Comment 16 Ole.H.Nielsen@fysik.dtu.dk 2025-01-24 01:32:02 MST
Hi Marcin,

(In reply to Marcin Stolarek from comment #15)
> I know it took us a while, but our conclusion is to add the ability to
> negate a state in the list given to `-t`, so your specific case can be
> handled by a command like:
> $ sinfo -t "idle,~powered_down"
> 
> I'll take care of the development and will let you know when the improvement
> is merged.

That sounds like an excellent solution!  It still leaves open the question of what the flag "--responding" is supposed to do.  I can't make sense of it.

The ~powered_down seems to be a new node state which isn't available at present.  Could you make sure that it also gets added to 24.05 (which we currently run) and perhaps to 23.11?

Thanks,
Ole
Comment 17 Marcin Stolarek 2025-01-24 06:35:39 MST
Node states in Slurm are quite a complex topic. It's not easy to draw a state diagram for them, since many of them are really "flags". In these terms, the "NO_RESPOND"[1] bit isn't set for powered-down nodes. I understand that, linguistically, it seems unnatural to call a powered-down computer responding; however, from the code perspective it's simply a boolean value used for decisions.

The implementation of --responding relies directly on that value[2]. This code is very old, and we generally avoid behavior changes unless they fix a clear bug: it's hard to be sure that changing it won't break scripts that other people (like you :) have developed over the years.

I can't commit to a particular version in which the changes will be included. We only do that for paid development. Since it's a new feature, it will be targeted at master, so the earliest possible release is 25.05.

cheers,
Marcin
[1] https://github.com/SchedMD/slurm/blob/slurm-24-11-1-1/slurm/slurm.h#L1014
[2] https://github.com/SchedMD/slurm/blob/slurm-24.11/src/sinfo/sinfo.c#L677-L678
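To make the "separate bits" point concrete, here is a minimal Python sketch. It assumes a simplified model in which each node has a base state plus independent flag bits, and it implements --responding as "the NO_RESPOND bit is not set", following the description above. The flag values, the extra node a001, and the helpers responding() and select() are hypothetical illustrations, not Slurm's actual code or constants.

```python
from enum import IntFlag

# Hypothetical flag values for illustration only; they are not the
# NODE_STATE_* constants from slurm.h.
class NodeFlag(IntFlag):
    NO_RESPOND = 1
    POWERED_DOWN = 2
    DRAIN = 4

# Simplified model: each node has a base state plus independent flag bits.
# a001 is a hypothetical idle node that slurmctld has marked not responding.
nodes = {
    "x006": ("IDLE", NodeFlag.POWERED_DOWN),  # shown as idle~
    "s007": ("IDLE", NodeFlag.POWERED_DOWN),  # shown as idle~
    "s005": ("IDLE", NodeFlag.DRAIN),         # shown as drain
    "a001": ("IDLE", NodeFlag.NO_RESPOND),    # would be shown as idle*
}

def responding(flags: NodeFlag) -> bool:
    """Model of --responding: keep the node unless the NO_RESPOND bit is set."""
    return not (flags & NodeFlag.NO_RESPOND)

def select(base_state: str, only_responding: bool) -> list:
    """Sorted nodes in base_state, optionally filtered as --responding is modeled."""
    return sorted(
        name
        for name, (base, flags) in nodes.items()
        if base == base_state and (not only_responding or responding(flags))
    )

print(select("IDLE", only_responding=False))  # ['a001', 's005', 's007', 'x006']
print(select("IDLE", only_responding=True))   # ['s005', 's007', 'x006']
```

Because POWERED_DOWN and NO_RESPOND are independent bits in this model, the powered-down nodes x006 and s007 pass the --responding filter, which matches the behavior reported in the description.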
Comment 19 Marcin Stolarek 2025-02-24 04:29:02 MST
Ole,

The discussed changes were merged to our master branch (commits: 89e67427..f9e99750) and are going to be released as part of the next Slurm major release (version 25.05).

Are you able to build Slurm from the master branch (in a test environment) to verify that it allows the desired selection of nodes?

The sinfo man page, with the new feature documented, reads:
>-t, --states=<states>
>   List nodes only having the given state(s). Multiple states may be comma separated and the comparison is case insensitive.  If the states are separated by '+', then the nodes must be in
>   all states.  The state can be prefixed with '~' which will invert the result of match.  Possible values include (case insensitive): ALLOC, ALLOCATED, BLOCKED, CLOUD, COMP,  COMPLETING,
>   DOWN,  DRAIN  (for  node  in DRAINING or DRAINED states), DRAINED, DRAINING, FAIL, FUTURE, FUTR, IDLE, MAINT, MIX, MIXED, NO_RESPOND, NPC, PERFCTRS, PLANNED, POWER_DOWN, POWERING_DOWN,
>   POWERED_DOWN, POWERING_UP, REBOOT_ISSUED, REBOOT_REQUESTED, RESV, RESERVED, UNK, and UNKNOWN.  By default nodes in the specified state are reported whether they are responding or  not.
>   The --dead and --responding options may be used to filter nodes by the corresponding flag.
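The state list described above can be read as a small boolean language. The Python sketch below is one possible reading under stated assumptions: ',' means "any of", '+' means "all of", and '~' inverts a single term. The flag values, the node c001, and the helpers term_matches(), state_matches() and select() are hypothetical and not taken from the sinfo source. In particular, the excerpt does not spell out how the real sinfo combines a ','-separated list with '~' terms (as in the "idle,~powered_down" example from comment 15), so the sketch only demonstrates '+' and a lone '~' term.

```python
from enum import IntFlag

# Hypothetical flag values for illustration only; they are not the
# NODE_STATE_* constants from slurm.h.
class NodeFlag(IntFlag):
    IDLE = 1
    ALLOCATED = 2
    POWERED_DOWN = 4
    DRAIN = 8

# Map state names to flags; names are compared case-insensitively.
NAMES = {flag.name: flag for flag in NodeFlag}

def term_matches(term: str, flags: NodeFlag) -> bool:
    """Match one state name; a leading '~' inverts the result."""
    negate = term.startswith("~")
    name = (term[1:] if negate else term).upper()
    hit = bool(flags & NAMES[name])
    return hit != negate

def state_matches(spec: str, flags: NodeFlag) -> bool:
    """Assumed semantics: '+' means every term must match, ',' means any term.
    How the real sinfo combines ',' with '~' terms may differ."""
    if "+" in spec:
        return all(term_matches(t, flags) for t in spec.split("+"))
    return any(term_matches(t, flags) for t in spec.split(","))

# Hypothetical cluster: two powered-down idle nodes, one drained idle node,
# and one allocated node.
nodes = {
    "x006": NodeFlag.IDLE | NodeFlag.POWERED_DOWN,
    "s007": NodeFlag.IDLE | NodeFlag.POWERED_DOWN,
    "s005": NodeFlag.IDLE | NodeFlag.DRAIN,
    "c001": NodeFlag.ALLOCATED,
}

def select(spec: str) -> list:
    """Return the sorted names of nodes whose flags match spec."""
    return sorted(name for name, flags in nodes.items() if state_matches(spec, flags))

print(select("idle"))                # ['s005', 's007', 'x006']
print(select("idle+~powered_down"))  # ['s005']
print(select("~powered_down"))       # ['c001', 's005']
```

Under this reading, the powered-down nodes drop out of the '+' query, which is the selection requested in the description.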

Since the changes are limited to the sinfo command, it should be relatively safe to backport them to an older Slurm version locally, if needed.

cheers,
Marcin
Comment 20 Ole.H.Nielsen@fysik.dtu.dk 2025-02-24 05:10:41 MST
Hi Marcin,

Thanks a lot for the update:

(In reply to Marcin Stolarek from comment #19)
> The discussed changes were merged to our master branch (commits:
> 89e67427..f9e99750) and are going to be released as part of the next Slurm
> major release (version 25.05).
> 
> Are you able to build Slurm from master branch (in test environment) to
> verify if it allows desired selection of nodes?

We unfortunately don't have a test environment for trying out the master branch.

> The manual of sinfo with the new features documented looks like:
> >-t, --states=<states>
> >   List nodes only having the given state(s). Multiple states may be comma separated and the comparison is case insensitive.  If the states are separated by '+', then the nodes must be in
> >   all states.  The state can be prefixed with '~' which will invert the result of match.  Possible values include (case insensitive): ALLOC, ALLOCATED, BLOCKED, CLOUD, COMP,  COMPLETING,
> >   DOWN,  DRAIN  (for  node  in DRAINING or DRAINED states), DRAINED, DRAINING, FAIL, FUTURE, FUTR, IDLE, MAINT, MIX, MIXED, NO_RESPOND, NPC, PERFCTRS, PLANNED, POWER_DOWN, POWERING_DOWN,
> >   POWERED_DOWN, POWERING_UP, REBOOT_ISSUED, REBOOT_REQUESTED, RESV, RESERVED, UNK, and UNKNOWN.  By default nodes in the specified state are reported whether they are responding or  not.

The new prefix '~' will become useful, but people have to discover that it exists.

> >   The --dead and --responding options may be used to filter nodes by the corresponding flag.

The meaning of --responding and --dead is still completely illogical to me!  The documentation gives no hint as to the meaning of --responding.  Can you perhaps expand the documentation to explain what --responding does, and why POWERED_DOWN nodes seem to be "responding"?

> Since the changes are limited to the sinfo command, it should be relatively
> safe to backport them to an older Slurm version locally, if needed.

Thanks, but for simplicity I prefer to wait for 25.05 later this year.

Best regards,
Ole
Comment 21 Marcin Stolarek 2025-02-24 07:07:05 MST
>We unfortunately don't have a test environment for trying out the master branch.
Understood. Just in case you don't know: it's possible to run a whole Slurm "cluster" on a single PC by building it with the configure option --enable-multiple-slurmd.

>The meaning of --responding and --dead is still completely illogical to me! [...]--responding does, and why POWERED_DOWN nodes seem to be "responding"?
I understand your frustration with the --responding and --dead options. Let me try to clarify --responding first. The option selects nodes that slurmctld has not marked as not responding. Slurmctld marks a node "not responding" when it expects the node's slurmd to reply and the slurmd does not.
Correspondingly, the --dead option selects nodes that have the NO_RESPOND flag set.
It's a little like non-dual logic: the --responding option is really ~(NO_RESPOND), which, admittedly, isn't fully justified by its name.

A real-world analogy: when a person lies down and doesn't respond, she isn't necessarily dead, but she may be.

I'll check with our "Docs team" on how to make this clearer in the sinfo man page.

cheers,
Marcin
Comment 22 Marcin Stolarek 2025-05-22 02:43:21 MDT
Ole, 

We've merged a documentation improvement, commit 2445007c[1], that should cover the remaining issue. I hope you'll find it clearer.

Is there anything else I can help you with in the ticket?

cheers,
Marcin
[1] https://github.com/SchedMD/slurm/commit/2445007c681fc7013fc5f0621f4bd2c5e3c07f7c
Comment 23 Marcin Stolarek 2025-06-04 06:29:41 MDT
Please let me know if you've had a chance to review my last message.
If the issue is now resolved or if you no longer need assistance, please let me know as well so I can close the ticket.

cheers,
Marcin
Comment 24 Ole.H.Nielsen@fysik.dtu.dk 2025-06-04 06:29:54 MDT
I'm out of the office, back on June 6.

Best regards,
Ole Holm Nielsen
Comment 25 Ole.H.Nielsen@fysik.dtu.dk 2025-06-04 10:52:33 MDT
Dear Marcin,

(In reply to Marcin Stolarek from comment #23)
> Please let me know if you've had a chance to review my last message.
> If the issue is now resolved or if you no longer need assistance, please let
> me know as well so I can close the ticket.

Thanks very much for making this solution!  We look forward to testing 25.05 soon.  Please close this ticket.

Best regards,
Ole
Comment 26 Marcin Stolarek 2025-06-09 01:37:53 MDT
Thanks for the confirmation.