$ sinfo --version
slurm 17.11.7

You can't rely on sinfo to tell you which nodes are scheduled for, or in the process of, rebooting:

> $ sinfo -p all
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> all       down    infinite     55   drng lnod[0001-0006,0008,0010-0011,0013-0017,0019,0022,0024-0027,0041-0050,0052-0057,0059,0061-0074,0077-0080]
> all       down    infinite     25  drain lnod[0007,0009,0012,0018,0020-0021,0023,0028-0040,0051,0058,0060,0075-0076]

This one is, but you'd never know it from sinfo:

> $ scontrol show node lnod0007
> NodeName=lnod0007 Arch=x86_64 CoresPerSocket=64
>    State=REBOOT+DRAIN ThreadsPerCore=4 TmpDisk=137881 Weight=1 Owner=N/A MCS_label=N/A

This is especially surprising since you've already designated an abbreviation for rebooting: @. So drng@ and drain@ would seem fair if the full state name is too long.

It also leads to this situation:

> $ sinfo -p all
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> all       down    infinite     25   drng lnod[0002-0006,0010-0011,0013-0015,0017,0019,0042,0045,0050,0053-0056,0062,0065,0068,0074,0077,0079]
> all       down    infinite     21  drain lnod[0001,0016,0022,0024-0027,0047-0049,0057,0059,0063-0064,0067,0069,0071-0073,0078,0080]
> all       down    infinite     34  drain lnod[0007-0009,0012,0018,0020-0021,0023,0028-0041,0043-0044,0046,0051-0052,0058,0060-0061,0066,0070,0075-0076]

Someone looking at this might ask why there are two "drain" lines. It's because one set of nodes is REBOOT+DRAIN and the other is DOWN+DRAIN.

Cheers,
-Phil
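On 17.11.7, where sinfo hides the reboot flag, one workaround is to read the State= field from scontrol instead. The sketch below assumes `scontrol -o show node` prints one node per line with a `State=` field like the `State=REBOOT+DRAIN` shown above; `list_reboot_nodes` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: print "NodeName State" for every node whose State
# field contains a REBOOT flag (e.g. REBOOT+DRAIN).
# Assumes one-node-per-line input, as produced by `scontrol -o show node`.
list_reboot_nodes() {
  awk '{
    name = ""; state = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^NodeName=/) name = substr($i, 10)   # strip "NodeName="
      else if ($i ~ /^State=/) state = substr($i, 7) # strip "State="
    }
    # Wrap in "+" so REBOOT matches only as a whole flag.
    if (name != "" && index("+" state "+", "+REBOOT+") > 0) print name, state
  }'
}

# Usage on a live cluster (assumes scontrol is on PATH):
#   scontrol -o show node | list_reboot_nodes
```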
Thanks for pointing that out. I'll look into it and get back to you.

Thanks,
Brian
This is fixed in the following commits:

https://github.com/SchedMD/slurm/commit/bf569fef2f8928594dc87ebdf6aa0659c10479fa
https://github.com/SchedMD/slurm/commit/f23411bc96be8055e3f295270c4a73709ce574b4

e.g.

brian@lappy:~/slurm/17.11/lappy$ sinfo -p debug -o %N,%T,%t
NODELIST,STATE,STATE
lappy[1-10],idle,idle
brian@lappy:~/slurm/17.11/lappy$ sbatch --wrap="sleep 600" -wlappy2
Submitted batch job 215543
brian@lappy:~/slurm/17.11/lappy$ sbatch --wrap="sleep 600" -wlappy3
Submitted batch job 215544
brian@lappy:~/slurm/17.11/lappy$ scontrol reboot lappy1
brian@lappy:~/slurm/17.11/lappy$ scontrol reboot lappy2
brian@lappy:~/slurm/17.11/lappy$ scontrol reboot asap lappy3
brian@lappy:~/slurm/17.11/lappy$ sinfo -p debug -o %N,%T,%t
NODELIST,STATE,STATE
lappy3,draining@,drng@
lappy2,mixed@,mix@
lappy1,reboot,boot
lappy[4-10],idle,idle

Thanks,
Brian
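With the fix, a pending reboot shows up as an "@" suffix on the sinfo state, so it can be picked out of sinfo output directly. This is a minimal sketch, assuming sinfo is run with `-h -o "%N %T"` (no header, node list then extended state, space-separated); `pending_reboot_nodes` is a hypothetical helper name. Note that it only reports the "@" states, so a node already in the plain "reboot" state (like lappy1 above) is not listed.

```shell
# Hypothetical helper: print the node list of each sinfo row whose state
# ends in "@" (reboot pending), e.g. "draining@" or "mixed@".
# Assumes input lines of the form "<nodelist> <state>".
pending_reboot_nodes() {
  awk '$2 ~ /@$/ { print $1 }'
}

# Usage on a live cluster (assumes a Slurm build that includes the fix):
#   sinfo -h -p debug -o "%N %T" | pending_reboot_nodes
```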