Since we enabled the Slurm power_save module for powering down idle on-premise nodes, we have noticed that "scontrol show node" shows CurrentWatts power and CPULoad greater than zero for nodes that are actually powered off, for example:

$ scontrol show node s007
NodeName=s007 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUEfctv=80 CPUTot=80 CPULoad=0.02
   AvailableFeatures=xeon5218r,GPU_RTX3090,power_ipmi
   ActiveFeatures=xeon5218r,GPU_RTX3090,power_ipmi
   Gres=gpu:RTX3090:10
   NodeAddr=s007 NodeHostName=s007 Version=23.02.4
   OS=Linux 3.10.0-1160.99.1.el7.x86_64 #1 SMP Wed Sep 13 14:19:20 UTC 2023
   RealMemory=768000 AllocMem=0 FreeMem=763076 Sockets=4 Boards=1
   State=IDLE+POWERED_DOWN ThreadsPerCore=2 TmpDisk=800000 Weight=19336 Owner=N/A MCS_label=N/A
   Partitions=sm3090,sm3090_768
   BootTime=2023-09-23T23:38:43 SlurmdStartTime=2023-09-23T23:39:14
   LastBusyTime=Unknown ResumeAfterTime=None
   CfgTRES=cpu=80,mem=750G,billing=160,gres/gpu=10
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=37 AveWatts=36
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Here the expected values would be N/A because the node is powered off:

   CPULoad=N/A CurrentWatts=N/A AveWatts=N/A

The IPMI DCMI remote command shows that the node's current power is indeed zero:

$ ipmi-dcmi -D LAN_2_0 --username=root --password=$IPMI_PASSWORD --hostname=s007b --get-system-power-statistics
Current Power : 0 Watts
Minimum Power over sampling duration : 350 watts
Maximum Power over sampling duration : 4687 watts
Average Power over sampling duration : 2356 watts
Time Stamp : 09/24/2023 - 11:42:26
Statistics reporting time period : 2672412000 milliseconds
Power Measurement : Not Available

It appears that scontrol displays the last values recorded by slurmd instead of the actual current values. IMHO, when slurmctld registers State=IDLE+POWERED_DOWN, or if slurmd hasn't been reachable for SlurmdTimeout seconds, slurmctld should set the three metrics above to N/A. Could you kindly update slurmctld to behave this way?
FYI, we have this slurmd timeout value:

$ scontrol show config | grep SlurmdTimeout
SlurmdTimeout = 300 sec

Thanks,
Ole
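Editorial sketch: the cross-check against the BMC reading described in this comment can be scripted. The Python below is only a minimal illustration, not part of Slurm or FreeIPMI. It assumes the "label : N watts" layout seen in the ipmi-dcmi output quoted in this ticket, and the function name parse_dcmi_power is hypothetical.

```python
import re

# Sample text copied from the ipmi-dcmi output quoted in this ticket.
SAMPLE_DCMI = """\
Current Power : 0 Watts
Minimum Power over sampling duration : 350 watts
Maximum Power over sampling duration : 4687 watts
Average Power over sampling duration : 2356 watts
Time Stamp : 09/24/2023 - 11:42:26
Statistics reporting time period : 2672412000 milliseconds
Power Measurement : Not Available
"""

# Assumption: wattage lines look like "<label> : <integer> watts" (any case).
WATT_LINE = re.compile(r"\s*(.+?)\s*:\s*(\d+)\s+watts\s*$", re.IGNORECASE)


def parse_dcmi_power(text):
    """Return {label: watts} for every line that reports a wattage."""
    readings = {}
    for line in text.splitlines():
        match = WATT_LINE.match(line)
        if match:
            readings[match.group(1)] = int(match.group(2))
    return readings


readings = parse_dcmi_power(SAMPLE_DCMI)
print(readings["Current Power"])
```

Lines without a wattage (the time stamp, the reporting period, "Power Measurement") are skipped, so only the four power readings are returned.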
Note added: We measure node power using RAPL:

$ scontrol show config | grep AcctGatherEnergyType
AcctGatherEnergyType = acct_gather_energy/rapl
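Editorial sketch: a monitoring script can detect the inconsistency reported above by parsing the "scontrol show node" text. The Python below is a hedged illustration only. It assumes the Key=Value layout shown in this ticket and that no value contains a whitespace-separated "word=" token. The names parse_node and stale_metrics are hypothetical, and the sample is an excerpt of the s007 output.

```python
import re

# Excerpt of the "scontrol show node s007" output quoted in this ticket.
SAMPLE_NODE = """\
NodeName=s007 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUEfctv=80 CPUTot=80 CPULoad=0.02
   OS=Linux 3.10.0-1160.99.1.el7.x86_64 #1 SMP Wed Sep 13 14:19:20 UTC 2023
   State=IDLE+POWERED_DOWN ThreadsPerCore=2 TmpDisk=800000 Weight=19336 Owner=N/A MCS_label=N/A
   CfgTRES=cpu=80,mem=750G,billing=160,gres/gpu=10
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=37 AveWatts=36
"""

# A key starts the text or follows whitespace; its value runs until the
# next whitespace-separated "Key=" token or the end of the text.
FIELD = re.compile(r"(?:^|(?<=\s))(\w+)=(.*?)(?=\s+\w+=|\s*$)", re.DOTALL)


def parse_node(text):
    """Turn scontrol's Key=Value text into a dict of strings."""
    return {key: value for key, value in FIELD.findall(text)}


def stale_metrics(fields, metrics=("CPULoad", "CurrentWatts", "AveWatts")):
    """List metrics that are above zero although the node is POWERED_DOWN."""
    if "POWERED_DOWN" not in fields.get("State", "").split("+"):
        return []
    stale = []
    for name in metrics:
        try:
            if float(fields.get(name, "0")) > 0:
                stale.append(name)
        except ValueError:
            pass  # non-numeric value such as "N/A" is not counted as stale
    return stale


fields = parse_node(SAMPLE_NODE)
print(stale_metrics(fields))
```

Values that contain spaces, such as OS=, are kept whole because the value only ends where the next Key= token begins.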
Hello Ole, Fortunately, this issue is easily reproducible. I have a patch that I'll be uploading now that will cause the power save module to reset the three metrics: CPULoad, CurrentWatts, AveWatts when a node is determined to be suspended. Best Regards, Tyler Connel
Hi Tyler,

(In reply to Tyler Connel from comment #2)
> Fortunately, this issue is easily reproducible. I have a patch that I'll be
> uploading now that will cause the power save module to reset the three
> metrics: CPULoad, CurrentWatts, AveWatts when a node is determined to be
> suspended.

Wonderful, I'm glad this is an easy fix :-) Will the patch be applied to 23.02, or do we have to wait until 23.11?

Thanks,
Ole
Hopefully it's a *good* fix per the reviewer :) Since the change mostly involves correcting a behavior to an expected behavior, my intuition is to target master for the change. Of course, if you have a strong preference that the change apply to 23.02 I would be willing to target that branch.
(In reply to Tyler Connel from comment #5)
> Since the change mostly involves correcting a behavior to an expected
> behavior, my intuition is to target master for the change. Of course, if you
> have a strong preference that the change apply to 23.02 I would be willing
> to target that branch.

Yes, I'd like to ask for the change in 23.02. We don't plan to upgrade to 23.11 until it has seen several minor releases, so I'd be really happy if the change could be applied to 23.02 as well! I'd like to get my power monitoring scripts perfected and tested with acct_gather_energy/ipmi quite soon.

Thanks,
Ole
Hi Tyler,

(In reply to Tyler Connel from comment #2)
> Fortunately, this issue is easily reproducible. I have a patch that I'll be
> uploading now that will cause the power save module to reset the three
> metrics: CPULoad, CurrentWatts, AveWatts when a node is determined to be
> suspended.

It's not only the power save module that may turn off a node; nodes can also go down because of a malfunction. Today we have a node with a dead motherboard which crashed and won't power up. Its status is DOWN+DRAIN+NOT_RESPONDING:

$ scontrol show node c060
NodeName=c060 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUEfctv=40 CPUTot=40 CPULoad=39.98
   AvailableFeatures=xeon6148v5,opa,xeon40,power_ipmi
   ActiveFeatures=xeon6148v5,opa,xeon40,power_ipmi
   Gres=(null)
   NodeAddr=c060 NodeHostName=c060 Version=23.02.5
   OS=Linux 3.10.0-1160.95.1.el7.x86_64 #1 SMP Mon Jul 24 13:59:37 UTC 2023
   RealMemory=384000 AllocMem=0 FreeMem=371528 Sockets=4 Boards=1
   State=DOWN+DRAIN+NOT_RESPONDING ThreadsPerCore=1 TmpDisk=140000 Weight=10535 Owner=N/A MCS_label=N/A
   Partitions=xeon40
   BootTime=2023-08-30T08:37:02 SlurmdStartTime=2023-09-26T20:24:43
   LastBusyTime=2023-09-27T07:22:11 ResumeAfterTime=None
   CfgTRES=cpu=40,mem=375G,billing=66
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=435 AveWatts=395
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Motherboard defective [root@2023-09-27T07:56:40]

So it would be good if nodes with a state of NOT_RESPONDING also get their CPULoad, CurrentWatts, AveWatts metrics reset to N/A. Is this possible?

Thanks,
Ole
(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #7)
> Hi Tyler,
>
> (In reply to Tyler Connel from comment #2)
> > Fortunately, this issue is easily reproducible. I have a patch that I'll be
> > uploading now that will cause the power save module to reset the three
> > metrics: CPULoad, CurrentWatts, AveWatts when a node is determined to be
> > suspended.
>
> It's not only the power save module which may turn off a node, other reasons
> for node malfunction exist. Today we have a node with a dead motherboard
> which crashed and won't power up. It's status is DOWN+DRAIN+NOT_RESPONDING:
>
> $ scontrol show node c060
> NodeName=c060 Arch=x86_64 CoresPerSocket=10
> CPUAlloc=0 CPUEfctv=40 CPUTot=40 CPULoad=39.98
> AvailableFeatures=xeon6148v5,opa,xeon40,power_ipmi
> ActiveFeatures=xeon6148v5,opa,xeon40,power_ipmi
> Gres=(null)
> NodeAddr=c060 NodeHostName=c060 Version=23.02.5
> OS=Linux 3.10.0-1160.95.1.el7.x86_64 #1 SMP Mon Jul 24 13:59:37 UTC 2023
> RealMemory=384000 AllocMem=0 FreeMem=371528 Sockets=4 Boards=1
> State=DOWN+DRAIN+NOT_RESPONDING ThreadsPerCore=1 TmpDisk=140000
> Weight=10535 Owner=N/A MCS_label=N/A
> Partitions=xeon40
> BootTime=2023-08-30T08:37:02 SlurmdStartTime=2023-09-26T20:24:43
> LastBusyTime=2023-09-27T07:22:11 ResumeAfterTime=None
> CfgTRES=cpu=40,mem=375G,billing=66
> AllocTRES=
> CapWatts=n/a
> CurrentWatts=435 AveWatts=395
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
> Reason=Motherboard defective [root@2023-09-27T07:56:40]
>
> So it would be good if nodes with a state of NOT_RESPONDING also get their
> CPULoad, CurrentWatts, AveWatts metrics reset to N/A. Is this possible?
>
> Thanks,
> Ole

This is an excellent point. I'll pull this down from review and reconsider my approach against unexpectedly downed nodes.
Just wanted to give an update, as it's been a while since my last comment. I found a good solution for displaying CPU load as N/A for node states that include any of: DOWN, POWERED_DOWN, and NO_RESPOND. For the power metrics (e.g. AveWatts) I'll have to spend some time finding a good place in acct_gather_energy to effect the change.
@Ole,

There has been some discussion on this issue. How would you feel about the expected values for a DOWN/POWERED_DOWN node being zeroed instead of N/A? E.g.:

CPULoad=0
CurrentWatts=0
AveWatts=0

We feel that this would be a better set of values for the interface to display. Would you have any reservations?

Best,
Tyler Connel
Hi Tyler,

(In reply to Tyler Connel from comment #11)
> There's been some discussions on this issue. How would you feel about the
> expected values for a DOWN/POWERED_DOWN node being zeroed instead of N/A?
> E.g.:
>
> CPULoad=0
> CurrentWatts=0
> AveWatts=0
>
> We feel that this would be a better set of values for the interface to
> display. Would you have any reservations?

I'm fine with zero values, since that reflects the node state as well.

Thanks,
Ole
Hello @Ole, The patch has been accepted to reset the values mentioned (CPU load, current watts and average watts) to 0 when a node goes down unexpectedly or is powered down. This patch was accepted for 23.11, and I recall that you had also wanted the change applied to 23.02. I will inquire as to whether the change can be applied to 23.02 before resolving the ticket. Best, Tyler Connel
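Editorial sketch: on releases without this patch, a monitoring script could approximate the accepted behavior on the client side by zeroing the three metrics itself. The Python below is a hedged illustration under stated assumptions, not the actual Slurm patch. The set of state flags is an assumption based on the states discussed in this ticket, as displayed by scontrol, and the name normalize_metrics is hypothetical.

```python
# Assumption: state flags, as shown by "scontrol show node", for which the
# reported metrics should be treated as zero.
OFFLINE_FLAGS = {"DOWN", "POWERED_DOWN", "NOT_RESPONDING"}
METRICS = ("CPULoad", "CurrentWatts", "AveWatts")


def normalize_metrics(fields):
    """Return a copy of the node fields with stale metrics set to "0".

    `fields` is a dict of Key=Value strings taken from scontrol output.
    """
    flags = set(fields.get("State", "").split("+"))
    result = dict(fields)
    if flags & OFFLINE_FLAGS:
        for name in METRICS:
            if name in result:
                result[name] = "0"
    return result


# Values copied from the c060 example in this ticket.
c060 = {
    "NodeName": "c060",
    "State": "DOWN+DRAIN+NOT_RESPONDING",
    "CPULoad": "39.98",
    "CurrentWatts": "435",
    "AveWatts": "395",
}
print(normalize_metrics(c060))
```

The input dict is left unchanged, so a script can log both the raw and the normalized values while the two are being compared.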
I'm out of office, back on Thursday, November 9.

Best regards,
Ole Holm Nielsen
Hello @Ole, As this involves a change in behavior, the fix will only apply to 23.11. I'll resolve this ticket, but feel free to reach out if you have further questions. Best Regards, Tyler Connel
Hi Tyler,

(In reply to Tyler Connel from comment #24)
> As this involves a change in behavior, the fix will only apply to 23.11.
> I'll resolve this ticket, but feel free to reach out if you have further
> questions.

Thanks a lot for the fix! I'm sorry it can't apply to 23.02, since I consider the current behavior incorrect. But so be it.

Greetings,
Ole