Can we make powersave ignore drained and/or already powered-off nodes?

[akmalm@jud5 ~]$ sinfo -p all
PARTITION AVAIL  TIMELIMIT  NODES  STATE   NODELIST
all       down   infinite       2  down*   jnod[0001-0002]
all       down   infinite       6  drain*  jnod[0003-0008]

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=UNKNOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log:
[2015-08-27T16:16:52.826] error: Nodes jnod[0003-0008] not responding, setting DOWN

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log:
[2015-08-27T17:10:36.659] Power save mode: 6 nodes

[akmalm@jud5 ~]$ scontrol show node jnod[0003-0008]
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

*submit a job*

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=953 Sockets=2 Boards=1
   State=ALLOCATED+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

And then Slurm tries to power up the node.
The node is still drained when it is powered back on:

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=0.43
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=14.11
   OS=Linux RealMemory=12031 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=683212 Weight=1
   BootTime=2015-08-27T17:17:31 SlurmdStartTime=2015-08-27T17:20:43
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

But the drain reason is gone. And only the first job allocated (the one that turned on the node) will run (although such jobs shouldn't run in the first place); the next one won't get scheduled, since the node is drained.
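For anyone following the State strings above: a node's state is a base state plus flags joined with "+", and a trailing "*" marks a node that is not responding. A throwaway POSIX-shell helper to check whether a state string carries a given flag — a sketch for illustration only, not part of any Slurm tooling:

```shell
#!/bin/sh
# Check whether a Slurm node state string (as printed by
# "scontrol show node", e.g. "DOWN+DRAIN+POWER") contains a flag.
# Note: a "*" suffix (not responding) attaches to the base state,
# so "DOWN*+DRAIN" matches DRAIN but not the bare token DOWN.
has_flag() {
    # $1 = state string, $2 = flag to look for
    case "+$1+" in
        *"+$2+"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```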
Hi Akmal, let me digest this and get back to you. David
Hi Akmal, are you referring to the SuspendTime? If I understand correctly, you have some nodes that have been drained and idle for longer than SuspendTime, and you don't want the SuspendProgram and ResumeProgram to run on them. Out of curiosity, could you attach the off.sh and on.sh programs? Thanks, David
Hi, is my understanding of the issue correct? David
Sorry for the late reply. Yes, you're right. We have drained nodes that have been idle/turned off for a period longer than SuspendTime, and we don't want Slurm to run the SuspendProgram or ResumeProgram on them. I could put a check in our on.sh or off.sh, but I'm having some issues with that approach. I will update on that later.
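The sort of guard in question might look roughly like this in off.sh (a sketch under assumptions: Slurm passes the node list as the first argument to the SuspendProgram, the safe_to_suspend helper and the power-off step are placeholders, and sinfo's -h/-n/-o '%t' options print just the node's compact state):

```shell
#!/bin/sh
# Hypothetical guard for off.sh (the SuspendProgram): only power off
# nodes whose compact state (sinfo "%t") is plain "idle", and leave
# drained/down/allocated nodes alone.

safe_to_suspend() {
    # $1 = compact node state as printed by sinfo -o '%t'
    case "$1" in
        idle) return 0 ;;   # plain idle: safe to power off
        *)    return 1 ;;   # drain/drng/down/alloc/...: do not touch
    esac
}

if [ $# -ge 1 ]; then
    # Expand the hostlist expression (e.g. "jnod[0003-0008]").
    for node in $(scontrol show hostnames "$1"); do
        state=$(sinfo -h -n "$node" -o '%t')
        if safe_to_suspend "$state"; then
            echo "powering off $node"
            # site-specific power-off command goes here
        else
            echo "skipping $node (state=$state)" >&2
        fi
    done
fi
```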
Hi Akmal, I worked on this issue; I can reproduce it and see the problem. I believe this is a bug: the controller puts nodes into the power-suspend state even when they are already down, and then, as you observed, powers them up. I am testing a patch, and I also want to consult my colleagues in California about it once they come online. By tomorrow our time we should be able to close this problem. David
Hi, a quick question for you. Were the nodes that are down set down manually, or by Slurm? My assumption was that you had shut them down outside of Slurm. Is this correct? David
Yes, you're correct
Fixed.

commit 4c3491a46491b87b636703e0d2f0e6cd0e6f7372
Author: Morris Jette <jette@schedmd.com>
Date:   Tue Sep 1 15:59:45 2015 -0700

David
Thanks David. A quick question: will this commit also prevent drained nodes from being powered down by the SuspendProgram?
Hi, no, that is designed behaviour. This patch prevents jobs from being dispatched to down nodes. This is a note from the bug fixer:

------
Note this does not prevent powering down a node in DOWN or DRAIN state (what the bug reported). That is operating as designed. A node that has no jobs and can't be allocated is a perfect candidate for being powered down. It does prevent the node from being allocated and powered up, which was the real bug.
------

David
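For the record, specific nodes can also be exempted from power saving entirely in slurm.conf, which may be the simplest way to keep the SuspendProgram away from particular machines. SuspendExcNodes and SuspendExcParts are standard slurm.conf parameters; the node names below are just taken from this cluster as an example:

```
# slurm.conf excerpt: never suspend these nodes
SuspendExcNodes=jnod[0003-0008]
# or exempt an entire partition from power saving
#SuspendExcParts=all
```

Note these exclusions are static; they do not follow a node in and out of the drain state.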