Ticket 1893

Summary: Make powersave ignore drained node
Product: Slurm
Component: slurmctld
Version: 14.11.8
Version Fixed: 14.11.10
Status: RESOLVED FIXED
Severity: 3 - Medium Impact
Reporter: Akmal Madzlan <akmalm>
Assignee: David Bigagli <david>
CC: brian, da
Hardware: Linux
OS: Linux
Site: DownUnder GeoSolutions

Description Akmal Madzlan 2015-08-26 22:19:24 MDT
Can we make power save ignore drained and/or already powered-off nodes?

[akmalm@jud5 ~]$ sinfo -p all
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all        down   infinite      2  down* jnod[0001-0002]
all        down   infinite      6 drain* jnod[0003-0008]

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A Features=localdisk,gpu,nogpu
   Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=UNKNOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log: [2015-08-27T16:16:52.826] error: Nodes jnod[0003-0008] not responding, setting DOWN

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A Features=localdisk,gpu,nogpu
   Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log: [2015-08-27T17:10:36.659] Power save mode: 6 nodes

[akmalm@jud5 ~]$ scontrol show node jnod[0003-0008]
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A Features=localdisk,gpu,nogpu
   Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

*submit a job*

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=N/A Features=localdisk,gpu,nogpu
   Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=953 Sockets=2 Boards=1
   State=ALLOCATED+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

And then Slurm tries to power up the node.
Comment 1 Akmal Madzlan 2015-08-26 22:31:05 MDT
The node is still drained when it is powered back on:

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=0.43 Features=localdisk,gpu,nogpu
   Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=14.11
   OS=Linux RealMemory=12031 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=683212 Weight=1
   BootTime=2015-08-27T17:17:31 SlurmdStartTime=2015-08-27T17:20:43
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

But the drain reason is gone.

Only the first allocated job (the one that powered the node on) will run, although it should not run on a drained node at all; subsequent jobs will not get scheduled because the node is drained.
Comment 2 David Bigagli 2015-08-27 01:04:10 MDT
Hi Akmal,
         let me digest this and get back to you.

David
Comment 3 David Bigagli 2015-08-28 01:11:17 MDT
Hi Akmal,
         are you referring to the SuspendTime? If I understand correctly, you have
some nodes that are drained and idle for longer than SuspendTime, and you don't
want Slurm to run the SuspendProgram and ResumeProgram on them.

Out of curiosity, could you attach your off.sh and on.sh programs?

Thanks,
David
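For reference, the power-save knobs discussed above are set in slurm.conf. A minimal sketch of such a configuration follows; the script paths and timing values are illustrative assumptions, not this site's actual settings:

```ini
# Minimal Slurm power-save sketch (illustrative values only)
SuspendProgram=/usr/local/sbin/off.sh   # called with a hostlist to power nodes off
ResumeProgram=/usr/local/sbin/on.sh     # called with a hostlist to power nodes on
SuspendTime=600                         # seconds a node must be idle before suspend
SuspendTimeout=60                       # seconds allowed for a node to power off
ResumeTimeout=300                       # seconds allowed for a node to boot back up
```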
Comment 4 David Bigagli 2015-08-30 21:42:27 MDT
Hi,
   is my understanding of the issue correct?

David
Comment 5 Akmal Madzlan 2015-08-31 12:52:25 MDT
Sorry for the late reply.

Yes, you're right. We have drained nodes that have been idle/turned off for a
period longer than SuspendTime, and we don't want Slurm to run the SuspendProgram
or ResumeProgram on them.

I could put a check in our on.sh or off.sh scripts, but I'm having some
issues with that approach. I will update on that later.
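A guard of the kind mentioned here could be sketched as follows. This is a hypothetical off.sh fragment, not the site's actual script: the `is_excluded_state` helper and the `power_off_node` command are assumptions introduced for illustration.

```shell
#!/bin/bash
# Hypothetical SuspendProgram (off.sh) guard: skip nodes whose state
# should exclude them from power saving. A sketch, not the real script.

# Succeed if a sinfo short state (e.g. "idle", "drain", "down*", "drng")
# should be excluded from power saving.
is_excluded_state() {
    case "$1" in
        *drain*|*drng*|*down*|*fail*) return 0 ;;
        *) return 1 ;;
    esac
}

# Slurm passes a hostlist such as "jnod[0003-0008]" as $1.
if [ $# -ge 1 ]; then
    for node in $(scontrol show hostnames "$1"); do
        state=$(sinfo -h -n "$node" -o '%t')
        if is_excluded_state "$state"; then
            continue    # leave drained/down nodes alone
        fi
        # Site-specific power-off command (hypothetical), e.g. a BMC/ipmitool call.
        power_off_node "$node"
    done
fi
```

The same check, inverted as needed, could go in on.sh so a drained node is never powered back up by ResumeProgram.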
Comment 6 David Bigagli 2015-09-01 00:47:54 MDT
Hi Akmal,
         I worked on this issue; I can reproduce it and see the problem.
I believe this is a bug because the controller puts a node into the power-suspend
state even if the node is already down, and then, as you observed, powers it up.
I am testing a patch, and I also want to consult about it with my colleagues in
California once they come online. By tomorrow our time we should be able to
close this issue.

David
Comment 7 David Bigagli 2015-09-01 20:20:16 MDT
Hi, 
   a quick question for you. Were the nodes that are down set down manually
or by Slurm? My assumption is that you shut them down outside of Slurm.
Is this correct?

David
Comment 8 Akmal Madzlan 2015-09-01 20:23:11 MDT
Yes, you're correct.
Comment 9 David Bigagli 2015-09-01 21:17:38 MDT
Fixed.

commit 4c3491a46491b87b636703e0d2f0e6cd0e6f7372
Author: Morris Jette <jette@schedmd.com>
Date:   Tue Sep 1 15:59:45 2015 -0700

David
Comment 10 Akmal Madzlan 2015-09-01 21:48:30 MDT
Thanks David,
Quick question: will this commit also prevent drained nodes from being powered down by the SuspendProgram?
Comment 11 David Bigagli 2015-09-01 23:30:57 MDT
Hi,
   no, that's designed behaviour. This patch prevents jobs from being dispatched
to down nodes. This is a note from the bug fixer:

------
Note this does not prevent powering down a node in DOWN or DRAIN state (what the bug reported). That is operating as designed: a node that has no jobs and can't be allocated is a perfect candidate for being powered down.

It does prevent the node from being allocated and powered up, which was the real bug.
------

David
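For sites that nonetheless want to keep particular nodes out of power saving altogether, slurm.conf offers exclusion lists. A brief sketch follows; the node and partition names are illustrative, not from this ticket:

```ini
# Exclude specific nodes from SuspendProgram/ResumeProgram handling
SuspendExcNodes=jnod[0003-0008]
# Whole partitions can likewise be excluded:
# SuspendExcParts=all
```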