Can we make powersave ignore drained and/or already powered-off nodes?

[akmalm@jud5 ~]$ sinfo -p all
PARTITION AVAIL  TIMELIMIT  NODES  STATE   NODELIST
all       down   infinite       2  down*   jnod[0001-0002]
all       down   infinite       6  drain*  jnod[0003-0008]

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=UNKNOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log:
[2015-08-27T16:16:52.826] error: Nodes jnod[0003-0008] not responding, setting DOWN

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN*+DRAIN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

slurmctld.log:
[2015-08-27T17:10:36.659] Power save mode: 6 nodes

[akmalm@jud5 ~]$ scontrol show node jnod[0003-0008]
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=0 Sockets=2 Boards=1
   State=DOWN+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=testing power saving [root@2015-08-27T16:00:34]

*submit a job*

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 CoresPerSocket=4
   CPUAlloc=16 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=(null)
   RealMemory=12000 AllocMem=953 Sockets=2 Boards=1
   State=ALLOCATED+DRAIN+POWER ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

And then Slurm tries to power up the node.
The node is still drained when it is powered back on:

[akmalm@jud5 ~]$ scontrol show node jnod0003
NodeName=jnod0003 Arch=x86_64 CoresPerSocket=4
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=0.43
   Features=localdisk,gpu,nogpu Gres=(null)
   NodeAddr=jnod0003 NodeHostName=jnod0003 Version=14.11
   OS=Linux RealMemory=12031 AllocMem=0 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=2 TmpDisk=683212 Weight=1
   BootTime=2015-08-27T17:17:31 SlurmdStartTime=2015-08-27T17:20:43
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

But the drain reason is gone. And only the first job allocated (the one that turned on the node) will run (although such jobs shouldn't run in the first place); the next one won't get scheduled, since the node is drained.
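For anyone following the State strings above: a node's state is a base state plus flags joined with "+", and a trailing "*" marks a node that is not responding. A throwaway POSIX-shell helper to check whether a state string carries a given flag — a sketch for illustration only, not part of any Slurm tooling:

```shell
#!/bin/sh
# Check whether a Slurm node state string (as printed by
# "scontrol show node", e.g. "DOWN+DRAIN+POWER") contains a flag.
# Note: a "*" suffix (not responding) attaches to the base state,
# so "DOWN*+DRAIN" matches DRAIN but not the bare token DOWN.
has_flag() {
    # $1 = state string, $2 = flag to look for
    case "+$1+" in
        *"+$2+"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```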
Hi Akmal, let me digest this and get back to you. David
Hi Akmal, are you referring to the SuspendTime? If I understand correctly, you have some nodes that have been drained and idle for longer than SuspendTime, and you don't want the SuspendProgram and ResumeProgram to run on them. Out of curiosity, could you attach the off.sh and on.sh programs? Thanks, David
Hi, is my understanding of the issue correct? David
Sorry for the late reply. Yes, you're right. We have drained nodes that have been idle/turned off for a period longer than SuspendTime, and we don't want Slurm to run the SuspendProgram or ResumeProgram on them. I could put a check in our on.sh or off.sh, but I'm having some issues with that approach. I will update on that later.
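The sort of guard in question might look roughly like this in off.sh (a sketch under assumptions: Slurm passes the node list as the first argument to the SuspendProgram, the safe_to_suspend helper and the power-off step are placeholders, and sinfo's -h/-n/-o '%t' options print just the node's compact state):

```shell
#!/bin/sh
# Hypothetical guard for off.sh (the SuspendProgram): only power off
# nodes whose compact state (sinfo "%t") is plain "idle", and leave
# drained/down/allocated nodes alone.

safe_to_suspend() {
    # $1 = compact node state as printed by sinfo -o '%t'
    case "$1" in
        idle) return 0 ;;   # plain idle: safe to power off
        *)    return 1 ;;   # drain/drng/down/alloc/...: do not touch
    esac
}

if [ $# -ge 1 ]; then
    # Expand the hostlist expression (e.g. "jnod[0003-0008]").
    for node in $(scontrol show hostnames "$1"); do
        state=$(sinfo -h -n "$node" -o '%t')
        if safe_to_suspend "$state"; then
            echo "powering off $node"
            # site-specific power-off command goes here
        else
            echo "skipping $node (state=$state)" >&2
        fi
    done
fi
```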
Hi Akmal, I worked on this issue; I can reproduce it and see the problem. I believe this is a bug: the controller puts nodes into the power-suspend state even when they are already down, and then, as you observed, powers them up. I am testing a patch, and I also want to consult my colleagues in California about it once they come online. By tomorrow our time we should be able to close this problem. David
Hi, a quick question for you. Were the nodes that are down set down manually, or by Slurm? My assumption was that you had shut them down outside of Slurm. Is this correct? David
Yes, you're correct
Fixed.

commit 4c3491a46491b87b636703e0d2f0e6cd0e6f7372
Author: Morris Jette <jette@schedmd.com>
Date:   Tue Sep 1 15:59:45 2015 -0700

David
Thanks David. A quick question: will this commit also prevent drained nodes from being powered down by the SuspendProgram?
Hi, no, that is designed behaviour. This patch prevents jobs from being dispatched to down nodes. This is a note from the bug fixer:

------
Note this does not prevent powering down a node in DOWN or DRAIN state (what the bug reported). That is operating as designed. A node that has no jobs and can't be allocated is a perfect candidate for being powered down. It does prevent the node from being allocated and powered up, which was the real bug.
------

David
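For the record, specific nodes can also be exempted from power saving entirely in slurm.conf, which may be the simplest way to keep the SuspendProgram away from particular machines. SuspendExcNodes and SuspendExcParts are standard slurm.conf parameters; the node names below are just taken from this cluster as an example:

```
# slurm.conf excerpt: never suspend these nodes
SuspendExcNodes=jnod[0003-0008]
# or exempt an entire partition from power saving
#SuspendExcParts=all
```

Note these exclusions are static; they do not follow a node in and out of the drain state.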