Ticket 3808

Summary:	add bf_max_time separate from bf_interval
Product:	Slurm	Reporter:	Doug Jacobsen <dmjacobsen>
Component:	Contributions	Assignee:	Danny Auble <da>
Status:	RESOLVED FIXED	QA Contact:
Severity:	4 - Minor Issue
Priority:	---	CC:	sts
Version:	17.02.2
Hardware:	Cray XC
OS:	Linux
Site:	NERSC	Slinky Site:	---
Linux Distro:	---	Machine Name:
CLE Version:		Version Fixed:	17.11.0-0pre1
Target Release:	---	DevPrio:	---
Attachments:	bf_max_time patch

Description Doug Jacobsen 2017-05-15 10:18:17 MDT

Created attachment 4555
bf_max_time patch

Hello,

This seems like a related request to bug 3234, however this one comes with code (and modifications to the man page)!

I wanted to separate bf_interval from the maximum scheduling time.  This allows me to set a large value for the maximum scheduling time in case a large queue has formed (with cori-size systems it just takes a while to do planning), while leaving bf_interval low to ensure that we don't have a huge gap between backfill scheduling cycles.

For example:

SchedulerParameters=bf_interval=20,bf_max_time=600,...
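To make the trade-off concrete, here is a rough toy model in Python, not Slurm's actual backfill loop. It assumes each backfill cycle runs until planning finishes or the time cap is hit, then waits bf_interval before the next cycle starts, and that the cap falls back to bf_interval when bf_max_time is unset. The function name cycle_period and the planning times are hypothetical, chosen only for illustration.

```python
def cycle_period(planning_seconds, bf_interval, bf_max_time=None):
    """Toy model: seconds from the start of one backfill cycle to the next.

    Assumptions (not taken from Slurm source): a cycle runs for
    min(planning time, time cap) and is followed by a bf_interval wait.
    """
    if bf_max_time is None:
        # Assumed fallback: without bf_max_time the cap is tied to bf_interval.
        bf_max_time = bf_interval
    run_time = min(planning_seconds, bf_max_time)
    return run_time + bf_interval


if __name__ == "__main__":
    # Hypothetical planning times: a short queue and a long queue.
    for planning in (10, 220):
        coupled = cycle_period(planning, bf_interval=220)
        decoupled = cycle_period(planning, bf_interval=20, bf_max_time=600)
        print(f"planning={planning}s coupled={coupled}s decoupled={decoupled}s")
```

In this model, raising bf_interval to 220 just to fit a long planning pass also stretches the wait after short passes, whereas a separate bf_max_time leaves the wait at 20 seconds.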

I didn't use bf_max_sched_time, because then max_sched_time also matches it.
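The naming concern can be illustrated with a small Python sketch. It assumes, as the remark above suggests, that options are located by substring search in the SchedulerParameters string; this is not the actual Slurm parsing code. The helper names are hypothetical, and bf_max_sched_time is the rejected name, used here only to show the collision.

```python
def naive_has_option(params: str, key: str) -> bool:
    """Substring search: the kind of lookup that would cause the collision."""
    return key + "=" in params


def token_has_option(params: str, key: str) -> bool:
    """Exact match on comma-separated option names, shown for contrast."""
    return any(tok.split("=", 1)[0].strip() == key for tok in params.split(","))


if __name__ == "__main__":
    rejected = "bf_interval=20,bf_max_sched_time=600"  # hypothetical, rejected name
    chosen = "bf_interval=20,bf_max_time=600"          # name used in the patch

    # The rejected name contains "max_sched_time=", so a substring search hits it.
    print(naive_has_option(rejected, "max_sched_time"))
    # An exact token comparison would not be fooled.
    print(token_has_option(rejected, "max_sched_time"))
    # The chosen name does not contain "max_sched_time=" at all.
    print(naive_has_option(chosen, "max_sched_time"))
```

Under that assumption, picking bf_max_time avoids the overlap without requiring any change to how options are looked up.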

Since this is done as a modification to SchedulerParameters, it would appear there is no change required in protocol.

I plan to start using this on cori, since we are running into responsiveness issues caused by the ever-lengthening bf_interval (up to 220s now).

I look forward to your feedback,
Doug

Comment 1 Tim Wickberg 2017-05-15 19:26:51 MDT

Doug - Just a gentle reminder, please send these as Sev 4 so we can do some up-front triage, and make sure things don't get misplaced. (I've adjusted our custom business logic plugin to re-mark these on future submissions automatically.)

Comment 2 Doug Jacobsen 2017-05-17 07:40:05 MDT

FYI, this is in use on cori and edison:

bf_interval=30
bf_max_time=600

Comment 4 Danny Auble 2017-06-01 14:19:56 MDT

Thanks Doug, this has been added in commit c8c9694f8ca8a and will be in 17.11.

Comment 5 Doug Jacobsen 2017-06-01 14:33:52 MDT

FYI, I suggest also setting max_rpc_cnt=150 or so if using this capability.

Comment 6 Danny Auble 2017-06-01 14:59:37 MDT

Doug, why do you see this as warranted?  I just want to give a good description in the man page instead of random advice :).

Comment 7 Danny Auble 2017-06-13 13:42:48 MDT

Doug, ping?

Comment 8 Doug Jacobsen 2017-06-16 14:24:22 MDT

Sorry, I've been traveling and have been unable to properly manage my bugs.

After deploying this patch and the continue-scheduling-with-completing-nodes modification I found that sometimes slurmctld would spend large amounts of time _only_ scheduling, sometimes running the primary scheduler repeatedly in the gaps between backfill lock releases.  This then caused RPCs to get starved.  Basically between this patch and the other we spent _far_ more time scheduling than we used to in the past, which is great for utilization, for starting user jobs, and ensuring our whole workload is reviewed frequently.

I set max_rpc_cnt to 150 as it generally balanced cori's RPC load with making useful progress on scheduling.  It needs to be high enough that scheduling isn't always disabled, and low enough that our interactive workload can get through in a reasonable period of time.  It certainly needs to be below 256 (the default RPC thread limit).
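As a rough illustration of that constraint, the following Python sketch parses a SchedulerParameters value and flags a max_rpc_cnt that is missing or not below the thread limit. It is a hypothetical site-side sanity check, not part of Slurm. The function names are made up for this example, and the limit of 256 is taken from the comment above.

```python
def parse_scheduler_parameters(value: str) -> dict:
    """Split a comma-separated SchedulerParameters value into a dict.

    Options without '=' are stored with the value None.
    """
    params = {}
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        name, sep, val = token.partition("=")
        params[name] = val if sep else None
    return params


def check_max_rpc_cnt(params: dict, thread_limit: int = 256) -> list:
    """Return warnings for a bf_max_time setup (hypothetical helper)."""
    warnings = []
    if "bf_max_time" not in params:
        return warnings  # the advice only concerns setups using bf_max_time
    raw = params.get("max_rpc_cnt")
    if raw is None:
        warnings.append("bf_max_time is set but max_rpc_cnt is not")
    elif int(raw) >= thread_limit:
        warnings.append(f"max_rpc_cnt={raw} is not below the thread limit {thread_limit}")
    return warnings


if __name__ == "__main__":
    conf = "bf_interval=30,bf_max_time=600,max_rpc_cnt=150"
    print(check_max_rpc_cnt(parse_scheduler_parameters(conf)))
```

The right value below that limit is site-specific; 150 is simply what balanced cori's load.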

Comment 9 Danny Auble 2017-06-19 09:11:42 MDT

Thanks Doug, I put a snip in commit 24c04bce06c6e.