Ticket 5947

Summary: Multifactor scheduling needs a PriorityWeightXFACTOR favoring short jobs
Product: Slurm
Reporter: Ole.H.Nielsen <Ole.H.Nielsen>
Component: Scheduling
Assignee: Jacob Jenson <jacob>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: alex, brian, jacob
Version: 17.11.12
Hardware: Linux
OS: Linux
Site: DTU Physics
Attachments: slurm.conf file

Description Ole.H.Nielsen@fysik.dtu.dk 2018-10-30 07:09:08 MDT
We're getting complaints from our user groups that the queue waiting time for short jobs is way too long, making it difficult to work with testing and debug jobs on a very busy cluster.

Backfilling is enabled, but is of limited effectiveness because our cluster is constantly oversubscribed by a factor of 5-10. The PriorityWeightAge in slurm.conf favors older jobs regardless of whether they are short or long, so that's not a solution.

Based upon our past experience with the MAUI scheduler, the Expansion Factor (XFACTOR) gives us a great tool for prioritizing short jobs:

XFACTOR = 1 + <EFFQUEUETIME> / <WALLCLOCKLIMIT>

XFACTOR documentation for MAUI is at http://docs.adaptivecomputing.com/maui/5.1.2priorityfactors.php, also in MOAB at http://www.adaptivecomputing.com/blog-hpc/using-moab-job-priorities-exploring-priority-sub-components/

It seems to me that a future PriorityWeightXFACTOR flag in slurm.conf would be very similar to the existing PriorityWeightAge flag, needing only the division by WALLCLOCKLIMIT and a cap PriorityMaxXFACTOR (similar to MAUI's XFACTORCAP).
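
For illustration, the XFACTOR formula above with a cap can be sketched in a few lines of Python. This is only a sketch of the arithmetic, not Slurm or MAUI code: the function name xfactor and its parameter names are hypothetical, and the cap argument stands in for the proposed PriorityMaxXFACTOR.

```python
def xfactor(queue_seconds, walltime_limit_seconds, cap=None):
    """Expansion factor: 1 + effective queue time / wall-clock limit.

    Hypothetical helper for illustration only; `cap` plays the role of
    the proposed PriorityMaxXFACTOR (MAUI's XFACTORCAP).
    """
    if walltime_limit_seconds <= 0:
        raise ValueError("wall-clock limit must be positive")
    value = 1.0 + queue_seconds / walltime_limit_seconds
    # Apply the optional upper bound so very short jobs cannot grow without limit.
    return min(value, cap) if cap is not None else value


# Two jobs that have both waited 2 hours in the queue:
short_job = xfactor(2 * 3600, 1 * 3600)    # 1-hour wall-clock limit
long_job = xfactor(2 * 3600, 48 * 3600)    # 48-hour wall-clock limit
print(short_job, long_job)
```

With equal queue time, the job with the 1-hour limit gets a larger factor than the job with the 48-hour limit, which is the behavior PriorityWeightAge alone does not provide.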

I notice the recommendations in bug 5194 and the feature reminder in bug 5202, but there doesn't seem to be any solution coming any time soon.

Question to Slurm developers: Would you kindly consider the functionality described here for inclusion in Slurm 19.05? IMHO, many Slurm sites might find this very useful.

Thanks,
Ole
Comment 1 Ole.H.Nielsen@fysik.dtu.dk 2018-10-30 07:09:37 MDT
Created attachment 8131
slurm.conf file
Comment 6 Alejandro Sanchez 2018-11-05 05:25:20 MST
Hi Ole. Please, let's centralize the discussion in bug 5202. I'm gonna mark this as a duplicate of that one.

*** This ticket has been marked as a duplicate of ticket 5202 ***