Ticket 5947 - Multifactor scheduling needs a PriorityWeightXFACTOR favoring short jobs
Summary: Multifactor scheduling needs a PriorityWeightXFACTOR favoring short jobs
Status: RESOLVED DUPLICATE of ticket 5202
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 17.11.12
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Jacob Jenson
Reported: 2018-10-30 07:09 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2018-11-05 05:25 MST
Site: DTU Physics

Attachments
slurm.conf file (4.92 KB, text/plain)
2018-10-30 07:09 MDT, Ole.H.Nielsen@fysik.dtu.dk

Description Ole.H.Nielsen@fysik.dtu.dk 2018-10-30 07:09:08 MDT
We're getting complaints from our user groups that queue wait times for short jobs are far too long, which makes it difficult to run test and debug jobs on a very busy cluster.

Backfilling is enabled, but its effectiveness is limited because our cluster is constantly oversubscribed by a factor of 5-10.  PriorityWeightAge in slurm.conf favors older jobs regardless of whether they are short or long, so it does not solve the problem.

In our past experience with the MAUI scheduler, the Expansion Factor (XFACTOR) was a great tool for prioritizing short jobs:

XFACTOR = 1 + <EFFQUEUETIME> / <WALLCLOCKLIMIT>
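
As a quick worked example of the formula above, here is a minimal Python sketch. The function name `xfactor` and the sample queue times and limits are illustrative assumptions, not anything from Maui or Slurm. It shows that two jobs with the same queue time get very different expansion factors depending on their wall-clock limit:

```python
def xfactor(queue_seconds: float, walltime_limit_seconds: float) -> float:
    """Expansion factor as written above: 1 + queue time / wall-clock limit."""
    if walltime_limit_seconds <= 0:
        raise ValueError("wall-clock limit must be positive")
    return 1.0 + queue_seconds / walltime_limit_seconds


HOUR = 3600

# Both jobs have waited 6 hours; only the requested wall-clock limit differs.
short_job = xfactor(6 * HOUR, 1 * HOUR)    # 1-hour limit  -> 7.0
long_job = xfactor(6 * HOUR, 48 * HOUR)    # 48-hour limit -> 1.125

print(short_job, long_job)
```

The short job's factor grows by 1 for every hour queued, while the 48-hour job's grows by 1 only every 48 hours, which is why the factor favors short jobs.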

XFACTOR is documented for MAUI at http://docs.adaptivecomputing.com/maui/5.1.2priorityfactors.php, and for MOAB at http://www.adaptivecomputing.com/blog-hpc/using-moab-job-priorities-exploring-priority-sub-components/

It seems to me that a future PriorityWeightXFACTOR parameter in slurm.conf would be very similar to the existing PriorityWeightAge parameter, needing only the division by WALLCLOCKLIMIT and a cap, PriorityMaxXFACTOR (similar to MAUI's XFACTORCAP).
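
One way the proposed parameters might combine is sketched below in Python. This is only an illustration under stated assumptions: PriorityWeightXFACTOR and PriorityMaxXFACTOR are the names proposed in this ticket and do not exist in Slurm, the function `xfactor_priority` is hypothetical, and the normalization to the range 0..1 (mirroring how the age factor is scaled before being multiplied by PriorityWeightAge) is an assumed design choice rather than anything specified by Maui or Slurm.

```python
def xfactor_priority(queue_seconds: float,
                     walltime_limit_seconds: float,
                     weight: int,
                     xfactor_cap: float) -> int:
    """Hypothetical priority contribution from a capped expansion factor.

    weight      -- stands in for the proposed PriorityWeightXFACTOR
    xfactor_cap -- stands in for the proposed PriorityMaxXFACTOR (must be > 1)
    """
    if walltime_limit_seconds <= 0:
        raise ValueError("wall-clock limit must be positive")
    if xfactor_cap <= 1:
        raise ValueError("cap must be greater than 1")

    xf = 1.0 + queue_seconds / walltime_limit_seconds
    capped = min(xf, xfactor_cap)
    # Assumed normalization: map [1, cap] onto [0, 1] so the weight acts
    # like the other multifactor weights.
    factor = (capped - 1.0) / (xfactor_cap - 1.0)
    return round(weight * factor)


HOUR = 3600

# Same 5-hour wait, weight 10000, cap 11 (all values are made up).
short_prio = xfactor_priority(5 * HOUR, 1 * HOUR, 10000, 11.0)
long_prio = xfactor_priority(5 * HOUR, 50 * HOUR, 10000, 11.0)
# A short job that has waited far longer than its limit saturates at the cap.
capped_prio = xfactor_priority(20 * HOUR, 1 * HOUR, 10000, 11.0)

print(short_prio, long_prio, capped_prio)
```

With this normalization a job that has just been submitted contributes 0, and the cap keeps a very short job from accumulating unbounded priority.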

I notice the recommendations in bug 5194 and the feature reminder in bug 5202, but there doesn't seem to be any solution coming any time soon.

Question to Slurm developers: Would you kindly consider the functionality described here for inclusion in Slurm 19.05?  IMHO, many Slurm sites might find this very useful.

Thanks,
Ole
Comment 1 Ole.H.Nielsen@fysik.dtu.dk 2018-10-30 07:09:37 MDT
Created attachment 8131 [details]
slurm.conf file
Comment 6 Alejandro Sanchez 2018-11-05 05:25:20 MST
Hi Ole. Please, let's centralize the discussion in bug 5202. I'm gonna mark this as a duplicate of that one.

*** This ticket has been marked as a duplicate of ticket 5202 ***