5964 – Advice on the management of short jobs in SLURM


Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Scheduling
Version:	18.08.0
Hardware:	Linux Linux

Severity:	4 - Minor Issue
Assignee:	Alejandro Sanchez
Reported:	2018-11-01 10:25 MDT by David Baker
Modified:	2018-11-20 08:51 MST
CC List:	0 users

See Also:
Site:	OCF
OCF Sites:	Southampton University

Attachments
Our slurm.conf (4.55 KB, text/plain) 2018-11-01 10:25 MDT, David Baker

Description David Baker 2018-11-01 10:25:29 MDT

Created attachment 8154 [details]
Our slurm.conf

Hello,

I am very keen to revisit this issue if possible, please. I have raised the issue before (see https://bugs.schedmd.com/show_bug.cgi?id=5194), however at the time I didn't have the time to give this much attention.

As I noted in #5194, we're migrating from TORQUE/MOAB. With that software we use the XFACTOR to ensure that short jobs don't get starved out by longer jobs. There is no equivalent to XFACTOR in SLURM, so we need to achieve this by another means. Also (above and beyond this) we need to be able to run a diverse workload efficiently.

In #5194 you made a number of points -- the most important being...

"Continuing with the advice for the priority/multifactor plugin, we generally recommend ordering each of the PriorityWeight<something> factors from most to least important, then setting them each an order of magnitude apart. This should help some more jobs get scheduled. The weight values should be high enough to get a good set of significant digits since all the factors are floating point numbers from 0.0 to 1.0. Starting around 1000 or so for those factors you want to make predominant, as stated in the web documentation. 

Without any specific site requirements, perhaps what makes more sense is to set the highest weight to the QOS factor and the next one to the FairShare factor. We also usually recommend to set the PriorityFlags=FAIR_TREE."

....Is this good general advice for general/diverse workloads?

On the SLURM community forum a user gave the following piece of advice:

"PriorityFavorSmall=NO
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME

PriorityFavorSmall and SMALL_RELATIVE_TO_TIME are used by us to favour both short and large jobs.  So if two jobs are equal in size, the shorter of the two is favoured.  Also if two jobs are equal in time, the larger is favoured. We use this as a way to get short jobs in and out of the queues quickly as well as help large jobs (typically MPI) have priority over small serial jobs."

..... What are your comments re this advice, please? It seems to make a degree of sense.

I'm keen to explore the management of jobs in the cluster, especially with respect to the treatment of small jobs (as I note above), please. I suspect a reasonable starting point is to attach my current slurm.conf so that you can make suggested amendments, please.

Best regards,
David

Comment 2 Alejandro Sanchez 2018-11-05 05:26:40 MST

Hi David. Since other sites are also requesting this, let's centralize the tracking in bug 5202, where you're already CC'd. Thanks.

Comment 3 Alejandro Sanchez 2018-11-05 05:27:43 MST

Tagging as duplicate.

*** This ticket has been marked as a duplicate of ticket 5202 ***

Comment 4 David Baker 2018-11-05 05:36:50 MST

Hello,


I agree that this streamlining is sensible, however are you happy to address my immediate concerns re my questions set out in bug 5964, please?  The XFACTOR would be a great addition in the future, however in the meantime it would be good to have some general advice to work with, please -- re my questions raised in bug 5964. Is that OK?


Best regards,

David



Comment 5 Alejandro Sanchez 2018-11-05 06:23:12 MST

(In reply to David Baker from comment #4)
> Hello,
> 
> 
> I agree that this streamlining is sensible, however are you happy to address
> my immediate concerns re my questions set out in bug 5964, please?  The
> XFACTOR would be a great addition in the future, however in the meantime it
> would be good to have some general advice to work with, please -- re my
> questions raised in bug 5964. Is that OK?

Sure, no problem.

(In reply to David Baker from comment #0)
> Created attachment 8154 [details]
> Our slurm.conf
> 
> Hello,
> 
> I am very keen to revisit this issue if possible, please. I have raised the
> issue before (see https://bugs.schedmd.com/show_bug.cgi?id=5194), however at
> the time I didn't have the time to give this much attention.
> 
> As I noted in #5194, we're migrating from TORQUE/MOAB. With that software we
> use the XFACTOR to ensure that short jobs don't get starved out by longer
> jobs. There is no equivalent to XFACTOR in SLURM, so we need to achieve
> this by another means. Also (above and beyond this) we need to be able
> to run a diverse workload efficiently.
> 
> In #5194 you made a number of points -- the most important being...
> 
> "Continuing with the advice for the priority/multifactor plugin, we
> generally recommend ordering each of the PriorityWeight<something> factors
> from most to least important, then setting them each an order of magnitude
> apart. This should help some more jobs get scheduled. The weight values
> should be high enough to get a good set of significant digits since all the
> factors are floating point numbers from 0.0 to 1.0. Starting around 1000 or
> so for those factors you want to make predominant, as stated in the web
> documentation. 
> 
> Without any specific site requirements, perhaps what makes more sense is to
> set the highest weight to the QOS factor and the next one to the FairShare
> factor. We also usually recommend to set the PriorityFlags=FAIR_TREE."
> 
> ....Is this good general advice for general/diverse workloads?

As a starting point without any specific site requirements, yes, it is good advice for general/diverse workloads.
 
> On the SLURM community forum a user gave the following piece of advice:
> 
> "PriorityFavorSmall=NO
> PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME
> 
> PriorityFavorSmall and SMALL_RELATIVE_TO_TIME are used by us to favour both
> short and large jobs.  So if two jobs are equal in size, the shorter of the
> two is favoured.  Also if two jobs are equal in time, the larger is
> favoured. We use this as a way to get short jobs in and out of the queues
> quickly as well as help large jobs (typically MPI) have priority over small
> serial jobs."
> 
> ..... What are your comments re this advice, please? It seems to make a
> degree of sense.

Looking at the code here:

https://github.com/SchedMD/slurm/blob/slurm-18-08-3-1/src/plugins/priority/multifactor/priority_multifactor.c#L2060

If you only had PriorityWeightJobSize, then the higher the number of requested CPUs, the higher the JobSize factor.

If you also have SMALL_RELATIVE_TO_TIME, then of two jobs with the same number of requested CPUs, the one with the shorter TimeLimit will have the higher JobSize factor, since the factor is divided by the TimeLimit.

If you also have PriorityFavorSmall=Yes then the previously calculated factor is reversed:

https://github.com/SchedMD/slurm/blob/slurm-18-08-3-1/src/plugins/priority/multifactor/priority_multifactor.c#L2079

if (favor_small) {
    job_ptr->prio_factors->priority_js = 
          (double) 1.0 - job_ptr->prio_factors->priority_js;
}

Does it make sense?
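
The three cases above can be sketched in a few lines of Python. This is a simplified model of the behaviour described in this comment, not a port of priority_multifactor.c: the function name, its parameters and the cluster size are hypothetical, and details such as node counts are left out.

```python
def job_size_factor(cpus, time_limit_min, cluster_cpus,
                    small_relative_to_time=False, favor_small=False):
    """Simplified model of the JobSize factor (an illustration, not Slurm's code)."""
    # Base case: the more CPUs requested, the higher the factor.
    factor = cpus / cluster_cpus
    if small_relative_to_time:
        # SMALL_RELATIVE_TO_TIME: divide by the time limit, so the shorter
        # of two equally sized jobs gets the higher factor.
        factor /= time_limit_min
    if favor_small:
        # PriorityFavorSmall=YES: the previously calculated factor is reversed.
        factor = 1.0 - factor
    # Keep the result within the 0.0 to 1.0 range used by all factors.
    return min(1.0, max(0.0, factor))


CLUSTER_CPUS = 1024  # hypothetical cluster size

# Two jobs with the same CPU count but different time limits (in minutes).
short_job = job_size_factor(64, 60, CLUSTER_CPUS, small_relative_to_time=True)
long_job = job_size_factor(64, 600, CLUSTER_CPUS, small_relative_to_time=True)
print(short_job > long_job)
```

In this model the 60-minute job ends up with a factor ten times that of the 600-minute job, which is the effect the forum advice relies on.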
 
> I'm keen to explore the management jobs in the cluster especially with
> respect to the treatment of small jobs (as I note above), please. I suspect
> a reasonable starting point is to attach my current slurm.conf so that you
> can make suggested amendments, please.

Your current slurm.conf

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=MONTHLY
PriorityWeightFairshare=100000
PriorityWeightAge=1000
PriorityWeightPartition=10000
PriorityWeightJobSize=1000
#PriorityWeightQOS=2000
PriorityMaxAge=3-0

gives more importance to the FairShare factor than to JobSize, so the JobSize factor's contribution won't be as noticeable as that of FairShare or Partition, for instance.
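
To make the relative contributions concrete, here is a rough Python sketch. It assumes the job priority is simply the sum of each weight multiplied by its factor (each between 0.0 and 1.0) and ignores QOS, TRES and nice values. The weights come from the slurm.conf above; the factor values for the two jobs are made up for illustration.

```python
# Weights taken from the slurm.conf shown above.
WEIGHTS = {
    "fairshare": 100000,
    "partition": 10000,
    "age": 1000,
    "jobsize": 1000,
}


def priority(factors):
    """Weighted sum of factors; each factor is assumed to lie in [0.0, 1.0]."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


# Hypothetical jobs: job_a has the best possible JobSize factor, while
# job_b has the worst one but a slightly better FairShare factor.
job_a = {"fairshare": 0.50, "partition": 1.0, "age": 0.25, "jobsize": 1.0}
job_b = {"fairshare": 0.52, "partition": 1.0, "age": 0.25, "jobsize": 0.0}

print(priority(job_a))
print(priority(job_b))
```

With these weights a FairShare difference of only 0.02 is worth about 2000 points, which outweighs the maximum of 1000 points that JobSize can ever contribute, so job_b still ranks above job_a.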

Also, I don't see that you make use of SMALL_RELATIVE_TO_TIME. So if you want to favor jobs with a shorter TimeLimit, I would increase the JobSize weight (PriorityWeightJobSize) and add SMALL_RELATIVE_TO_TIME to PriorityFlags.
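
As a sketch only, such a change to the priority section of slurm.conf might look like the fragment below. The JobSize weight of 50000 is an assumed, illustrative value and not a recommendation; the other weights are unchanged from the current configuration.

```
# Illustrative values only; tune the weights for your site.
PriorityType=priority/multifactor
PriorityFlags=SMALL_RELATIVE_TO_TIME
PriorityWeightFairshare=100000
PriorityWeightJobSize=50000
PriorityWeightPartition=10000
PriorityWeightAge=1000
```

After a change like this, the per-job factor contributions can be compared with sprio -l to check whether JobSize now has a visible effect.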

Please, let me know if you have further questions. Thanks.

Comment 6 Alejandro Sanchez 2018-11-19 07:40:19 MST

Hi David. Is there anything else you need from us here? Thank you.

Comment 7 David Baker 2018-11-19 07:54:47 MST

Hello,

Please feel free to close this call. Apologies for not getting back to you earlier.

Best regards,
David
