| Summary: | Bug 5194 - Advice on the management of short jobs in SLURM | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | David Baker <d.j.baker> |
| Component: | Configuration | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | alex |
| Version: | 17.02.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | OCF | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | Southampton University |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
David Baker
2018-05-21 08:47:09 MDT
Comment 1
Alejandro Sanchez
2018-05-22

Hi David.

As you mentioned, having sched/backfill properly configured is key. We usually recommend forcing users to set a --time through a Job Submit plugin. This is a C example, but you could use a job_submit.lua equivalent as well:
https://github.com/SchedMD/slurm/blob/slurm-17.11/src/plugins/job_submit/require_timelimit/job_submit_require_timelimit.c

We prefer each user to set their own, individually estimated --time rather than relying on a DefaultTime. A DefaultTime tends to end up in a bad situation where all jobs have the same TimeLimit, and then backfill doesn't work efficiently.

Our usual starting points for tuning SchedulerParameters are:

bf_continue
bf_window=(enough minutes to cover the highest MaxTime on the cluster)
bf_resolution=(usually at least 600 seconds)

If you increase bf_window, make sure to also increase bf_resolution; otherwise the scheduling overhead will increase. bf_min_age_reserve and bf_min_prio_reserve could be considered as well.

In Slurm, all jobs are placed on a single queue, ordered by:

1. Preemption order (a preemptor has higher priority than a preemptee)
2. Advanced reservation (jobs with an advanced reservation have higher priority than other jobs)
3. Partition PriorityTier
4. Job priority (the sum of factors computed by priority/multifactor)
5. Job ID

Point 4 (job priority) can be broken down into its component factors as documented here:
https://slurm.schedmd.com/priority_multifactor.html

Slurm has no exact equivalent to Moab's XFACTOR (expansion factor), which, according to Moab's documentation, is computed as:

XFACTOR = 1 + (EffQueueTime / WallClockLimit)

Perhaps the closest option to XFACTOR is the Age factor:
https://slurm.schedmd.com/priority_multifactor.html#age

In general, the longer a job waits in the queue, the larger its age factor grows. There are also two settings affecting this:

ACCRUE_ALWAYS (a PriorityFlags value)
If set, the priority age factor will be increased despite job dependencies or holds.
If set, it also starts computing the age from the submit time, instead of from the time the job became eligible to run (begin_time).

PriorityMaxAge (a slurm.conf parameter)
Specifies the job age which will be given the maximum age factor in computing priority.

However, the Age factor in Slurm currently does not take the job's TimeLimit into account, whereas Moab's XFACTOR does. I've opened a separate sev-5 bug 5202 to consider adding a flag for this in a future release, but lacking a sponsor we can't estimate when, or whether, it will be addressed. If you are interested in pursuing that path, we could discuss it further outside this bug.

Continuing with the advice for the priority/multifactor plugin: we generally recommend ordering the PriorityWeight<Factor> parameters from most to least important, then setting them an order of magnitude apart. This should help more jobs get scheduled. The weight values should be high enough to give a good number of significant digits, since all the factors are floating-point numbers from 0.0 to 1.0; start at around 1000 or so for the factors you want to be predominant, as stated in the web documentation. Without any specific site requirements, it probably makes most sense to give the highest weight to the QOS factor and the next highest to the FairShare factor. We also usually recommend setting PriorityFlags=FAIR_TREE.

With regard to the PriorityFavorSmall option and the PriorityFlags value SMALL_RELATIVE_TO_TIME:

1. Note that they only take effect if the job size factor has a weight, i.e. PriorityWeightJobSize is set to a non-zero value.
2. Here's the documentation for these options, which I think explains them well:
https://slurm.schedmd.com/priority_multifactor.html#jobsize

Please let me know if you have further questions and/or if you are interested in sponsoring that flag addition. Thanks!

Comment 2
David Baker

Hello,

Thank you for this detailed reply. I've taken an additional look through, but I will not be able to get my teeth into this issue until I get back from leave in a week's time.
I will continue the investigation/discussion then. Thank you for your interest and advice regarding an equivalent to XFACTOR in SLURM.

Best regards,
David
Comment 3
Alejandro Sanchez
2018-06-06

Hi David. Is there anything else you need from here? Thanks.

Comment 4
David Baker

Hello,

Apologies for the late response. I've just got back from leave, so I'll need to catch up with this ticket. I'll take a look today and see how I get on.

Best regards,
David

Comment 5
Alejandro Sanchez
2018-06-20

Hi David. Is there anything you need from this bug? Thanks.

Comment 6
David Baker

Hi,

My apologies for not getting back to you earlier. I'm afraid I've not had much time to look at this area properly, and I would like to revisit this matter once I'm less busy. Could you please put the ticket on hold or close it, depending upon your policy? At this rate I'll probably have time to look at this next week at the earliest.

Best regards,
David
Comment 7
Alejandro Sanchez

David, I'm gonna close this for now. Please reopen if you have any further questions. Thanks.
|
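The reply above recommends rejecting jobs that are submitted without --time, using a job submit plugin such as the linked require_timelimit C plugin or a job_submit.lua equivalent. The following Python sketch models only that accept/reject decision; it is not the Slurm job submit plugin API. `JobDesc`, `MissingTimeLimit` and `require_time_limit` are hypothetical names, and a missing time limit is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobDesc:
    """Hypothetical stand-in for a submitted job's description."""
    user: str
    time_limit_minutes: Optional[int]  # None models "no --time given"


class MissingTimeLimit(Exception):
    """Hypothetical error raised when a submission has no usable time limit."""


def require_time_limit(job: JobDesc) -> JobDesc:
    """Accept a job only if the user supplied their own time estimate.

    No DefaultTime is filled in: a site-wide default would give many jobs
    the same TimeLimit, which makes backfill less effective.
    """
    if job.time_limit_minutes is None:
        raise MissingTimeLimit(f"job from {job.user} rejected: please set --time")
    # sbatch treats --time=0 as "no time limit", so this sketch rejects it too.
    if job.time_limit_minutes <= 0:
        raise MissingTimeLimit(f"job from {job.user} rejected: --time must be > 0")
    return job


accepted = require_time_limit(JobDesc("alice", 30))
print(accepted.time_limit_minutes)  # prints 30
```

In a real deployment the same check would live in the job submit plugin, so the rejection message reaches the user at submission time.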
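As a rough illustration of the SchedulerParameters starting point described in the reply above, this Python sketch derives bf_window (in minutes) from the largest partition MaxTime and keeps bf_resolution (in seconds) at 600 or more. The function name and the partition names and limits are hypothetical; the printed line is only a starting point for slurm.conf, to be tuned per site.

```python
def backfill_scheduler_parameters(partition_max_time_minutes: dict,
                                  bf_resolution_seconds: int = 600) -> str:
    """Build a starting SchedulerParameters value for sched/backfill.

    bf_window is in minutes and is set to the largest partition MaxTime,
    so backfill can plan ahead for the longest job the cluster allows.
    bf_resolution is in seconds and is kept at 600 or above; a wider
    window with a finer resolution costs more scheduler time.
    """
    if not partition_max_time_minutes:
        raise ValueError("need at least one partition MaxTime")
    window = max(partition_max_time_minutes.values())
    resolution = max(bf_resolution_seconds, 600)
    return f"bf_continue,bf_window={window},bf_resolution={resolution}"


# Hypothetical partitions: "short" (1 hour) and "batch" (2 days).
print("SchedulerParameters=" + backfill_scheduler_parameters(
    {"short": 60, "batch": 2880}))
# prints SchedulerParameters=bf_continue,bf_window=2880,bf_resolution=600
```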
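The five-level queue ordering listed in the reply above can be sketched as a sort key. This is a simplified Python model under stated assumptions: preemption order is reduced to a single hypothetical `is_preemptor` flag, and remaining ties are assumed to go to the lower (earlier) job ID. It is not Slurm's actual scheduler code.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    job_id: int
    is_preemptor: bool      # simplification of "preemption order"
    has_reservation: bool   # job runs in an advanced reservation
    partition_tier: int     # PriorityTier of the job's partition
    priority: int           # result of priority/multifactor


def queue_order_key(job: QueuedJob) -> tuple:
    # sorted() is ascending, so "higher wins" criteria are inverted:
    # False sorts before True, and negated numbers put larger values first.
    return (
        not job.is_preemptor,
        not job.has_reservation,
        -job.partition_tier,
        -job.priority,
        job.job_id,  # assumed tie-break: lower (earlier) job ID first
    )


queue = [
    QueuedJob(101, False, False, 1, 5000),
    QueuedJob(102, False, True, 1, 10),
    QueuedJob(103, False, False, 2, 1),
    QueuedJob(104, True, False, 1, 1),
    QueuedJob(100, False, False, 1, 5000),
]
print([j.job_id for j in sorted(queue, key=queue_order_key)])
# prints [104, 102, 103, 100, 101]
```

Note how job priority only decides the order among jobs that tie on the first three criteria, which is why partition PriorityTier can override even a large multifactor priority.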
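To make the XFACTOR comparison above concrete, this Python sketch contrasts Moab's formula with a model of Slurm's age factor, including the effect of ACCRUE_ALWAYS on when age starts accruing. It assumes the age factor rises linearly and is capped at 1.0 once the job's age reaches PriorityMaxAge; all function names are hypothetical.

```python
from typing import Optional


def moab_xfactor(queue_minutes: float, wallclock_limit_minutes: float) -> float:
    """Moab's expansion factor: 1 + EffQueueTime / WallClockLimit."""
    return 1.0 + queue_minutes / wallclock_limit_minutes


def job_age_minutes(now: float, submit_time: float,
                    eligible_time: Optional[float], accrue_always: bool) -> float:
    """Queue age used for the age factor (all times in minutes).

    Without ACCRUE_ALWAYS, age counts from when the job became eligible
    (None models a job still held or waiting on a dependency). With it,
    age counts from submission regardless of holds or dependencies.
    """
    start = submit_time if accrue_always else eligible_time
    if start is None:
        return 0.0
    return max(now - start, 0.0)


def slurm_age_factor(age_minutes: float, priority_max_age_minutes: float) -> float:
    """Age factor modelled as a linear ramp from 0.0 that reaches 1.0
    at PriorityMaxAge and stays there."""
    return min(age_minutes / priority_max_age_minutes, 1.0)


# Two jobs that have both waited 60 minutes: a 30-minute job and a
# 24-hour job. PriorityMaxAge is taken as 7 days (10080 minutes).
for limit in (30, 1440):
    print(limit, moab_xfactor(60, limit), slurm_age_factor(60, 10080))
```

For the two jobs above, XFACTOR is 3.0 for the 30-minute job and about 1.04 for the 24-hour job, while the modelled age factor is identical for both. That difference is the gap described for bug 5202.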
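The weighting advice in the reply above can be illustrated with a simplified version of the priority/multifactor sum. In this Python sketch the keys stand for PriorityWeightQOS, PriorityWeightFairshare, PriorityWeightAge and PriorityWeightJobSize; the specific values, the choice of factors after FairShare, and the omission of other priority terms are assumptions for illustration only.

```python
# Hypothetical PriorityWeight<Factor> values: QOS highest, FairShare next,
# then Age and JobSize, each an order of magnitude apart. The ordering
# after FairShare is an arbitrary example, not a recommendation.
WEIGHTS = {
    "qos": 1_000_000,
    "fairshare": 100_000,
    "age": 10_000,
    "jobsize": 1_000,
}


def job_priority(factors: dict, weights: dict = WEIGHTS) -> int:
    """Simplified priority/multifactor sum.

    Each factor is a float in [0.0, 1.0] multiplied by its weight. Real
    Slurm also includes other terms (for example partition, TRES and nice
    adjustments) that are left out of this sketch.
    """
    for name, value in factors.items():
        if name not in weights:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name} out of range: {value}")
    return int(sum(weights[name] * factors.get(name, 0.0) for name in weights))


# A job with a moderate QOS factor versus a job that maxes out every
# other factor but has a QOS factor of zero.
print(job_priority({"qos": 0.5}))                                   # prints 500000
print(job_priority({"fairshare": 1.0, "age": 1.0, "jobsize": 1.0}))  # prints 111000
```

With the weights an order of magnitude apart, a lower-weighted factor can only change the outcome when two jobs' values for a higher-weighted factor are close: here the three smaller weights add up to at most 111000, the equivalent of a 0.111 difference in the QOS factor.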