| Summary: | backfill not strictly obeying priority? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Stuart Midgley <stuartm> |
| Component: | Scheduling | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | da |
| Version: | 14.03.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | DownUnder GeoSolutions | ||
| Version Fixed: | 14.03.5 | Target Release: | --- |
| Attachments: | likely fix for backfill scheduling bug | ||
Description
Stuart Midgley
2014-07-04 01:46:14 MDT
I spent about a week working with Brigham Young University (BYU) on a backfill scheduling problem that may be the same as this one. See bug 911: it has patches that add more debugging logic and fix some minor problems, but attachment 1011 appears to address the root problem. That patch has not yet been committed to our code base, so I am attaching it here too.

As you say, "jobs on the idle queue should not be going onto clus[001-040] unless there are no jobs in teamfraser...", except when jobs are blocked by hitting some limit (e.g. maximum running jobs for some user), waiting for a dependency, requesting specific nodes that are allocated to some other long-running job, etc.

Created attachment 1027: likely fix for backfill scheduling bug

BYU has been running with this patch for about a week and their backfill scheduling problems have ceased.
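
To illustrate the exception described above, here is a minimal toy model in Python. It is not Slurm's backfill code and does not reflect the attached patch. It only shows why a blocked high-priority job can legitimately let lower-priority jobs start. The `Job` class, the `pick_startable` function, and the reason string are hypothetical names chosen for this sketch. Time limits and reservations are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # All fields are illustrative; they do not mirror Slurm's internal structures.
    name: str
    partition: str
    priority: int
    nodes: int
    blocked_reason: Optional[str] = None  # None means the job is eligible to run


def pick_startable(jobs: List[Job], free_nodes: int) -> List[str]:
    """Walk jobs from highest to lowest priority and return the names that start.

    Blocked jobs are skipped. The walk stops at the first eligible job that
    does not fit, so lower-priority jobs never jump ahead of an eligible one.
    """
    started = []
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if job.blocked_reason is not None:
            continue  # a blocked job does not hold back lower-priority work
        if job.nodes > free_nodes:
            break  # strict priority: wait for the eligible higher-priority job
        started.append(job.name)
        free_nodes -= job.nodes
    return started


if __name__ == "__main__":
    jobs = [
        Job("a", "teamfraser", priority=100, nodes=10, blocked_reason="Dependency"),
        Job("b", "teamfraser", priority=90, nodes=10),
        Job("c", "idle", priority=10, nodes=10),
    ]
    # "a" is blocked, so the idle-queue job "c" may start alongside "b".
    print(pick_startable(jobs, free_nodes=20))
```

In this model an idle-queue job only starts ahead of a teamfraser job when that job carries a blocking reason. The behaviour reported in this ticket was different: no such reason applied.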
Thanks, I'll get the patch installed. I agree that jobs "can" get blocked, but this wasn't that case :)

We have been running with this patch for the last few days. No complaints from users and I haven't noticed anything going wrong.

(In reply to Stuart Midgley from comment #4)
> We have been running with this patch for the last few days. No complaints
> from users and I haven't noticed anything going wrong.

BYU has also reported the problem fixed with this change, which will be in version 14.03.5. Closing the ticket.