| Summary: | Using scontrol top for many jobs leads to held job state | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | HMS Research Computing <rc> |
| Component: | User Commands | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 16.05.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Harvard Medical School | Slinky Site: | --- |
| CLE Version: | --- | Version Fixed: | 17.02.2 |
| Attachments: | Prevent job priorities from being set to zero/held when "scontrol top" command run | ||
| | An additional patch for main scheduler loop running so long | ||
Description
HMS Research Computing
2017-03-31 12:21:53 MDT
This is a known limitation of the current 'scontrol top' implementation. Unfortunately, it wasn't designed for the use case you seem to have applied here: it was meant for an occasional override, not for constant usage. I'll note that this does not move jobs "to the top of the queue"; it works by setting the priority of all other jobs under your account lower than the "top" job. A side effect of the current implementation is that the priority of some of those jobs can be driven to zero, which is equivalent to an AdminHold internally. I'll look into patching that behavior, but I expect to have a more robust implementation in place for the 17.11 release in November. (Bug 3653 will track our progress on this, although we've been discussing it internally for some time before.)

I'd guess there wasn't a large enough range of job priorities to order all of his jobs. If his highest-priority job is, say, 100, there's no way to priority-order more than 99 of his jobs (priority zero is designed to hold jobs) without raising the priority of some of his jobs, which this command does not do. What it does do is alter each job's "nice" value in Slurm so as to:
1. not increase the priority of his highest-priority job, and
2. maintain an overall constant sum of job priorities for the user.

The job priority is a 32-bit field; I'd suggest that you increase the PriorityWeight factors in slurm.conf by several orders of magnitude to make use of more of that 32-bit range of priorities. Judging from your message, there also appears to be a bug in Slurm's priority calculations which fails to prevent jobs from going to priority zero (held).

Created attachment 4291 [details]
Prevent job priorities from being set to zero/held when "scontrol top" command run
I was able to reproduce the problem reported and the attached patch fixes it. I'd like to install this on your system Wednesday.
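The failure mode described in the earlier comment can be sketched in a few lines. This is a hypothetical illustration, not Slurm's actual implementation: when a user's priorities span only a narrow range, pushing every other job strictly below the "top" job runs out of room and clamps jobs at priority zero, which Slurm treats as held.

```python
def reorder_for_top(priorities: dict, top_id: str) -> dict:
    """Hypothetical sketch: give `top_id` the user's highest current
    priority and push every other job strictly below it, one step at a
    time, clamping at zero (priority 0 is a held job in Slurm)."""
    highest = max(priorities.values())
    new = {top_id: highest}
    step = highest
    for job in sorted(j for j in priorities if j != top_id):
        step = max(step - 1, 0)  # runs out of room below `highest`
        new[job] = step
    return new

# A user whose priorities span only 1..3 cannot order five jobs:
jobs = {"a": 3, "b": 2, "c": 2, "d": 1, "e": 1}
new = reorder_for_top(jobs, "e")
held = [j for j, p in new.items() if p == 0]
print(new)   # {'e': 3, 'a': 2, 'b': 1, 'c': 0, 'd': 0}
print(held)  # ['c', 'd'] -- these jobs would now be held
```

This is why the suggested fix is to widen the priority range: with priorities spread across more of the 32-bit field, there is room to place every job below the top one without any of them reaching zero.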
Created attachment 4292 [details]
An additional patch for main scheduler loop running so long
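The PriorityWeight suggestion from the earlier comment might look like the following in slurm.conf. The values here are purely illustrative, not tuned recommendations; the point is only that larger weights spread job priorities across more of the 32-bit range, leaving room for "scontrol top" to order jobs without any of them hitting zero.

```
# slurm.conf -- illustrative values only; scale to your site's needs
PriorityType=priority/multifactor
PriorityWeightFairshare=1000000
PriorityWeightAge=100000
PriorityWeightJobSize=100000
PriorityWeightPartition=100000
PriorityWeightQOS=1000000
```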
(In reply to Moe Jette from comment #6)
> Created attachment 4292 [details]
> An additional patch for main scheduler loop running so long

Kathleen, I'm not sure this will fix the problem with the message timeouts. What would probably be best is if you can:
1. note when you next see a message timeout,
2. save the slurmctld log file around that time period,
3. run sdiag and save that output, and
4. open a new bug and attach the logs.

I'm particularly concerned about the main scheduling logic running for 15+ seconds at a time and don't see how that can happen.

Closing bug. The patch is available in Slurm version 17.02.2.