Hello,

One of our users reported that 'scontrol top' does not function as expected for moving many (~4,000) jobs to the top of the queue. He was able to change the priority of the first few jobs as expected, but then received error messages like this:

"Job is in held state, pending scheduler release for job XYZ."

The jobs that were not moved to the top of the queue went into the JobHeldAdmin state and could not be released by the user with scontrol:

    scontrol release 589820
    Access/permission denied for job 589820
    slurm_suspend error: Access/permission denied

To alleviate this, he came up with a complex work-around that checks that the last job sent to the top of the queue has run before moving on to the next one (roughly the loop sketched below). As this is not an optimal solution, are there any suggested methods for moving thousands of jobs to the top of a queue?

Thanks!
Kathleen
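For context, the work-around is along these lines (a minimal sketch, not his actual script; the job-list file, polling command, and interval are all assumptions):

    # Move one job to the top at a time; wait until it is no longer
    # pending before touching the next. joblist.txt holds one job ID
    # per line (illustrative).
    for jobid in $(cat joblist.txt); do
        scontrol top "$jobid"
        while squeue -h -j "$jobid" -t PENDING | grep -q .; do
            sleep 30
        done
    done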
This is a known limitation of the current 'scontrol top' implementation. Unfortunately, it wasn't designed for the use case you seem to have here - it was meant for an occasional override, not for constant usage. I'll note that this does not actually move jobs "to the top of the queue"; it works by setting the priority of all other jobs under your account lower than that of the "top" job. A side effect of the current implementation is that the priority of some of those jobs can be driven to zero - which is equivalent to an AdminHold internally. I'll look into patching that behavior, but I expect to have a more robust implementation in place for the 17.11 release in November. (Bug 3653 will track our progress on this, although we've been discussing it internally for some time.)
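In the meantime, an administrator should be able to release the affected jobs by resetting their priorities to a nonzero value; the job ID and priority here are just examples:

    # Run as SlurmUser or root; assigning a nonzero priority clears
    # the zero-priority (admin hold) state.
    scontrol update JobId=589820 Priority=1000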
I'd guess there wasn't a large enough range of job priorities to order all of his jobs. If his highest-priority job is, say, 100, there's no way to priority-order more than 99 of his jobs (priority zero is reserved to hold jobs) without raising the priority of some of them, which this command does not do. What it does do is alter the jobs' "nice" values in Slurm so as to:
1. Not increase the priority of his highest-priority job
2. Maintain a constant overall sum of job priorities for the user

The job priority is a 32-bit field, so I'd suggest that you increase the PriorityWeight factors in slurm.conf by several orders of magnitude to make use of more of that 32-bit range of priorities. Judging from your message, there also appears to be a bug in Slurm's priority calculations that fails to prevent jobs from going to priority zero (held).
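For example, assuming you are running the multifactor priority plugin, raising the weights in slurm.conf along these lines would spread jobs across far more of that range (the values below are illustrative, not recommendations):

    # Illustrative slurm.conf excerpt - scale the weights to your
    # site's policy; larger weights use more of the 32-bit range.
    PriorityType=priority/multifactor
    PriorityWeightAge=100000
    PriorityWeightFairshare=1000000
    PriorityWeightJobSize=100000
    PriorityWeightPartition=100000
    PriorityWeightQOS=1000000

After changing the weights, run 'scontrol reconfigure' so that slurmctld picks up the new values.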
Created attachment 4291 [details]
Prevent job priorities from being set to zero/held when "scontrol top" command run

I was able to reproduce the reported problem, and the attached patch fixes it. I'd like to install this on your system on Wednesday.
Created attachment 4292 [details]
An additional patch for the main scheduler loop running so long
(In reply to Moe Jette from comment #6)
> Created attachment 4292 [details]
> An additional patch for the main scheduler loop running so long

Kathleen, I'm not sure this will fix the problem with the message timeouts. What would probably be best is if you can:
1. Note when you next see a message timeout
2. Save the slurmctld log file covering that time period
3. Run sdiag and save its output (e.g., as sketched below)
4. Open a new bug and attach the logs

I'm particularly concerned about the main scheduling logic running for 15+ seconds at a time, and I don't see how that can happen.
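For steps 2 and 3, something along these lines would capture what we need (the log path is an example - use the SlurmctldLogFile configured in your slurm.conf):

    # Save sdiag output with a timestamp in the file name.
    sdiag > sdiag-$(date +%Y%m%d-%H%M%S).out
    # Copy the controller log covering the timeout window; adjust the
    # path to your configured SlurmctldLogFile.
    cp /var/log/slurmctld.log slurmctld-$(date +%Y%m%d).log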
Closing bug. The patch is available in Slurm version 17.02.2.