Is there a recommended way for a root user to manually move a job that is pending with reason AssocMaxJobsLimit to Running? The use case is that we set MaxJobs on accounts to keep a single user from single-handedly filling the cluster with many small parallel jobs. However, every once in a while this leaves a lot of nodes idle when the queue is empty except for that user's jobs. I am looking for an easy way for our admins to release another group of pending jobs to run when they see this.
Jay,

For what you want specifically, you need to override the limit at the user level. Once the limit is changed, the scheduler re-evaluates the pending jobs against the new limit and allows more of them to run. This takes several seconds after the change. For example:

    sacctmgr modify user where name=<username> set maxjobs=<more jobs>

Then, once more jobs have been scheduled, change it back:

    sacctmgr modify user where name=<username> set maxjobs=<old amount>

There may be other limits that you could mix and match instead, to avoid needing a manual override like this while still preventing one user from taking over the whole cluster: https://slurm.schedmd.com/resource_limits.html

But the manual override above will let that user run more jobs until you change the limit back, as you requested. When I tested it, I raised the limit, watched squeue until more jobs were scheduled, and then immediately lowered the limit again (so I didn't have to wait for all of those jobs to finish before lowering it). I saw no issues handling it this way.

Does this answer your question?

Caden
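A minimal sketch of the same raise-then-restore procedure as shell helpers an admin could source. It assumes sacctmgr and squeue are on the PATH and the caller has Slurm admin rights. All function names (run, count_limited, limited_users, set_user_maxjobs, bump_maxjobs), the DRY_RUN and WAIT_SECS variables, and the 30-second default wait are hypothetical choices for illustration, not Slurm features. It relies on sacctmgr's -i flag (commit without the confirmation prompt), on squeue's %u/%r format fields (user and pending reason), and on sacctmgr's convention that a value of -1 clears a previously set limit. Restoring with -1 is only right if the user had no user-level MaxJobs of their own before and should fall back to the account's value.

```shell
#!/bin/sh
# Hypothetical admin helpers (names are illustrative, not part of Slurm).
# Source this file, then call the functions. Requires sacctmgr/squeue and
# Slurm admin rights unless DRY_RUN=1.

# Run a command, or only print it when DRY_RUN=1 (useful for a rehearsal).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Read "user reason" lines on stdin and print "user count" for every user
# with jobs pending on AssocMaxJobsLimit (regex match tolerates parentheses).
count_limited() {
    awk '$2 ~ /AssocMaxJobsLimit/ { n[$1]++ } END { for (u in n) print u, n[u] }'
}

# List users whose pending jobs are held back by their association's MaxJobs.
limited_users() {
    squeue -h -t PD -o '%u %r' | count_limited
}

# set_user_maxjobs <user> <limit>: -i commits without the confirmation prompt.
set_user_maxjobs() {
    run sacctmgr -i modify user where name="$1" set maxjobs="$2"
}

# bump_maxjobs <user> <temporary_limit> <restore_limit>
# Raise the user-level limit, give the scheduler time to start more jobs,
# then restore it. Pass -1 as restore_limit to clear the user-level value
# (assumes the user had no limit of their own and should inherit the account's).
bump_maxjobs() {
    bm_user=$1 bm_temp=$2 bm_restore=$3
    set_user_maxjobs "$bm_user" "$bm_temp" || return 1
    run sleep "${WAIT_SECS:-30}"   # arbitrary default; the thread says "several seconds"
    set_user_maxjobs "$bm_user" "$bm_restore"
}
```

Example use, with a hypothetical file name and user: `. ./maxjobs_helpers.sh`, then `limited_users` to see who is being held back, then `bump_maxjobs alice 60 -1`. Setting `DRY_RUN=1` first prints the commands instead of running them. Restoring right after a short wait mirrors the approach described above: jobs that already started keep running, and only new starts are capped again. If the helper is interrupted during the wait, the raised limit stays in place until it is reset by hand.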
Do you have an update for me on this?
Feel free to reopen this if you have further questions.

Caden