Hi, We have a large number of users running genomics jobs whose walltime is difficult to predict and that do not checkpoint. If such a job has been running for, say, 3 days and is close to its requested walltime, we are likely to get a request to increase the walltime. If we don't respond in time, the job is killed and the CPU time already used is wasted, since the job has to be restarted from the beginning. Currently, users can decrease the walltime themselves, but increasing it requires a sysadmin. Would it be possible to add a configurable option to Slurm that would allow users to increase their own walltimes? It should be configurable so that sysadmins can turn the feature off if it proves to have a negative impact on scheduling. Thanks!
Created attachment 2153: Enable user to change job time limit. This is not a configuration parameter, but a simple patch that you can apply to the Slurm code. It enables any user to change the time limit on any of their jobs (pending, running, or any other state) up or down at will, without any restrictions (e.g. no check of partition, association, or QOS limits). This is certainly subject to abuse. Let us know how it works for you.
Thanks. That won't work for us, though. What we were really looking for was an option that would let users increase their walltimes up to the limits set for the partition, association or QoS. As you say, without those restrictions, this could be abused very easily. I understand that a configurable option is not available at present, but I wonder if we could put it in as a feature request for the future?
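To make the requested behavior concrete, here is a minimal sketch of the check being asked for. It is not Slurm code: the function names, the `feature_enabled` switch, and the use of minutes with `None` meaning "no limit" are all assumptions made for illustration.

```python
from typing import Optional


def effective_limit(*limits_minutes: Optional[int]) -> Optional[int]:
    """Return the tightest of the given limits; None means 'no limit'."""
    finite = [m for m in limits_minutes if m is not None]
    return min(finite) if finite else None


def may_change_time_limit(
    current: int,
    requested: int,
    partition_max: Optional[int],
    assoc_max: Optional[int],
    qos_max: Optional[int],
    feature_enabled: bool = True,  # hypothetical site-wide switch
) -> bool:
    """Decide whether a user may set their job's time limit to `requested`."""
    if requested <= current:
        # Decreases are already allowed for users today.
        return True
    if not feature_enabled:
        # Sysadmins have turned user-initiated increases off.
        return False
    cap = effective_limit(partition_max, assoc_max, qos_max)
    return cap is None or requested <= cap
```

With a partition limit of 10 days, an association limit of 7 days, and no QoS limit, an increase would be accepted only up to 7 days, and refused entirely when the switch is off.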
(In reply to Susan Chacko from comment #2)
> Thanks. That won't work for us, though. What we were really looking for was an option that would let users increase their walltimes up to the limits set for the partition, association or QoS. As you say, without those restrictions, this could be abused very easily.
I was just seeing if the simple patch would satisfy your requirements. Even with those limits, it's still subject to abuse. Anyone can submit a job with a short time limit so that it can get started quickly by the backfill scheduler, then expand the time limit at will to delay all of the jobs that remain pending.
> I understand that a configurable option is not available at present, but I wonder if we could put it in as a feature request for the future?
That's what this trouble ticket will do for you.
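The abuse scenario above can be illustrated with a deliberately simplified model of backfill. This is a sketch under stated assumptions, not how Slurm's backfill scheduler is implemented: times are plain minutes, there is a single reserved start time for one higher-priority job, and both function names are hypothetical.

```python
def can_backfill(now: int, time_limit: int, reserved_start: int) -> bool:
    """A job fits the backfill window if it is expected to end
    no later than the reserved start of the higher-priority job."""
    return now + time_limit <= reserved_start


def delay_caused(start: int, new_time_limit: int, reserved_start: int) -> int:
    """Minutes the reserved job is pushed back if the backfilled job
    runs all the way to its (possibly extended) time limit."""
    return max(0, start + new_time_limit - reserved_start)
```

In this model a job submitted with a 60-minute limit fits a 120-minute window and starts immediately. If its owner then raises the limit to 180 minutes, the pending job that held the reservation can be delayed by up to 60 minutes, even though a 180-minute request would not have been backfilled in the first place.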
> Even with those limits, it's still subject to abuse. Anyone can submit a job with a short time limit so that it can get started quickly by the backfill scheduler, then expand the time limit at will to delay all of the jobs that remain pending.
Yes, we're concerned about that, but we don't have a good feel for how much of a problem it will be with our user community. We also need to balance this problem against the demands on sysadmins, who have to keep responding to requests to increase the walltime on jobs. That's why the ability to turn off this feature is also important :-) Thanks!