To test our nodes regularly, we want to be able to force a Slurm job to run immediately. The only way we have found to do this is with "srun --no-alloc", but that option doesn't work even with admin privileges. Instead it requires particular user names like "root", which we would like to avoid (see below for the documentation excerpt). Is there a better way to force a Slurm job to run immediately? Note that we do not want to kill running jobs to get this done.

-Z, --no-allocate
Run the specified tasks on a set of nodes without creating a Slurm "job" in the Slurm queue structure, bypassing the normal resource allocation step. The list of nodes must be specified with the -w, --nodelist option. This is a privileged option only available for the users "SlurmUser" and "root". This option applies to job allocations.

Thanks
Randy
Hi Randall,

It looks like you have already found the method we have to immediately start a job on a node. A job submitted normally would have to wait for resources to become available so that it could be scheduled on a particular node.

Alternatively, you could use preemption to get a job to start without having to wait for other jobs. You said that you don't want to kill existing jobs to make this happen, so you wouldn't want to configure preemption to Cancel existing jobs, but you could configure it to Suspend existing jobs. The thing to keep in mind with this is that the existing job will reside in memory while it's suspended, so the preempting job has to request little enough memory that the existing job and preempting job can both fit in the available memory on the node. Preemption allows your job to start sooner than it otherwise would, but it still has to wait for a scheduling cycle and then allow the existing job(s) time to be suspended, so it's faster, but not immediate.

Outside of preemption, the mechanism available is the '--no-allocate' flag for srun. As the documentation says, it doesn't create a job allocation, so it can get on a node even if it is busy. This ability is limited to 'root' and the 'SlurmUser' because of the potential for this to cause serious problems with existing jobs.

If using the '--no-allocate' flag doesn't meet your needs, does it sound like suspending jobs with preemption would meet them? If you have any additional questions about suspending jobs, feel free to let me know. You can also read more about it in the documentation here:
https://slurm.schedmd.com/preempt.html

Thanks,
Ben
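For reference, a suspend-based setup along the lines described above might look like the following slurm.conf fragment. This is a minimal sketch rather than a tested configuration: the partition names (batch, nodetest) and the node range (node[01-16]) are hypothetical, and the values would need to be adapted to the site.

```
# slurm.conf fragment (sketch; partition and node names are hypothetical)

# Decide preemption by partition priority, cluster-wide.
PreemptType=preempt/partition_prio
# Suspend preempted jobs instead of cancelling them; SUSPEND is paired with GANG.
PreemptMode=SUSPEND,GANG

# Both partitions cover the same nodes, so a job in the higher-tier
# partition can suspend jobs from the lower-tier one.
PartitionName=DEFAULT  OverSubscribe=FORCE:1 Nodes=node[01-16]
PartitionName=batch    PriorityTier=1 Default=YES
PartitionName=nodetest PriorityTier=2
```

Under these assumptions, a node test submitted to the higher-priority partition (for example with "sbatch -p nodetest") would suspend jobs from the batch partition on the nodes it needs. The memory caveat still applies: the suspended job stays resident, so the test job has to fit in whatever memory remains on the node.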
Thanks much Ben. Feel free to resolve this ticket. -Randy
You're welcome. Closing now.
Hi Ben, Quick follow up question: Is it possible to define a QOS that will preempt jobs running in queues that have PreemptType=preempt/partition_prio set in slurm.conf?
No, I'm afraid not. PreemptType is a global setting that allows you to do Partition based preemption or QOS based preemption cluster-wide. Thanks, Ben
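To illustrate the point: slurm.conf carries a single PreemptType setting, so the cluster uses one preemption plugin or the other. A sketch of the two alternatives, with only one of them active at a time:

```
# slurm.conf fragment (sketch): one PreemptType applies to the whole cluster.

# Option A: preemption decided by partition priority
PreemptType=preempt/partition_prio

# Option B: preemption decided by QOS
# (commented out here, since only one plugin can be selected)
#PreemptType=preempt/qos
```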
Thank you for the clarification.