| Summary: | How do we force a job to run immediately? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Randall Radmer <radmer> |
| Component: | User Commands | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 20.02.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SLAC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Randall Radmer
2021-02-19 06:36:34 MST
Hi Randall,

It looks like you have already found the method we have to immediately start a job on a node. A job submitted normally would have to wait for resources to become available so that it could be scheduled on a particular node.

Alternatively, you could use preemption to get a job to start without having to wait for other jobs. You said that you don't want to kill existing jobs to make this happen, so you wouldn't want to configure preemption to Cancel existing jobs, but you could configure it to Suspend existing jobs. The thing to keep in mind is that the existing job will reside in memory while it's suspended, so the preempting job has to request little enough memory that the existing job and the preempting job can both fit in the available memory on the node. Preemption allows your job to start sooner than it otherwise would, but it still has to wait for a scheduling cycle and then allow the existing job(s) time to be suspended, so it's faster, but not immediate.

Outside of preemption, the mechanism available is the '--no-allocate' flag for srun. As the documentation says, it doesn't create a job allocation, so it can get on a node even if it is busy. This ability is limited to 'root' and the 'SlurmUser' because of the potential for this to cause serious problems with existing jobs.

If using the '--no-allocate' flag doesn't meet your needs, does it sound like suspending jobs with preemption would meet them? If you have any additional questions about suspending jobs, feel free to let me know. You can also read more about it in the documentation here:
https://slurm.schedmd.com/preempt.html

Thanks,
Ben

Thanks much Ben. Feel free to resolve this ticket.
-Randy

You're welcome. Closing now.

Hi Ben,
Quick follow up question: Is it possible to define a QOS that will preempt jobs running in queues that have PreemptType=preempt/partition_prio set in slurm.conf?

No, I'm afraid not. PreemptType is a global setting that allows you to do Partition based preemption or QOS based preemption cluster-wide.

Thanks,
Ben

Thank you for the clarification.
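
For reference, a minimal slurm.conf sketch of the suspend-based, partition-priority preemption discussed in this ticket. It is an illustration under stated assumptions, not a configuration from the reporter's site: the partition names, node names, and PriorityTier values are hypothetical, and the excerpt has not been tested against 20.02.3. Check it against https://slurm.schedmd.com/preempt.html before use.

```
# slurm.conf excerpt (sketch; partition and node names are hypothetical)

# Suspended jobs stay resident in memory, so memory should be a tracked
# resource for the preempting job and the suspended job to fit together.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# PreemptType is cluster-wide: partition-based or QOS-based, not both.
PreemptType=preempt/partition_prio

# SUSPEND is paired with GANG so that suspended jobs are resumed later.
PreemptMode=SUSPEND,GANG

# Jobs in the higher-PriorityTier partition may suspend jobs in the lower one.
PartitionName=normal Nodes=node[01-04] Default=YES PriorityTier=10
PartitionName=urgent Nodes=node[01-04] Default=NO PriorityTier=20
```

With a layout like this, a job submitted to the hypothetical `urgent` partition would still wait for a scheduling cycle and for the running job to be suspended, as noted above, so it starts sooner but not instantly.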