| Summary: | Question about job preemption | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Steve Ford <fordste5> |
| Component: | Scheduling | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 21.08.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | MSU | ||
| Attachments: | Slurm Configuration | ||
Hi Steve,

Jobs attempt to preempt upon submission, regardless of main/backfill scheduling. To demonstrate this, I set my backfill and scheduling intervals to high values, which effectively stops both schedulers from doing anything. I then submit a job that preempts another job and show that preemption still happens even though no main/backfill scheduling takes place.

slurm.conf
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
> . . .
> SchedulerParameters=bf_interval=1000,sched_interval=1000
> . . .
> PartitionName=A Nodes=n-[1-3] Default=YES MaxTime=INFINITE State=UP PriorityTier=1
> PartitionName=B Nodes=n-[1-3] Default=no MaxTime=INFINITE State=UP PriorityTier=2

Submit a job to the lower priority partition:
> $ sbatch --wrap="sleep 100000" -wn-1 --exclusive --partition=A
> Submitted batch job 570
> $ squeue
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 570 A wrap benjamin R 0:02 1 n-1

Submit a job to the higher priority partition that will preempt the previous job:
> $ sbatch --wrap="sleep 100000" -wn-1 --exclusive --partition=B
> Submitted batch job 571
> $ squeue
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 570 A wrap benjamin PD 0:00 1 (BeginTime)
> 571 B wrap benjamin R 0:04 1 n-1

Note that when I submitted the second job (571), it immediately preempted the first (570) and did not wait for main/backfill scheduling.

In addition to happening upon submission, preemption also happens for jobs started by the backfill scheduler. There is a minor limitation to this, though: a job may preempt more resources (whole nodes instead of partial nodes) than it requested. Read more about this here:
https://slurm.schedmd.com/preempt.html#limitations

Let me know if you have any questions about this.

Hello Ben,

Thank you for the information. I have another question. I'm wondering what happens when the main scheduler hits max_sched_time. Does the portion of the job queue that the main scheduler hasn't evaluated stay unevaluated until the queue is smaller, or does the main scheduler continue where it left off on the next cycle?

Thanks,
Steve

The portion of the job queue that hasn't been evaluated will stay "unevaluated" until the queue is smaller.

When using priority/multifactor, the main scheduler builds an unordered list of pending jobs, sorts those jobs by priority, and then schedules jobs until it hits max_sched_time. A new queue of jobs is created and sorted every cycle before any scheduling happens. This queue is then freed at the end of the scheduling cycle and is not carried over to the next cycle.

The scheduler places the jobs with the highest priority at the front of the job queue, so those jobs are scheduled first. Any jobs that the scheduler didn't reach are of lower priority (as defined by the priority weights and options you have set), and thus are not considered for scheduling at that time, but they will be as soon as the queue gets smaller.

Do you have any other questions about this? If not, I'll close this out.

Hello Ben,

Go ahead and close this request.

Thanks,
Steve

Closing now.
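For readers who want to see the max_sched_time behavior from the reply above in concrete form, here is a minimal Python sketch of one scheduling cycle. It is a toy model and not Slurm source code: `Job`, `FakeClock`, `run_cycle`, and the fixed `cost_per_job` are hypothetical names and assumptions made for illustration. The only behavior it tries to mirror is what the reply states: a fresh queue is built and sorted by priority every cycle, jobs are evaluated until the time budget runs out, and the queue is discarded afterwards.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    job_id: int
    priority: int  # assumption: a higher value is scheduled earlier


class FakeClock:
    """Deterministic clock so the example does not depend on wall time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def run_cycle(pending: List[Job], max_sched_time: float, clock: FakeClock,
              cost_per_job: float = 1.0) -> Tuple[List[int], List[int]]:
    """Simulate one cycle; return (evaluated job ids, unreached job ids)."""
    # A new queue is built and sorted at the start of every cycle.
    queue = sorted(pending, key=lambda j: j.priority, reverse=True)
    deadline = clock() + max_sched_time
    evaluated: List[int] = []
    for job in queue:
        if clock() >= deadline:
            break  # time budget exhausted; the rest is not looked at
        clock.advance(cost_per_job)  # stand-in for the work of evaluating a job
        evaluated.append(job.job_id)
    unreached = [j.job_id for j in queue[len(evaluated):]]
    # The queue is dropped here; no position is remembered for the next cycle.
    return evaluated, unreached


pending = [Job(1, 10), Job(2, 40), Job(3, 20), Job(4, 30)]
clock = FakeClock()
first = run_cycle(pending, 2.0, clock)
second = run_cycle(pending, 2.0, clock)  # same pending set: starts from the top again
smaller = [j for j in pending if j.job_id not in (2, 4)]
third = run_cycle(smaller, 2.0, clock)   # queue is smaller: jobs 3 and 1 are reached
```

In this toy run, `first` and `second` both evaluate only jobs 2 and 4, because each cycle re-sorts and restarts from the highest priority job. Jobs 3 and 1 are only reached in `third`, once the higher priority jobs have left the pending list.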
Created attachment 27837
Slurm Configuration

Hello SchedMD,

We have job preemption configured on our system and I am wondering if the preemption logic runs in the main scheduler or if it is only during backfill scheduling. Can you clarify?

Thanks,
Steve