Created attachment 679 [details]
slurm.conf

Native SLURM: Suspend/Resume - When a job is queued and waiting for resources, it blocks all other jobs from launching.

snake-p3(nid00018): /tchoi => srun --version
slurm 14.03.0

# Launch first application on nid00024:
snake-p3(nid00018): /tchoi => srun -n 1 -w nid00024 sleep 1000 &
[1] 16660

squeue -l:
JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
  117     workq    sleep    tchoi  RUNNING       0:12   1:00:00      1 nid00024

# Launch second application on nid00024 (the same node as the first application)
# without suspending the first job:
snake-p3(nid00018): /tchoi => srun -n 1 -w nid00024 sleep 10000 &
[2] 16677
snake-p3(nid00018): /tchoi => srun: job 118 queued and waiting for resources

squeue -l:
JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
  117     workq    sleep    tchoi  RUNNING       0:21   1:00:00      1 nid00024
  118     workq    sleep    tchoi  PENDING       0:00   1:00:00      1 (Resources)

# Try to launch a third application on the other node (nid00025).
# All other jobs are pending even though they don't try to run on nid00024
# (the same node as the first application).
snake-p3(nid00018): /tchoi => srun -n 1 -w nid00025 sleep 10000 &

squeue -l:
JOBID PARTITION     NAME     USER    STATE       TIME TIMELIMIT  NODES NODELIST(REASON)
  117     workq    sleep    tchoi  RUNNING       3:28   1:00:00      1 nid00024
  118     workq    sleep    tchoi  PENDING       0:00   1:00:00      1 (Resources)
  119     workq  corefin    pavek  PENDING       0:00   1:00:00      1 (Priority)
  120     workq    sleep    tchoi  PENDING       0:00   1:00:00      1 (Priority)
Hi, jobs 119 and 120 are pending with reason (Priority). This is expected, because job 118, which was submitted ahead of them, is pending waiting for resources. The backfill scheduler should dispatch the two pending jobs as soon as it starts. Are they running yet?

On 03/05/2014 11:35 AM, bugs@schedmd.com wrote:
> Site CRAY
> Bug ID 626 <http://bugs.schedmd.com/show_bug.cgi?id=626>
> Summary Native SLURM: Suspend/Resume - When a job queued and waiting
>         for resources, it blocks all other jobs to launch.
> Product SLURM
> Version 14.03.x
> Severity 2 - High Impact
> Component Scheduling
> Assignee david@schedmd.com
> Reporter tchoi@cray.com
> CC da@schedmd.com, david@schedmd.com, jette@schedmd.com
This seems like very inefficient scheduling to me. So if one job is waiting on one node, no other jobs can run on any other node in the system?
(In reply to David Gloe from comment #2)
> This seems like very inefficient scheduling to me. So if one job is waiting
> on one node no other jobs can run on any other nodes on the system?

It's FIFO except when the backfill scheduler kicks in (every 30 seconds by default).

Is it common that users submit jobs to run on a specific node? That's the root cause of this delay.
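To make the FIFO-vs-backfill distinction concrete, here is a toy sketch (not Slurm code; the job ids and node names are taken from the report, and the model is deliberately simplified to jobs requesting a single named node). Under strict FIFO the scheduler stops at the first job it cannot start, so job 118's pend blocks 119 and 120; a backfill pass skips past it and starts later jobs whose requested nodes are idle.

```python
# Toy model of one scheduling pass: job 117 already holds nid00024,
# job 118 wants that same node, and jobs 119/120 want idle nodes.

def schedule(queue, busy_nodes, backfill):
    """Return the job ids started in one scheduling pass.

    queue      -- list of (job_id, wanted_node) in submission order
    busy_nodes -- set of node names already allocated
    backfill   -- False: stop at the first job that cannot start (strict FIFO);
                  True: skip it and keep scanning (backfill pass)
    """
    busy = set(busy_nodes)
    started = []
    for job_id, node in queue:
        if node not in busy:
            busy.add(node)
            started.append(job_id)
        elif not backfill:
            break  # strict FIFO: head-of-line job blocks everything behind it
    return started

# Job 118 wants the busy node; 119 and 120 want free nodes.
pending = [(118, "nid00024"), (119, "nid00025"), (120, "nid00026")]

print(schedule(pending, {"nid00024"}, backfill=False))  # [] : everything pends
print(schedule(pending, {"nid00024"}, backfill=True))   # [119, 120]
```

This matches the squeue output in the report: until the backfill cycle runs, jobs 119 and 120 show reason (Priority) even though their requested nodes are idle.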
(In reply to Moe Jette from comment #3)
> It's FIFO except when the backfill scheduling kicks in (every 30 seconds by
> default).
>
> Is it common that users submit jobs to run on a specific node?
> That's the root cause of this delay.

These jobs are staying pending for much longer than 30 seconds. Tom reported the issue at 1:40 and they were still pending when I looked at it ~2:10. I have another one now that's been pending for 15 minutes. Perhaps we have a bad backfill configuration?
We tried to run the first two jobs on the same node. For example, the first job is running on nid00024. Then we try to launch a second job on the same node, nid00024, without suspending the first job. The second job is now pending until the first job is suspended. Then, while the second job is pending, we try to launch third and fourth jobs on different nodes (i.e. nid00025, nid00026). All of the other jobs (third and fourth) stay pending until the second job runs or is cancelled. This is a bug.
I tried this today on another internal Slurm system and the backfill scheduler worked as designed, placing the job ~30s after it was submitted. On that system we have:

SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill

Perhaps SchedulerTimeSlice was set incorrectly on snake-p3, or not set at all? Unfortunately snake-p3 is down now, so I can't check.
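For reference, a minimal sched/backfill configuration might look like the fragment below. This is an illustration, not the actual snake-p3 slurm.conf: bf_interval is the SchedulerParameters option that controls how often the backfill cycle runs, and 30 seconds is its documented default (worth verifying against the slurm.conf man page for the 14.03 release).

```
# slurm.conf fragment (illustrative)
SchedulerType=sched/backfill
SchedulerParameters=bf_interval=30
```

When snake-p3 comes back up, `scontrol show config` can be used to confirm which values are actually in effect.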
Created attachment 682 [details]
Fix for sharing nodes

The root problem here is the same one reported by Jim Norby and should be fixed with the attached patch.
David, can you please verify this works so we can close the bug?
Closing, please reopen if necessary. David