Ticket 2710

Summary: Can job dispatch to single available core
Product: Slurm
Reporter: charles gray <charles.gray>
Component: Configuration
Assignee: Tim Wickberg <tim>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 15.08.10   
Hardware: Linux   
OS: Linux   
Site: Tufts
Attachments: qos "dregs", qos "normal", LUX cluster slurm.conf

Description charles gray 2016-05-06 05:19:02 MDT
Created attachment 3064
qos "dregs", qos "normal", LUX cluster slurm.conf

This is a follow-up submission to #2548.

Given the following statement from the tutorial linked in #2548:
   Backfill Limitations
      ●  Reserves whole nodes for pending jobs rather than individual CPUs

and assuming we do not enable gang scheduling on our cluster:

Is there a way Slurm can be configured to let backfill processing dispatch a job that requires a single CPU as soon as one becomes available? Or must that job always wait for the entire node to become available before it is eligible for dispatch?

Our old LSF cluster was reported to have a "standby" queue defined that had the ability to dispatch jobs to nodes with individual available cores.  This feature was paired with a policy that preempted jobs in said queue if work submitted to another queue required that core. I have no knowledge whether this configuration worked exactly as anecdotally described.

Comment 1 Tim Wickberg 2016-05-06 05:33:50 MDT
(In reply to charles gray from comment #0)

I do wish that line had been better explained in the slide; it sounds like a bigger issue than it actually is in practice.

The slide is trying to hint that the backfill scheduler, when looking for jobs it can launch immediately, cannot tell whether a later job planned for a given node will leave some of that node's CPU cores unused. As a result, it cannot recognize a series of upcoming jobs on the node that each leave cores idle, and so it cannot backfill a job alongside them if that job runs longer than any single one of those higher-priority jobs.

Does that help explain it? I'm having a hard time describing the exact behavior, and admit that explanation may not make a lot of sense. It's a bit easier to work out on a whiteboard in real time; I may try to make some sketches better describing the issue if that'd help.

This doesn't affect resources that are free right now, though: a job with a short enough time limit that can use a few spare CPUs available immediately will launch.

For your 'dregs' QOS, the best recommendation I have is to ask users to set a MinTime on their jobs (the --time-min option to sbatch), or to set it automatically with a job_submit plugin. This tells the backfill scheduler that the job does not necessarily need the full MaxTime requested and is willing to run in smaller chunks of time that may be available in the system. The job is then much more likely to start immediately.
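
As a rough illustration of the MinTime idea, here is a simplified Python model. It is not Slurm's code; the function name `backfill_time_limit` and its parameters are hypothetical. It assumes the idle CPUs stay free for `free_window` minutes before a higher-priority job needs them.

```python
from typing import Optional


def backfill_time_limit(
    free_window: int, time_max: int, time_min: Optional[int] = None
) -> Optional[int]:
    """Return the time limit (minutes) the job could start with now, or None.

    free_window: minutes the idle CPUs remain free (assumed known).
    time_max:    the job's requested time limit.
    time_min:    the smallest limit the job is willing to accept, if any.
    """
    if time_max <= free_window:
        # The full request fits in the gap, so nothing needs to shrink.
        return time_max
    if time_min is not None and time_min <= free_window:
        # Shrink the limit to fit the gap, but never below time_min.
        return free_window
    # The job cannot finish before the CPUs are needed, so it has to wait.
    return None


print(backfill_time_limit(60, 240))      # no MinTime: the job waits
print(backfill_time_limit(60, 240, 30))  # MinTime of 30: starts with a 60-minute limit
```

In this model a job asking for 240 minutes cannot use a 60-minute gap on its own, but the same job with a MinTime of 30 can start with its limit reduced to 60.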

One thing that may differ from what you are used to is that Slurm's backfill follows a strict conservative model: Slurm will backfill and launch a lower-priority job if and only if doing so would not delay the expected start time of any higher-priority job.
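
The conservative rule can be sketched for a single node as follows. This is a toy Python model under stated assumptions, not Slurm's implementation: the names `expected_start` and `may_backfill` are hypothetical, there is only one higher-priority pending job, and CPUs are tracked individually for simplicity, whereas the slide quoted in the description says Slurm's planning reserves whole nodes for pending jobs.

```python
def expected_start(total_cpus, running, needed):
    """Earliest time (minutes from now) at which `needed` CPUs are free.

    running: list of (cpus, end_time) pairs for jobs occupying the node.
    """
    # Free CPUs only change when a running job ends, so check those instants.
    for t in sorted({0} | {end for _, end in running}):
        free = total_cpus - sum(c for c, end in running if end > t)
        if free >= needed:
            return t
    raise ValueError("job can never fit on this node")


def may_backfill(total_cpus, running, high_prio_cpus, cand_cpus, cand_limit):
    """True if the candidate can start now without delaying the pending job."""
    free_now = total_cpus - sum(c for c, end in running if end > 0)
    if cand_cpus > free_now:
        return False  # not enough idle CPUs right now
    before = expected_start(total_cpus, running, high_prio_cpus)
    after = expected_start(
        total_cpus, running + [(cand_cpus, cand_limit)], high_prio_cpus
    )
    # Conservative backfill: the higher-priority start time must not move later.
    return after <= before


# 8-CPU node, 6 CPUs busy for 120 minutes, pending job needs all 8 CPUs.
print(may_backfill(8, [(6, 120)], 8, cand_cpus=1, cand_limit=60))
print(may_backfill(8, [(6, 120)], 8, cand_cpus=1, cand_limit=180))
```

In this example the single-CPU job with a 60-minute limit may start on an idle core, while the same job with a 180-minute limit may not, because it would push the pending job's start from 120 to 180 minutes.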

Comment 2 Tim Wickberg 2016-05-23 08:19:11 MDT
Marking as resolved/infogiven. Please reopen if there is anything else I can answer on this.

- Tim