| Summary: | backfill for HPC cluster | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Jenny Williams <jennyw> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 17.11.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | University of North Carolina at Chapel Hill | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | scontrol show config for the Dogwood cluster; scontrol show job, sdiag, squeue --start, and sinfo outputs | | |
Jenny,

Could you please attach scontrol show job, sdiag, squeue --start, and sinfo outputs?

cheers,
Marcin

Created attachment 10543 [details]
scontrol show job, sdiag, squeue --start, and sinfo outputs
Jenny,

I took a look at the configuration of your cluster and the situation in the queue. Yes, you should increase your bf_window parameter to reflect the maximum time limit allowed on your cluster. I'd suggest setting it to 7 days. Based on the time limits of the jobs you have in the queue, I think you can also increase bf_resolution to 10 minutes. That gives the following SchedulerParameters line in your slurm.conf:

> SchedulerParameters=bf_window=10080,bf_resolution=600

Checking your scontrol show job output, I've also noticed that you have a number of multinode jobs waiting in the queue because of their low priority. In your configuration, priority comes mostly from the fair-share factor, with quite a long utilization history taken into consideration (PriorityDecayHalfLife = 8 days). If you'd like to favor large jobs, you should increase the value of PriorityWeightJobSize in your slurm.conf. If your concern comes mostly from jobs 1199751 and 1199752, then you may consider tuning the PriorityMaxAge and PriorityWeightAge values.[1]

If you require any further information, feel free to contact me.

cheers,
Marcin

[1] https://slurm.schedmd.com/priority_multifactor.html#age
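The priority advice above can be sketched numerically. The following is a minimal illustration of Slurm's multifactor priority formula (a weighted sum of factors normalized to [0, 1]); the weight and factor values here are hypothetical, not taken from the Dogwood configuration:

```python
# Sketch of Slurm's multifactor priority: each factor (age, fair-share,
# job size, ...) is normalized to [0.0, 1.0] and multiplied by its
# configured PriorityWeight* value; the job priority is the sum.
# All numbers below are made-up examples for illustration only.

def job_priority(weights, factors):
    """Weighted sum over the factors present in `weights`."""
    return int(sum(weights[k] * factors[k] for k in weights))

weights = {"age": 1000, "fairshare": 10000, "jobsize": 1000}

# A small job from a lightly used account vs. a large multinode job
# from a heavily used account (hypothetical factor values).
small_job = {"age": 0.1, "fairshare": 0.9, "jobsize": 0.05}
large_job = {"age": 0.1, "fairshare": 0.3, "jobsize": 0.95}

# With a dominant fair-share weight, the small job wins.
print(job_priority(weights, small_job))
print(job_priority(weights, large_job))

# Raising PriorityWeightJobSize shifts the balance toward large jobs.
weights["jobsize"] = 100000
print(job_priority(weights, large_job) > job_priority(weights, small_job))
```

This is why increasing PriorityWeightJobSize (or the age-related weights for long-waiting jobs) helps the large MPI jobs compete with a fair-share-dominated priority.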
Created attachment 10542 [details]
scontrol show config for the Dogwood cluster

We have a cluster set aside for MPI jobs where backfill of smaller jobs is overtaking the scheduling of the larger HPC jobs. The Sched* config parameters are as follows:

```
# scontrol show config | egrep Sched
FastSchedule            = 0
SchedulerParameters     = (null)
SchedulerTimeSlice      = 30 sec
SchedulerType           = sched/backfill
```

The 1-day backfill window is likely the issue; I would appreciate a recommendation on how to tune backfill so that the larger MPI jobs will still schedule. The config file for this cluster (dogwood) is attached. The two main partitions are here:

```
# scontrol show partitions 528_queue
PartitionName=528_queue AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=528_qos DefaultTime=01:00:00
   DisableRootJobs=NO ExclusiveUser=YES GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=3-00:00:00 MinNodes=1 LLN=NO
   MaxCPUsPerNode=UNLIMITED
   Nodes=c-206-[1-24],c-207-[1-24],c-208-[1-24],c-209-[1-15],c-201-[20-21],c-204-[17-18,21-24]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
   OverSubscribe=NO OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=4180 TotalNodes=95 SelectTypeParameters=NONE
   DefMemPerCPU=11704 MaxMemPerNode=UNLIMITED

[root@dogwood-sched bin]# scontrol show partitions 2112_queue
PartitionName=2112_queue AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=2112_qos DefaultTime=01:00:00
   DisableRootJobs=NO ExclusiveUser=YES GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO
   MaxCPUsPerNode=UNLIMITED
   Nodes=c-201-[1-24],c-202-[1-24],c-203-[1-24],c-204-[1-24]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
   OverSubscribe=NO OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=4224 TotalNodes=96 SelectTypeParameters=NONE
   DefMemPerCPU=11704 MaxMemPerNode=UNLIMITED
```
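The mismatch between the backfill window and the partition limits can be checked with a short sketch. This assumes the default bf_window of one day (1440 minutes) and uses the MaxTime values from the scontrol output above:

```python
# Sketch: does bf_window cover the longest partition time limit?
# The backfill scheduler only plans reservations within bf_window,
# so jobs longer than the window cannot get a reliable start slot.

def timelimit_minutes(limit):
    """Parse a Slurm time limit of the form [D-]HH:MM:SS into minutes,
    rounding any leftover seconds up."""
    days, hms = limit.split("-") if "-" in limit else ("0", limit)
    h, m, s = (int(x) for x in hms.split(":"))
    return int(days) * 24 * 60 + h * 60 + m + (1 if s else 0)

# MaxTime values from the two partitions shown above.
partitions = {"528_queue": "3-00:00:00", "2112_queue": "2-00:00:00"}
bf_window = 1440  # assumed default: one day, in minutes

for name, max_time in partitions.items():
    need = timelimit_minutes(max_time)
    print(f"{name}: MaxTime={need} min, covered by bf_window={need <= bf_window}")
```

Both partitions allow jobs longer than a 1-day window (4320 and 2880 minutes), which matches the suggested bf_window=10080 (7 days) in the reply.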