| Summary: | priority jobs reserving busy nodes | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Gutteridge <mrg> |
| Component: | Scheduling | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 15.08.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | FHCRC - Fred Hutchinson Cancer Research Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm configuration | ||
Hey Michael,

From your configuration it doesn't appear gizmof233 is part of the campus partition:

```
PartitionName=campus Default=yes DefaultTime=3-0 MaxTime=30-0 Nodes=gizmof[1-180],gizmof[241-384],gizmog[1-10] PreemptMode=off Priority=10000 QOS=public State=UP
```

Is that expected, or am I reading it wrong?

Ah, crap. I forgot about the hole in that partition. So sorry, should have caught that.

No problem, glad it was an easy overlook ;).
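The overlooked gap is easy to verify mechanically: `scontrol show hostnames` expands a bracketed hostlist one name per line, or, where the Slurm tools aren't handy, a small sketch like the following can check whether a node falls inside a partition's `Nodes=` list. The `expand_hostlist` helper is hypothetical (not part of any Slurm API) and handles only the simple `prefix[a-b]` and bare-name forms used here:

```python
import re

def expand_hostlist(spec):
    """Expand a Slurm-style hostlist such as
    'gizmof[1-180],gizmof[241-384],gizmog[1-10]' into individual
    node names. Handles only prefix[a-b,c] and bare-name entries."""
    nodes = []
    # Grab either 'prefix[ranges]' chunks or bare names, skipping commas
    # that separate top-level entries.
    for part in re.findall(r'[^,\[\]]+\[[^\]]*\]|[^,\[\]]+', spec):
        m = re.match(r'^(.*)\[([^\]]*)\]$', part)
        if not m:
            nodes.append(part)
            continue
        prefix, ranges = m.groups()
        for r in ranges.split(','):
            if '-' in r:
                lo, hi = r.split('-')
                nodes.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
            else:
                nodes.append(f"{prefix}{r}")
    return nodes

campus = expand_hostlist("gizmof[1-180],gizmof[241-384],gizmog[1-10]")
print("gizmof17" in campus)   # True  - eligible for the pending job
print("gizmof233" in campus)  # False - sits in the gizmof 181-240 hole
```

For the campus partition above, gizmof17 is a member while gizmof233 falls in the gap between the two `gizmof` ranges, which would explain why the restart-QOS jobs running on it were never considered for preemption by this partition's pending jobs.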
Created attachment 3001 [details]
slurm configuration

I am running into a situation where priority jobs (i.e. jobs at the top of the list to be run) are reserving busy resources when there appear to be available resources (resources running preemptable jobs). Right now my queue looks like:

```
$ squeue -t pd | head
   JOBID     USER  ACCOUNT PARTITION    QOS               NAME ST  TIME NODES CPUS MIN_ NODELIST(R PRIORITY
38429342  lsycuro fredrick    campus normal PRODEGE-0-14606646 PD  0:00     1    4    4 (Resources   110001
38429343  lsycuro fredrick    campus normal PRODEGE-0-14606646 PD  0:00     1    4    4 (Priority)   110001
```

That first job looks like this:

```
$ scontrol show job 38429342
JobId=38429342 JobName=PRODEGE-0-1460664608
   UserId=lsycuro(35247) GroupId=g_lsycuro(35247)
   Priority=110000 Nice=0 Account=fredricks_d QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=3-00:00:00 TimeMin=N/A
   SubmitTime=2016-04-14T13:10:08 EligibleTime=2016-04-14T13:10:08
   StartTime=2016-04-14T19:52:02 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=campus AllocNode:Sid=sphinx:20706
   ReqNodeList=(null) ExcNodeList=(null) NodeList=(null) SchedNodeList=gizmof17
   NumNodes=1 NumCPUs=4 CPUs/Task=4 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=4 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
```

It has apparently reserved node gizmof17, which is currently in use by a guaranteed job:

```
$ squeue -w gizmof17
   JOBID     USER  ACCOUNT PARTITION    QOS               NAME ST       TIME NODES CPUS MIN_ NODELIST(R PRIORITY
38355078  yzhuang  huang_y    campus normal myCover_rv10rp1_it  R 3-19:39:28   201  201    1 gizmof[2-1    10010
```

Meanwhile, nearly identical resources are running preemptable jobs:

```
$ squeue -w gizmof233
   JOBID     USER  ACCOUNT PARTITION     QOS         NAME ST    TIME NODES CPUS MIN_ NODELIST(R PRIORITY
38428440 pbradley bradley_   restart restart job13m1_996  R 2:50:20     1    1    1  gizmof233    11111
38428364 pbradley bradley_   restart restart job13m1_996  R 3:29:27     1    1    1  gizmof233        1
38428345 pbradley bradley_   restart restart job13m1_996  R 3:37:27     1    1    1  gizmof233        1
38426571 pbradley bradley_   restart restart job13m1_947  R 6:30:32     1    1    1  gizmof233        1
```

AFAICT, the jobs on gizmof233 should be preemptable, and there are about 10 other nodes (all identical) also running jobs that could be preempted to provide sufficient resources for the priority job. Interestingly, backfill seems to work fine (i.e. jobs can backfill around this priority job and preempt resources on these nodes). Let me know what other information I can provide.