Ticket 1625

Summary: Missing Start Time
Product: Slurm
Reporter: Don Lipari <lipari1>
Component: Scheduling
Assignee: David Bigagli <david>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
CC: brian, da
Version: 14.11.5
Hardware: IBM BlueGene
OS: Linux
Site: LLNL

Description Don Lipari 2015-04-27 09:36:49 MDT
User chap called to ask why his pending jobs on vulcan don't show a start time estimate.  After running the following squeue command, sorted by priority, I can't explain why four of his jobs are missing a start time.  What's more, the one job that does have a start time shows one tied with the top-priority job, even though it sits well down the queue in priority.  Could this have anything to do with the exempt QOS he is using?

$ squeue -S-p -tpd -p pbatch -o "%.7i %.8u %.8a %.9P %.5D %.2t %.19S %.8M %.10l %.8q %8v %.10Q %.20R" | grep -v Dependency
  JOBID     USER  ACCOUNT PARTITION NODES ST          START_TIME     TIME TIME_LIMIT      QOS RESERVAT   PRIORITY MIDPLANELIST(REASON)
2139489     ong7   libqmd    pbatch    2K PD 2015-04-27T22:01:16     0:00   12:00:00   normal (null)      1390330          (Resources)
2128457   kurth2 latticgc    pbatch    8K PD 2015-04-27T22:25:31     0:00   12:00:00   normal (null)      1110812          (Resources)
2128455   kurth2 latticgc    pbatch    8K PD 2015-04-28T10:01:00     0:00   12:00:00   normal (null)      1109376           (Priority)
2130395   kurth2 latticgc    pbatch    8K PD 2015-04-28T10:25:00     0:00   12:00:00   normal (null)      1104675           (Priority)
2131476   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1103410           (Priority)
2131564 martinne  asccasc    pbatch    8K PD 2015-04-28T14:27:32     0:00      30:00   normal (null)      1102706           (Priority)
2131567 martinne  asccasc    pbatch    8K PD 2015-04-28T14:27:32     0:00      30:00   normal (null)      1102704           (Priority)
2131475   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1102297           (Priority)
2132259 rinaldi2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00    7:00:00   normal (null)      1100687           (Priority)
2132264 martinne  asccasc    pbatch    8K PD 2015-04-28T14:27:32     0:00    1:00:00   normal (null)      1099468           (Priority)
2132269 martinne  asccasc    pbatch    8K PD 2015-04-28T14:27:32     0:00    1:30:00   normal (null)      1099462           (Priority)
2132997   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1086845           (Priority)
2137521   hupin1 abinitio    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1084630           (Priority)
2139350   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139351   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139352   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139388   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139392   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139397   kurth2 latticgc    pbatch    8K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1083333           (Priority)
2139358     chap   msmlpi    pbatch    3K PD 2015-04-27T22:01:16     0:00 1-00:00:00   exempt (null)      1063410           (Priority)
2139359     chap   msmlpi    pbatch    3K PD                 N/A     0:00 1-00:00:00   exempt (null)      1063410           (Priority)
2139353     chap   msmlpi    pbatch    3K PD                 N/A     0:00 10-04:00:00   exempt (null)      1063198           (Priority)
2139356     chap   msmlpi    pbatch    3K PD                 N/A     0:00 1-00:00:00   exempt (null)      1063198           (Priority)
2139288     chap   msmlpi    pbatch    3K PD                 N/A     0:00   12:00:00   exempt (null)      1063093           (Priority)
2138318 navratil abinitio    pbatch    4K PD 2015-04-28T14:27:32     0:00   12:00:00   normal (null)      1058962           (Priority)
Comment 1 Don Lipari 2015-04-27 11:34:20 MDT
Running the same command later in the day shows that chap's other jobs now have a start time.  I expect this is because the start times for these jobs now fall within the default bf_window of 1 day.
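
For reference, that window can be widened so more of the pending queue receives a projected start time.  A hypothetical change (not what vulcan runs; its SchedulerParameters, quoted in comment 5 below, leaves bf_window at its default of 1440 minutes) would look like:

SchedulerParameters=bf_continue,bf_max_job_test=1000,max_rpc_cnt=1000,bf_window=4320

bf_window is expressed in minutes, so 4320 extends the backfill planning horizon to three days, at the cost of more work per backfill cycle.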

So, the only mystery that remains is why chap's one exempt job has a start time that is tied with the top priority job:

$ squeue -S-p -tpd -p pbatch -o "%.7i %.8u %.8a %.9P %.5D %.2t %.19S %.8M %.10l %.8q %8v %.10Q %.20R" | grep -v Dependency
  JOBID     USER  ACCOUNT PARTITION NODES ST          START_TIME     TIME TIME_LIMIT      QOS RESERVAT   PRIORITY MIDPLANELIST(REASON)
2139603     ong7   libqmd    pbatch    2K PD 2015-04-27T22:01:16     0:00   12:00:00   normal (null)      1396778          (Resources)
2139489     ong7   libqmd    pbatch    2K PD 2015-04-27T22:01:16     0:00   12:00:00   normal (null)      1390330          (Resources)
2128457   kurth2 latticgc    pbatch    8K PD 2015-04-27T22:25:31     0:00   12:00:00   normal (null)      1111122          (Resources)
2128455   kurth2 latticgc    pbatch    8K PD 2015-04-28T10:01:00     0:00   12:00:00   normal (null)      1109686           (Priority)
2130395   kurth2 latticgc    pbatch    8K PD 2015-04-28T10:25:00     0:00   12:00:00   normal (null)      1104985           (Priority)
2131476   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1103720           (Priority)
2131564 martinne  asccasc    pbatch    8K PD 2015-04-28T16:31:42     0:00      30:00   normal (null)      1103016           (Priority)
2131567 martinne  asccasc    pbatch    8K PD 2015-04-28T16:31:42     0:00      30:00   normal (null)      1103014           (Priority)
2131475   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1102607           (Priority)
2132259 rinaldi2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00    7:00:00   normal (null)      1100997           (Priority)
2132264 martinne  asccasc    pbatch    8K PD 2015-04-28T16:31:42     0:00    1:00:00   normal (null)      1099778           (Priority)
2132269 martinne  asccasc    pbatch    8K PD 2015-04-28T16:31:42     0:00    1:30:00   normal (null)      1099772           (Priority)
2132997   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1086845           (Priority)
2137521   hupin1 abinitio    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1084630           (Priority)
2139350   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139351   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139352   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139388   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139392   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139397   kurth2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1083333           (Priority)
2139593 rinaldi2 latticgc    pbatch    8K PD 2015-04-28T16:31:42     0:00    7:00:00   normal (null)      1083333           (Priority)
2139359     chap   msmlpi    pbatch    3K PD 2015-04-28T16:31:42     0:00 1-00:00:00   exempt (null)      1063410           (Priority)
2139358     chap   msmlpi    pbatch    3K PD 2015-04-27T22:01:16     0:00 1-00:00:00   exempt (null)      1063410           (Priority)
2139353     chap   msmlpi    pbatch    3K PD 2015-04-28T16:31:42     0:00 10-04:00:00   exempt (null)      1063198           (Priority)
2139356     chap   msmlpi    pbatch    3K PD 2015-04-28T16:31:42     0:00 1-00:00:00   exempt (null)      1063198           (Priority)
2139288     chap   msmlpi    pbatch    3K PD 2015-04-28T16:31:42     0:00   12:00:00   exempt (null)      1063093           (Priority)
2138318 navratil abinitio    pbatch    4K PD 2015-04-28T16:31:42     0:00   12:00:00   normal (null)      1058962           (Priority)
Comment 2 David Bigagli 2015-04-28 05:28:48 MDT
What is the QOS exempt definition?

David
Comment 3 Don Lipari 2015-04-28 05:43:18 MDT
(In reply to David Bigagli from comment #2)
> What is the QOS exempt definition?
> 
> David

$ sacctmgr show qos format=Name,Priority,Preempt,PreemptMode,Flags%60
      Name   Priority    Preempt PreemptMode                                                        Flags 
---------- ---------- ---------- -----------        ----------------------------------------------------- 
    normal    1000000    standby     cluster                                                              
   standby          0                cluster               NoReserve,PartitionMaxNodes,PartitionTimeLimit 
  expedite    2000000    standby     cluster       PartitionMaxNodes,PartitionMinNodes,PartitionTimeLimit 
    exempt    1000000    standby     cluster       PartitionMaxNodes,PartitionMinNodes,PartitionTimeLimit
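
(For context, a QOS with these settings would be created along these lines; this is a reconstruction from the table above, not the actual command used on vulcan:)

$ sacctmgr add qos name=exempt Priority=1000000 Preempt=standby \
      Flags=PartitionMaxNodes,PartitionMinNodes,PartitionTimeLimit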
Comment 4 David Bigagli 2015-04-29 08:40:50 MDT
Hi, the QOS configuration does not show anything special. It seems the 4 jobs of user chap were pending with reason Priority, and they most likely did not have a start time because the backfill scheduler had not yet examined them at that point.

Can we see your configuration?

David
Comment 5 Don Lipari 2015-04-29 08:56:46 MDT
(In reply to David Bigagli from comment #4)
> Hi, the QOS configuration does not show anything special. It seems the 4
> jobs of user chap where pending with reason PRIORITY and they did not have
> start time 
> most likely because the backfill scheduler did not examine them yet at that
> time.
> 
> Can we see your configuration?
> 
> David

I agree with your explanation of why chap's jobs were missing a start time.  What I couldn't understand was why job 2139358 had a start time that was so early, given its position way down in the queue.

Unfortunately, running this command now shows no anomalies for the pbatch queue.

As requested:
$ cat /etc/slurm/slurm.conf | grep -v "^#"
ClusterName=vulcan
ControlMachine=vulcansn1
ControlAddr=evulcansn1
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818

AuthType=auth/munge
AuthInfo=/var/run/munge/datmode_normal.socket.2
CacheGroups=0
CryptoType=crypto/munge
EnforcePartLimits=YES
Epilog=/usr/sbin/slurm_epilog
FirstJobId=1
JobFileAppend=1
JobRequeue=0
Licenses=lscratchv:24576
MpiDefault=none
ProctrackType=proctrack/pgid
Prolog=/usr/sbin/slurm_prolog
ReconfigFlags=KeepPartState
ReturnToService=1
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/home/slurm/state
SwitchType=switch/none
InactiveLimit=3600
KillWait=330
MessageTimeout=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_max_job_test=1000,max_rpc_cnt=1000
SelectType=select/bluegene
FastSchedule=0
PreemptMode=CANCEL
PreemptType=preempt/qos
PriorityDecayHalfLife=7-0
PriorityFlags=DEPTH_OBLIVIOUS
PriorityMaxAge=14-0
PriorityType=priority/multifactor
PriorityWeightAge=50000
PriorityWeightFairshare=700000
PriorityWeightJobSize=250000
PriorityWeightQOS=2000000
DebugFlags=SelectType
SlurmctldDebug=debug
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=evulcani
AccountingStorageEnforce=limits,qos

FrontendName=vulcanlac5 FrontendAddr=evulcanlac5
FrontendName=vulcanlac6 FrontendAddr=evulcanlac6

NodeName=DEFAULT Procs=8192 RealMemory=2097152 State=UNKNOWN
NodeName=vulcan[0000x0233]

PartitionName=pall Nodes=vulcan[0000x0233] Default=NO MaxTime=INFINITE State=DOWN Shared=FORCE

PartitionName=plustre Nodes=vulcan0000 Default=NO DefaultTime=1:00:00 State=UP Shared=FORCE AllowGroups=bass6,behlendo,morrone,gyllen,rhedges,bgldev
PartitionName=pdebug Nodes=vulcan[0001x0003] Default=NO DefaultTime=1:00:00 MaxTime=1:00:00 State=UP Shared=FORCE MaxNodes=1024
PartitionName=psmall Nodes=vulcan[0010x0033] Default=NO DefaultTime=4:00:00 MaxTime=12:00:00 State=UP Shared=FORCE MaxNodes=1024
PartitionName=pbatch Nodes=vulcan[0100x0233] Default=YES DefaultTime=4:00:00 MaxTime=12:00:00 State=UP Shared=FORCE MinNodes=1025 MaxNodes=8192
Comment 6 David Bigagli 2015-04-29 09:15:51 MDT
Job 2139358 asked for 3K nodes, as opposed to the other jobs asking for 8K.
Job 2139489, which asked for 2K nodes, has the same start time as 2139358.
I think the backfill scheduler decided to start those smaller jobs first.
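
One way to see the two jobs side by side (illustrative, reusing the format fields from the listings above):

$ squeue -j 2139489,2139358 -o "%.7i %.8u %.5D %.19S %.10Q"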

David
Comment 7 Moe Jette 2015-04-29 09:18:17 MDT
Output from the "sdiag" command may be helpful to make sure that the backfill scheduler is getting through the job queue.
Comment 8 Don Lipari 2015-04-29 09:29:59 MDT
(In reply to David Bigagli from comment #6)
> The job 2139358 asked for 3K nodes as opposite the other jobs asking for 8K
> nodes.
> The job 2139489 asking for 2K nodes has the same start time as 2139358.
> I think the backfill scheduler decided to start those smaller job first.
> 
> David

Yes, that's probably it.  With only 16K nodes in the pbatch partition, I didn't consider that job 2139358 was filling a hole caused by that 2K job at the top of the queue.  I'll close the ticket.
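
For completeness, the hole-filling behavior can be illustrated with a toy planner -- a sketch with made-up uniform walltimes, not Slurm's actual backfill code; the job sizes are modeled on the listings above:

#!/usr/bin/env python3
# Toy backfill planner: place each job, in priority order, at the
# earliest time slot with enough free nodes.  Because every toy job
# has the same walltime, starting a small job early never delays a
# larger, higher-priority one -- the essence of backfill's rule.
from collections import defaultdict

TOTAL_NODES = 16 * 1024              # pbatch partition size (16K)

queue = [                            # (jobid, nodes, priority)
    ("2139489", 2 * 1024, 1390330),  # head of the queue, 2K nodes
    ("2128457", 8 * 1024, 1110812),
    ("2128455", 8 * 1024, 1109376),
    ("2139358", 3 * 1024, 1063410),  # chap's 3K job, far down the queue
]

used = defaultdict(int)              # time slot -> nodes in use
for jobid, nodes, _prio in sorted(queue, key=lambda j: -j[2]):
    t = 0
    while used[t] + nodes > TOTAL_NODES:
        t += 1                       # wait for a slot with room
    used[t] += nodes
    print(f"job {jobid} planned start at slot {t}")

This plans 2139489, 2128457, and 2139358 at slot 0 and the second 8K job at slot 1: the 3K job ties with the head of the queue because it fits in the nodes left over, which is exactly the anomaly reported above.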
Comment 9 Don Lipari 2015-04-29 09:31:05 MDT
(In reply to Moe Jette from comment #7)
> Output from the "sdiag" command may be helpful to make sure that the
> backfill scheduler is getting through the job queue.

Wow! sdiag is getting to be pretty informative.  For the record, here's what it said:

$ sdiag
*******************************************************
sdiag output at Wed Apr 29 14:19:24 2015
Data since      Tue Apr 28 17:00:00 2015
*******************************************************
Server thread count: 3
Agent queue size:    0

Jobs submitted: 4217
Jobs started:   4210
Jobs completed: 3973
Jobs canceled:  0
Jobs failed:    0

Main schedule statistics (microseconds):
        Last cycle:   68588
        Max cycle:    720700
        Total cycles: 8870
        Mean cycle:   40946
        Mean depth cycle:  70
        Cycles per minute: 6
        Last queue length: 72

Backfilling stats
        Total backfilled jobs (since last slurm start): 2801
        Total backfilled jobs (since last stats cycle start): 800
        Total cycles: 2276
        Last cycle when: Wed Apr 29 14:18:41 2015
        Last cycle: 2514427646
        Max cycle:  3247278728
        Mean cycle: 151781
        Last depth cycle: 70
        Last depth cycle (try sched): 54
        Depth Mean: 83
        Depth Mean (try depth): 46
        Last queue length: 70
        Queue length mean: 83