Ticket 940 - backfill not strictly obeying priority?
Summary: backfill not strictly obeying priority?
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 14.03.4
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2014-07-04 01:46 MDT by Stuart Midgley
Modified: 2014-07-10 03:06 MDT

See Also:
Site: DownUnder GeoSolutions
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 14.03.5
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
likely fix for backfill scheduling bug (1.92 KB, patch)
2014-07-06 13:40 MDT, Moe Jette

Description Stuart Midgley 2014-07-04 01:46:14 MDT
Evening

We are occasionally seeing jobs go to the wrong nodes.

Consider the definition of 2 partitions

PartitionName=teamfraser     Nodes=clus[001-040]                                                                        DefaultTime=60 MaxTime=INFINITE State=UP   AllowGroups=teamfraser,geodev default=no Priority=10
PartitionName=idle           Nodes=clus[001-225,227-326,328-362,373,375,418-573,578-581,586-589,598-665]                DefaultTime=60 MaxTime=INFINITE State=UP                          default=no Priority=5

Now, jobs on the idle queue should not be going onto clus[001-040] unless there are no jobs in teamfraser...

BUT

20140704162831 bud30:Downloads> squeue -aw 'clus[001-040]'    
PARTITION   PRIORITY   NAME                     USER ST       TIME  NODES NODELIST(REASON) JOBID
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:37      1 clus030         3961285_10125
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:18      1 clus006         3961285_10150
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:18      1 clus012         3961285_10175
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:17      1 clus025         3961285_10200
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:17      1 clus027         3961285_10225
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:17      1 clus029         3961285_10250
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:57      1 clus015         3961285_10275
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:38      1 clus007         3961285_10300
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:38      1 clus017         3961285_10325
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:18      1 clus011         3961285_10350
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:18      1 clus021         3961285_10375
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus028         3961285_10400
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus018         3961285_10425
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus019         3961285_10450
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus002         3961285_10475
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus003         3961285_10500
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus004         3961285_10525
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:17      1 clus005         3961285_10550
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:03      1 clus009         3961285_10575
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:03      1 clus023         3961285_10600
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       2:03      1 clus026         3961285_10625
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       1:18      1 clus020         3961285_10650
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       1:18      1 clus036         3961285_10675
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       0:18      1 clus014         3961285_10700
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       0:18      1 clus022         3961285_10725
idle        800        dp_clsmth              bjornm  R      28:53      1 clus038         3946711_3922
idle        800        dp_clsmth              bjornm  R       1:42      1 clus010         3946821_4114
idle        800        dp_clsmth              bjornm  R      34:20      1 clus034         3947184_4780
idle        800        dp_clsmth              bjornm  R       7:03      1 clus033         3947481_5312
idle        800        dp_clsmth              bjornm  R      18:17      1 clus001         3947888_6062
idle        800        dp_clsmth              bjornm  R      39:50      1 clus035         3947998_6250
idle        800        dp_clsmth              bjornm  R      18:14      1 clus039         3948394_6982
idle        800        dp_clsmth              bjornm  R       1:39      1 clus024         3948548_7252
idle        500        rt_320_tomo2A        michaeld  R    2:55:31      1 clus008         3952980_41
idle        500        rt_320_tomo2A        michaeld  R    2:49:06      1 clus013         3952980_185
idle        500        rt_320_tomo2A        michaeld  R    2:49:06      1 clus016         3952980_186
idle        500        rt_320_tomo2A        michaeld  R    2:46:25      1 clus032         3952980_217
idle        500        rt_320_tomo2A        michaeld  R    2:23:02      1 clus031         3952980_325
teamfraser  100        tomo4_tomo1_refl      justinh  R       1:45      1 clus037         3966887_3



and teamfraser definitely has jobs pending

20140704162836 bud30:Downloads> squeue -p teamfraser
PARTITION   PRIORITY   NAME                     USER ST       TIME  NODES NODELIST(REASON) JOBID
teamfraser  1000       dp_conv_WB_mute_fina kianchee PD       0:00      1 (Resources)     3961285_[10775,10800,10825,10850,10875,10900,10925,10950,10975,11000,11025,11050,11075,11100,11125,11150,11175,11200,11225,11250,11275,11300,11325,11350,11375,11400,11425,11450,11475,11500,11525,11550,11575,11600,11625,11650,11675,11700,11725,11750,11775,11800,11825,11850]
teamfraser  1000       dm_conv_WB_mute_fina kianchee PD       0:00      1 (Dependency)    3961357
teamfraser, 500        tomo4_tomo_shim       justinh PD       0:00      1 (Dependency)    3966987
teamfraser, 500        tomo4_tomo_shim       justinh PD       0:00      1 (Dependency)    3967088
teamfraser, 100        tomo4_tomo1_refl      justinh PD       0:00      1 (Resources)     3966887_[5-100]
teamfraser, 100        tomo4_tomo1_refl      justinh PD       0:00      1 (Priority)      3966988_[1-100]
teamfraser  100        dp_LC_TFDN3x         kianchee PD       0:00      1 (Resources)     3970801_[1-1024]
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:44      1 clus030         3961285_10125
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:25      1 clus006         3961285_10150
teamfraser  1000       dp_conv_WB_mute_fina kianchee  R       3:25      1 clus012         3961285_10175


Bjorn's dp_clsmth jobs should not be getting on.

Now, is this the work of the backfill scheduler?  Has a job just finished as the backfill is traversing the empty nodes for the idle queue, and it finds a node?  I notice that they appear to get on ~every 5 mins, which fits with

SchedulerParameters=bf_continue,bf_max_job_test=50000,bf_interval=300,bf_resolution=300,defer,max_depend_depth=3,sched_interval=20,batch_sched_delay=10
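The timing hypothesis above can be sketched as a toy simulation. This is illustrative only, not Slurm source code; the function and data shapes are invented here. The idea: the backfill cycle walks the queue once, highest priority first, and (with bf_continue) the cycle can run long enough that the world changes underneath it. If a node frees up *after* the high-priority job was already tested and passed over, a lower-priority job tested later in the same cycle can grab it:

```python
# Toy model of the suspected race (not Slurm source): the backfill
# cycle tests jobs once, in priority order, while node state changes
# mid-cycle as running jobs finish.

def backfill_cycle(jobs, node_state, events):
    """jobs: priority-ordered list of (name, wanted_node).
    node_state: dict node -> 'busy' | 'free', mutated in place.
    events: dict job-index -> list of (node, new_state) applied just
            before that job is tested, simulating jobs that complete
            while the backfill cycle is still running."""
    started = []
    for i, (name, node) in enumerate(jobs):
        for n, s in events.get(i, []):
            node_state[n] = s          # the world changes mid-cycle
        if node_state.get(node) == "free":
            node_state[node] = "busy"
            started.append((name, node))
    return started

jobs = [("teamfraser_hi", "clus001"),   # partition priority 10
        ("idle_lo",       "clus001")]   # partition priority 5
# clus001 is busy when the high-priority job is tested, but frees
# before the low-priority job is tested later in the same cycle.
print(backfill_cycle(jobs, {"clus001": "busy"},
                     events={1: [("clus001", "free")]}))
# -> [('idle_lo', 'clus001')]: the lower-priority job starts first.
```

With bf_interval=300 this cycle repeats every ~5 minutes, which would match the observed cadence of idle-partition jobs landing on teamfraser nodes.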

It isn't a massive issue, but our users do notice and some sort of explanation would be useful.
Comment 1 Moe Jette 2014-07-06 13:39:47 MDT
I spent about a week working with Brigham Young University on a backfill scheduling problem that may be the same as this (see bug 911; there are patches that add more debugging logic and fix some minor problems, but attachment 1011 [details] seems to be the root problem). This patch has not yet been committed to our code base, so I am attaching it here too.

As you say "jobs on the idle queue should not be going onto clus[001-040] unless their are no jobs in teamfraser...", except when jobs are blocked by hitting some limit (e.g. maximum running jobs for some user), waiting for a dependency, requesting specific nodes that are allocated for some other long running job, etc.
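The legitimate cases listed above can be sketched as follows (an illustrative toy, not Slurm code; the names are invented here). A job blocked by a dependency or a limit is simply skipped, so the scheduler correctly moves down the priority order and may start a lower-priority job on the same node:

```python
# Illustrative sketch (not Slurm code) of why a lower-priority job
# can legitimately start: blocked jobs are skipped, and the scheduler
# continues down the priority-ordered queue.

def pick_next_startable(jobs, free_nodes):
    """jobs: priority-ordered list of dicts with 'name', 'blocked'
    (dependency or limit not yet satisfied) and 'nodes' (the set of
    nodes the job will accept)."""
    for job in jobs:                  # highest priority first
        if job["blocked"]:
            continue                  # e.g. shows (Dependency) in squeue
        usable = job["nodes"] & free_nodes
        if usable:
            return job["name"], sorted(usable)[0]
    return None

jobs = [
    {"name": "hi_prio", "blocked": True,  "nodes": {"clus001"}},
    {"name": "lo_prio", "blocked": False, "nodes": {"clus001"}},
]
print(pick_next_startable(jobs, {"clus001"}))
# -> ('lo_prio', 'clus001'): the higher-priority job is waiting on a
#    dependency, so starting the lower-priority one is correct behavior.
```

The bug reported here is different: the skipped teamfraser jobs were pending with (Resources), not (Dependency), so they should have claimed the freed nodes first.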
Comment 2 Moe Jette 2014-07-06 13:40:44 MDT
Created attachment 1027 [details]
likely fix for backfill scheduling bug

BYU has been running with this patch for about a week and their backfill scheduling problems have ceased.
Comment 3 Stuart Midgley 2014-07-06 13:46:58 MDT
Thanks, I'll get the patch installed.

I agree that jobs "can" get blocked, but that wasn't the case here :)
Comment 4 Stuart Midgley 2014-07-09 15:10:55 MDT
We have been running with this patch for the last few days.  No complaints from users and I haven't noticed anything going wrong.
Comment 5 Moe Jette 2014-07-10 03:06:57 MDT
(In reply to Stuart Midgley from comment #4)
> We have been running with this patch for the last few days.  No complaints
> from users and I haven't noticed anything going wrong.

BYU has also reported the problem fixed with this change, which will be in version 14.03.5. Closing the ticket.