View | Details | Raw Unified | Return to ticket 3790

--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,12 @@ documents those changes that are of interest to users and administrators.
  -- MYSQL - Fix memory leak when loading archived jobs into the database.
  -- Fix potential race condition when starting the priority/multifactor plugin's
     decay thread.
+ -- Add new SchedulerParameters option bf_window_linear to control the rate at
+    which the backfill test window expands. This can be used on a system with
+    a modest number of running jobs (hundreds of jobs) to help prevent expected
+    start times of pending jobs from being pushed forward in time. On systems
+    with large numbers of running jobs, performance of the backfill scheduler
+    will suffer and fewer jobs will be evaluated.
 
 * Changes in Slurm 17.02.2
 ==========================
--- a/doc/man/man5/slurm.conf.5
+++ b/doc/man/man5/slurm.conf.5
@@ -2606,6 +2606,27 @@ if the value of \fBbf_window\fR is increased, then it is generally advisable
 to also increase \fBbf_resolution\fR.
 This option applies only to \fBSchedulerType=sched/backfill\fR.
 .TP
+\fBbf_window_linear=#\fR
+For performance reasons, the backfill scheduler will decrease precision in
+calculation of job expected termination times. By default, the precision starts
+at 30 seconds and that time interval doubles with each evaluation of currently
+executing jobs when trying to determine when a pending job can start. This
+algorithm can support an environment with many thousands of running jobs, but
+can result in the expected start time of pending jobs being gradually
+deferred due to lack of precision. A value for bf_window_linear will cause
+the time interval to be increased by a constant amount on each iteration.
+The value is specified in units of seconds. For example, a value of 60 will
+cause the backfill scheduler on the first iteration to identify the job ending
+soonest and determine if the pending job can be started after that job plus
+all other jobs expected to end within 30 seconds (default initial value) of the
+first job. On the next iteration, the pending job will be evaluated for
+starting after the next job expected to end plus all jobs ending within
+90 seconds of that time (30 second default, plus the 60 second option value).
+The third iteration will have a 150 second window and the fourth 210 seconds.
+Without this option, the time windows will double on each iteration and thus
+be 30, 60, 120, 240 seconds, etc. The use of bf_window_linear is not recommended
+with more than a few hundred simultaneously executing jobs.
+.TP
 \fBbf_yield_interval=#\fR
 The backfill scheduler will periodically relinquish locks in order for other
 pending operations to take place.
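
The window arithmetic in the man page text above can be checked with a short sketch. It assumes only what the text states: a 30 second initial window, doubling by default, and a constant step when bf_window_linear is set. The helper names next_time_window and window_at_iteration are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Default initial window in seconds, as stated in the man page text. */
#define BF_WINDOW_INITIAL 30

/*
 * Hypothetical helper mirroring the update step described above: grow by a
 * constant step when one is configured, otherwise double.
 */
int next_time_window(int time_window, int bf_window_scale)
{
	assert((time_window > 0) && (bf_window_scale >= 0));
	if (bf_window_scale)
		return time_window + bf_window_scale;
	return time_window * 2;
}

/* Window size used on a given 1-based iteration. */
int window_at_iteration(int iteration, int bf_window_scale)
{
	int i, time_window = BF_WINDOW_INITIAL;

	assert(iteration >= 1);
	for (i = 1; i < iteration; i++)
		time_window = next_time_window(time_window, bf_window_scale);
	return time_window;
}
```

With a step of 60 the first four iterations give 30, 90, 150 and 210 seconds. With a step of 0 (option unset) they give 30, 60, 120 and 240 seconds, matching the figures in the man page.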
--- a/src/plugins/select/cons_res/select_cons_res.c
+++ b/src/plugins/select/cons_res/select_cons_res.c
@@ -194,6 +194,7 @@ static bool select_state_initializing = true;
 static int select_node_cnt = 0;
 static int preempt_reorder_cnt = 1;
 static bool preempt_strict_order = false;
+static int  bf_window_scale      = 0;
 
 struct select_nodeinfo {
 	uint16_t magic;		/* magic number */
@@ -1929,7 +1930,10 @@ static int _will_run_test(struct job_record *job_ptr, bitstr_t *bitmap,
 			}
 			if (!last_job_ptr)	/* Should never happen */
 				break;
-			time_window *= 2;
+			if (bf_window_scale)
+				time_window += bf_window_scale;
+			else
+				time_window *= 2;
 			rc = cr_job_test(job_ptr, bitmap, min_nodes,
 					 max_nodes, req_nodes,
 					 SELECT_MODE_WILL_RUN, tmp_cr_type,
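
To see why the growth rate matters for performance, the following simplified sketch counts how many evaluation passes are needed to step through a set of running jobs. It is an assumption-laden model based only on the man page description: each pass takes the soonest-ending remaining job plus every job ending within the current window of it, then grows the window. Node overlap checks and any other limits in the real _will_run_test() loop are left out, and count_passes is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

/*
 * Count the passes needed to consume every running job.
 * end_times must be sorted in ascending order.
 * scale == 0 selects the default doubling behaviour.
 */
int count_passes(const time_t *end_times, size_t job_cnt, int scale)
{
	size_t i = 0;
	int passes = 0, time_window = 30;	/* 30 second initial window */

	assert(scale >= 0);
	while (i < job_cnt) {
		time_t first_end = end_times[i];

		/* Take the soonest-ending job and all jobs within the window */
		while ((i < job_cnt) &&
		       (end_times[i] <= first_end + time_window))
			i++;
		passes++;

		/* Grow the window, guarding against integer overflow */
		if (scale && (time_window <= INT_MAX - scale))
			time_window += scale;
		else if (!scale && (time_window <= INT_MAX / 2))
			time_window *= 2;
	}
	return passes;
}
```

In this model, 20 jobs ending 100 seconds apart take 6 passes with doubling and 8 passes with a 60 second linear step. The gap widens as the job count grows, which is consistent with the advice to limit bf_window_linear to systems with a few hundred running jobs.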
@@ -2122,6 +2126,15 @@ extern int select_p_node_init(struct node_record *node_ptr, int node_cnt)
 			      preempt_reorder_cnt);
 		}
 	}
+	if (sched_params &&
+	    (tmp_ptr = strstr(sched_params, "bf_window_linear="))) {
+		bf_window_scale = atoi(tmp_ptr + 17);
+		if (bf_window_scale <= 0) {
+			error("Invalid SchedulerParameters bf_window_linear: %d",
+			      bf_window_scale);
+			bf_window_scale = 60;
+		}
+	}
 	if (sched_params && strstr(sched_params, "pack_serial_at_end"))
 		pack_serial_at_end = true;
 	else
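
The hunk above reads the option value with atoi() and a hard-coded offset of 17, the length of "bf_window_linear=". As an illustration only, a stricter standalone parser could look like the sketch below. It swaps in strtol() so that trailing garbage and out-of-range values are detected, and it assumes SchedulerParameters options are separated by commas. The function parse_bf_window_linear and its return convention are hypothetical and not part of the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the bf_window_linear value found in
 * sched_params, 0 if the option is absent, or -1 if the value is not a
 * positive integer that fits in an int.
 */
int parse_bf_window_linear(const char *sched_params)
{
	static const char key[] = "bf_window_linear=";
	const char *tmp_ptr;
	char *end_ptr;
	long val;

	if (!sched_params || !(tmp_ptr = strstr(sched_params, key)))
		return 0;
	tmp_ptr += sizeof(key) - 1;	/* skip the option name */

	errno = 0;
	val = strtol(tmp_ptr, &end_ptr, 10);
	if ((end_ptr == tmp_ptr) || (errno == ERANGE) ||
	    (val <= 0) || (val > INT_MAX))
		return -1;
	/* The value must end at a separator or at the end of the string */
	if ((*end_ptr != '\0') && (*end_ptr != ','))
		return -1;
	return (int) val;
}
```

A caller could map a return of -1 to the same error() report and fallback value used in the patch, and a return of 0 to the default doubling behaviour.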
