25294 – slurmctld: add periodic_check_interval SlurmctldParameters option

Summary: slurmctld: add periodic_check_interval SlurmctldParameters option

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Configuration
Version:	26.11.x
Hardware:	Linux Linux

Severity:	C - Contributions
Assignee:	Tim Wickberg
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2026-05-27 20:17 MDT by Dmitri
Modified:	2026-05-28 01:22 MDT
CC List:	0 users

See Also:
Site:	-Other-
Linux Distro:	---
Machine Name:
CLE Version:
Version Fixed:
Target Release:	---
DevPrio:	---

Attachments
git format-patch output, includes Signed-off-by and Changelog trailer (9.32 KB, text/plain) 2026-05-27 20:17 MDT, Dmitri

Description Dmitri 2026-05-27 20:17:03 MDT

Created attachment 45433 [details]
git format-patch output, includes Signed-off-by and Changelog trailer

Summary
-------
Adds a new SlurmctldParameters option, periodic_check_interval=#, that
makes the interval of the slurmctld periodic background loop
configurable. The default is PERIODIC_TIMEOUT (30s), matching the
current hard-coded behavior.

Motivation
----------
The slurmctld background thread currently runs the periodic time limit,
reservation, and node-timer checks on the hard-coded PERIODIC_TIMEOUT
(30s). In cloud-burst configurations this interval is the dominant
source of latency between a node finishing POWERING_UP and registering,
and a job waiting in CONFIGURING state actually transitioning to RUNNING.

Allowing operators to tune this interval (e.g. to 2-5s in cloud
deployments) significantly reduces tail latency for short jobs running
on on-demand nodes, without requiring a patched slurmctld.

Implementation
--------------
- New SlurmctldParameters key:
    periodic_check_interval=#
- New helper get_periodic_check_interval() in src/slurmctld/controller.c
  caches the parsed value until slurm_conf.last_update changes.
- Three existing call sites updated to use the helper instead of the
  PERIODIC_TIMEOUT literal:
    _slurmctld_background()      (controller.c)
    job_time_limit()             (job_mgr.c)
    send_job_warn_signal()       (job_mgr.c)
- Documentation:
    doc/html/power_save.shtml
    doc/man/man5/slurm.conf.5
- Testsuite:
    testsuite/python/tests/test_141_1.py
    new test_periodic_check_interval() validating that lowering the
    interval advances a CONFIGURING job after node registration.

Testing
-------
- Built and tested against slurm-24.11.5 (Debian 13 package 24.11.5-4).
- Verified default behavior unchanged when option is omitted.
- Verified test_141_1.py passes with periodic_check_interval=2.

Signed-off-by is included in the attached patch.

DCO acknowledged.

Comment 1 Dmitri 2026-05-27 20:28:52 MDT

Note: this was built against 24.11.5. I selected 26.11.x for the
Version field because CONTRIBUTING.md says new functionality should
target master. The change is small (one helper, three call sites,
docs, and a test). Happy to rebase onto current master on request.

Comment 2 Dmitri 2026-05-27 21:36:33 MDT

Controlled A/B test on the same node (nyc1) with the same workflow,
Debian 13, 24.11.5-4+periodiccheck1:

  periodic_check_interval=2   -> registration-to-RUNNING = 1s
  periodic_check_interval=30  -> registration-to-RUNNING = 14s

The 14s (not 30s) is expected: the background loop is not anchored to
node registration, so the post-registration penalty is "time until the
next periodic pass", bounded above by the interval. Logs for the 30s
case:

  23:20:48 Node nyc1 now responding
  23:21:03 job_time_limit: Configuration for JobId=125 complete

These figures isolate only the controller-side overhead in Slurm; the
hardware/OS wake, boot, and registration path is separate and unaffected
by this option.

Use case beyond cloud bursting: on-prem clusters that suspend idle nodes
and resume them via Wake-on-LAN (WoL). There the wake/boot/register path
is already fast, and the remaining delay is controller-side: the job
waits up to one periodic_check_interval for the next background pass.
This option lets operators shrink that software-side gap to match the
already-fast wake, so a quick WoL resume yields a quick job start.

Comment 3 Dmitri 2026-05-28 01:22:37 MDT

Also submitted as GitHub PR #200 (the updated CONTRIBUTING.md now accepts PRs); the Debian downstream report is tracked as #1138083.