Ticket 25294 - slurmctld: add periodic_check_interval SlurmctldParameters option
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 26.11.x
Hardware: Linux Linux
: C - Contributions
Assignee: Tim Wickberg
Reported: 2026-05-27 20:17 MDT by Dmitri
Modified: 2026-05-28 01:22 MDT
Site: -Other-


Attachments
git format-patch output, includes Signed-off-by and Changelog trailer (9.32 KB, text/plain)
2026-05-27 20:17 MDT, Dmitri

Description Dmitri 2026-05-27 20:17:03 MDT
Created attachment 45433
git format-patch output, includes Signed-off-by and Changelog trailer

Summary
-------
Adds a new SlurmctldParameters option, periodic_check_interval=#, that
makes the slurmctld periodic background loop interval configurable.
Default value is PERIODIC_TIMEOUT (30s), matching current hard-coded
behavior.

Motivation
----------
The slurmctld background thread currently uses the hard-coded
PERIODIC_TIMEOUT (30s) for the periodic timelimit / reservation /
node-timer checks. In cloud-burst configurations this interval is the
dominant controller-side latency between a node finishing POWERING_UP /
registering and a queued CONFIGURING job transitioning to RUNNING.

Allowing operators to tune this interval (e.g. to 2-5s in cloud
deployments) significantly reduces tail latency for short jobs against
on-demand nodes, without requiring patching.
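
For illustration, this is how the option would appear in slurm.conf,
assuming a slurmctld built with the attached patch. The value 5 is only
one point in the 2-5s range mentioned above, not a recommendation, and
the key has an effect only on a patched build.

```conf
# slurm.conf (fragment); assumes a slurmctld built with the attached patch
SlurmctldParameters=periodic_check_interval=5
```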

Implementation
--------------
- New SlurmctldParameters key:
    periodic_check_interval=#
- New helper get_periodic_check_interval() in src/slurmctld/controller.c
  caches the parsed value until slurm_conf.last_update changes.
- Three existing call sites updated to use the helper instead of the
  PERIODIC_TIMEOUT literal:
    _slurmctld_background()      (controller.c)
    job_time_limit()             (job_mgr.c)
    send_job_warn_signal()       (job_mgr.c)
- Documentation:
    doc/html/power_save.shtml
    doc/man/man5/slurm.conf.5
- Testsuite:
    testsuite/python/tests/test_141_1.py
      new test_periodic_check_interval(), validating that lowering
      the interval advances a CONFIGURING job after node registration.
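
The caching pattern described for get_periodic_check_interval() can be
sketched roughly as follows. This is not the code from the attached
patch: mock_conf_t, parse_interval() and the case-sensitive strncmp()
matching are assumptions made to keep the example self-contained. A real
helper would read slurm_conf directly and would need whatever locking
slurmctld requires around it.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PERIODIC_TIMEOUT 30 /* default named in the ticket: 30 s */

/* Hypothetical stand-in for the two slurm_conf fields the helper needs. */
typedef struct {
	time_t last_update;           /* changes whenever the config is reloaded */
	const char *slurmctld_params; /* comma-separated SlurmctldParameters */
} mock_conf_t;

static const char KEY[] = "periodic_check_interval=";

/*
 * Scan a comma-separated parameter string for the key and return its
 * value. Missing, malformed, or non-positive values fall back to the
 * default (an assumption; the patch may validate differently).
 */
static int parse_interval(const char *params)
{
	size_t klen = strlen(KEY);

	if (!params)
		return PERIODIC_TIMEOUT;

	for (const char *p = params; *p; ) {
		if (!strncmp(p, KEY, klen)) {
			char *end;
			long v = strtol(p + klen, &end, 10);

			if (end != p + klen && (*end == ',' || *end == '\0') &&
			    v > 0 && v <= INT_MAX)
				return (int) v;
			return PERIODIC_TIMEOUT;
		}
		p = strchr(p, ',');   /* advance to the next token */
		if (!p)
			break;
		p++;
	}
	return PERIODIC_TIMEOUT;
}

/*
 * Return the interval, re-parsing only when last_update has changed
 * since the previous call. Not thread-safe as written.
 */
static int get_periodic_check_interval(const mock_conf_t *conf)
{
	static int valid = 0;
	static time_t cached_update;
	static int cached_value = PERIODIC_TIMEOUT;

	assert(conf != NULL);

	if (!valid || cached_update != conf->last_update) {
		cached_value = parse_interval(conf->slurmctld_params);
		cached_update = conf->last_update;
		valid = 1;
	}
	return cached_value;
}
```

Keying the cache on last_update keeps the three hot call sites from
re-scanning the parameter string on every pass, while still picking up
a new value once the configuration timestamp changes.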

Testing
-------
- Built and ran against slurm-24.11.5 (Debian 13 package 24.11.5-4).
- Verified default behavior unchanged when option is omitted.
- Verified test_141_1.py passes with periodic_check_interval=2.

Signed-off-by included in the attached patch.

DCO acknowledged.
Comment 1 Dmitri 2026-05-27 20:28:52 MDT
Note: this was built against 24.11.5. I selected 26.11.x for the
Version field because CONTRIBUTING.md asks that new functionality
target master. The change is small (one helper + three call sites +
docs + test). Happy to rebase onto current master on request.
Comment 2 Dmitri 2026-05-27 21:36:33 MDT
Controlled A/B on the same node (nyc1), same workflow, Debian 13,
24.11.5-4+periodiccheck1:

  periodic_check_interval=2   -> registration-to-RUNNING = 1s
  periodic_check_interval=30  -> registration-to-RUNNING = 14s

The 14s (not 30s) is expected: the background loop is not anchored to
node registration, so the post-registration penalty is "time until the
next periodic pass", bounded above by the interval. Logs for the 30s
case:

  23:20:48 Node nyc1 now responding
  23:21:03 job_time_limit: Configuration for JobId=125 complete

These figures isolate the Slurm controller-side overhead only - the
HW+OS wake/boot/register path is separate and unaffected by this option.
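
The "time until the next periodic pass" reasoning above can be written
as a small worked calculation. This is a simplified model, not slurmctld
code: wait_for_next_pass() is a hypothetical name, passes are assumed to
fire at exact multiples of the interval after last_pass, and a
registration that coincides with a pass is assumed to be picked up by
that pass.

```c
#include <assert.h>

/*
 * Seconds a newly registered node waits for the next periodic pass.
 *   last_pass:  time of some earlier pass (seconds)
 *   registered: time the node registered (seconds, >= last_pass)
 *   interval:   periodic_check_interval (seconds, > 0)
 * The result lies in [0, interval).
 */
static long wait_for_next_pass(long last_pass, long registered, long interval)
{
	assert(interval > 0 && registered >= last_pass);

	long elapsed = (registered - last_pass) % interval;

	return elapsed == 0 ? 0 : interval - elapsed;
}
```

For example, with interval=30 a node registering 16s after a pass waits
14s, and with interval=2 a node registering 1s after a pass waits 1s. If
registration times are uniformly distributed relative to the loop phase,
the mean wait in this model is about half the interval.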

Use case beyond cloud-burst: on-prem clusters that suspend idle nodes
and resume via WoL. The wake/boot/register path is already fast; the
remaining delay is purely controller-side, where the job waits up to one
periodic_check_interval for the next background pass. This option lets
that software-side gap match the already-fast wake, so quick WoL resume
yields a quick job start.
Comment 3 Dmitri 2026-05-28 01:22:37 MDT
Also submitted as GH PR #200 (per updated CONTRIBUTING.md, which now accepts PRs); Debian downstream tracked at #1138083.