Created attachment 45433 [details]
git format-patch output, includes Signed-off-by and Changelog trailer

Summary
-------
Adds a new SlurmctldParameters option, periodic_check_interval=#, that makes the slurmctld periodic background loop interval configurable. Default value is PERIODIC_TIMEOUT (30s), matching current hard-coded behavior.

Motivation
----------
The slurmctld background thread currently uses the hard-coded PERIODIC_TIMEOUT (30s) for the periodic timelimit / reservation / node-timer checks. In cloud-burst configurations this is the dominant latency between a node finishing POWERING_UP / registering and a queued CONFIGURING job actually transitioning to RUNNING.

Allowing operators to tune this interval (e.g. to 2-5s in cloud deployments) significantly reduces tail latency for short jobs against on-demand nodes, without requiring patching.

Implementation
--------------
- New SlurmctldParameters key: periodic_check_interval=#
- New helper get_periodic_check_interval() in src/slurmctld/controller.c caches the parsed value until slurm_conf.last_update changes.
- Three existing call sites updated to use the helper instead of the PERIODIC_TIMEOUT literal:
    _slurmctld_background() (controller.c)
    job_time_limit() (job_mgr.c)
    send_job_warn_signal() (job_mgr.c)
- Documentation:
    doc/html/power_save.shtml
    doc/man/man5/slurm.conf.5
- Testsuite: testsuite/python/tests/test_141_1.py
    new test_periodic_check_interval() validating that lowering the interval advances a CONFIGURING job after node registration.

Testing
-------
- Built and ran against slurm-24.11.5 (Debian 13 package 24.11.5-4).
- Verified default behavior unchanged when option is omitted.
- Verified test_141_1.py passes with periodic_check_interval=2.

Signed-off-by included in the attached patch. DCO acknowledged.
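
For readers who do not have the attachment open, the caching behavior described under Implementation can be sketched roughly as below. This is not the code from the patch: struct mock_conf is a hypothetical stand-in for the two slurm_conf fields involved, parse_periodic_check_interval() is an illustrative helper name, and the parsing uses plain case-sensitive libc calls rather than Slurm's own string helpers. The only things taken from the description above are the key name, the PERIODIC_TIMEOUT default of 30s, and the rule that the cached value is refreshed when last_update changes.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PERIODIC_TIMEOUT 30 /* default interval in seconds */

/* Hypothetical stand-in for the slurm_conf fields this sketch needs. */
struct mock_conf {
	time_t last_update;           /* changes whenever the config is re-read */
	const char *slurmctld_params; /* comma-separated SlurmctldParameters */
};

/*
 * Find "periodic_check_interval=#" in a comma-separated parameter string.
 * Returns PERIODIC_TIMEOUT if the key is absent or its value is not a
 * positive integer.
 */
static int parse_periodic_check_interval(const char *params)
{
	static const char key[] = "periodic_check_interval=";
	const size_t key_len = sizeof(key) - 1;
	const char *p = params;

	while (p && (p = strstr(p, key))) {
		/* Only accept the key at the start of a list element. */
		if (p == params || p[-1] == ',') {
			char *end;
			long v = strtol(p + key_len, &end, 10);

			if (end != p + key_len && (*end == '\0' || *end == ',') &&
			    v > 0 && v <= INT_MAX)
				return (int) v;
			return PERIODIC_TIMEOUT;
		}
		p += key_len;
	}
	return PERIODIC_TIMEOUT;
}

/* Re-parse only when the configuration timestamp has changed. */
static int get_periodic_check_interval(const struct mock_conf *conf)
{
	static int have_cache = 0;
	static time_t cached_update;
	static int cached_interval = PERIODIC_TIMEOUT;

	if (!have_cache || cached_update != conf->last_update) {
		cached_interval =
			parse_periodic_check_interval(conf->slurmctld_params);
		cached_update = conf->last_update;
		have_cache = 1;
	}
	return cached_interval;
}
```

With this shape, a slurm.conf line such as SlurmctldParameters=periodic_check_interval=2 would take effect on the first call after the configuration timestamp changes, and the per-iteration cost in the background loop stays at one timestamp comparison. The sketch keeps its cache in unsynchronized static variables, so it says nothing about locking.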
Note: this was built against 24.11.5; I selected 26.11.x for the Version field because CONTRIBUTING.md asks that new functionality target master. The change is small (one helper + three call sites + docs + test). Happy to rebase onto current master on request.
Controlled A/B on the same node (nyc1), same workflow, Debian 13, 24.11.5-4+periodiccheck1:

  periodic_check_interval=2  -> registration-to-RUNNING = 1s
  periodic_check_interval=30 -> registration-to-RUNNING = 14s

The 14s (not 30s) is expected: the background loop is not anchored to node registration, so the post-registration penalty is "time until the next periodic pass", bounded above by the interval.

Logs for the 30s case:

  23:20:48 Node nyc1 now responding
  23:21:03 job_time_limit: Configuration for JobId=125 complete

These figures isolate the Slurm controller-side overhead only - the HW+OS wake/boot/register path is separate and unaffected by this option.

Use case beyond cloud-burst: on-prem clusters that suspend idle nodes and resume via WoL. The wake/boot/register path is already fast; the remaining delay is purely controller-side, where the job waits up to one periodic_check_interval for the next background pass. This option lets that software-side gap match the already-fast wake, so quick WoL resume yields a quick job start.
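
The "time until the next periodic pass" reasoning can be written down as a small model. This is an idealization and not slurmctld code: it assumes passes fire at exact multiples of the interval after some earlier pass, and the function name seconds_until_next_pass() and the use of plain second counts are illustrative choices. A registration that coincides with a pass is treated as having just missed it, which gives the upper bound of one full interval.

```c
#include <assert.h>

/*
 * Idealized model (an assumption, not slurmctld code): periodic passes
 * run at last_pass + k * interval_s for k = 1, 2, ...
 * Returns how long a node registering at registered_at waits for the
 * next pass; the result is always in the range 1..interval_s.
 */
static long seconds_until_next_pass(long last_pass, long registered_at,
				    long interval_s)
{
	assert(interval_s > 0);
	assert(registered_at >= last_pass);

	/* Time already elapsed in the current interval. */
	long since = (registered_at - last_pass) % interval_s;

	return interval_s - since;
}
```

Under this model, a registration arriving at a uniformly random point in the cycle waits about half the interval on average, so lowering the interval from 30 to 2 cuts both the worst case and the typical case by the same factor.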
Also submitted as GH PR #200 (per updated CONTRIBUTING.md, which now accepts PRs); Debian downstream tracked at #1138083.