Ticket 25478 - OOMKillStep configuration in a mixed cgroup v1/v2 cluster
Summary: OOMKillStep configuration in a mixed cgroup v1/v2 cluster
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 25.11.6
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Oriol Vilarrubi
Reported: 2026-06-29 03:20 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2026-06-30 04:31 MDT
Site: DTU Physics


Description Ole.H.Nielsen@fysik.dtu.dk 2026-06-29 03:20:55 MDT
Our cluster runs a mixture of Rocky 9 and Rocky 8 (most nodes at this time) compute nodes.
We would like to enable the very nice OOMKillStep [1] feature.  The manual page says 
"This parameter requires the task/cgroup plugin, Cgroups v2, and a kernel newer than 4.19".
However, Rocky 8 runs kernel 4.18 and cgroups v2 isn't fully supported [2].
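The two node-local requirements quoted above (kernel version and cgroup v2) can be checked with a short script. The following is only a minimal sketch and is not part of Slurm: the function names are hypothetical, the kernel comparison assumes that "newer than 4.19" can be judged from the major.minor pair alone, and the cgroup check assumes the usual /sys/fs/cgroup mount point, where a unified (v2) hierarchy exposes a cgroup.controllers file at its root.

```python
import os
import platform
import re


def parse_kernel_version(release):
    """Return (major, minor) from a release string like '4.18.0-553.el8_10.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_newer_than(release, minimum=(4, 19)):
    """Assumption: compare only major.minor, strictly greater than the minimum."""
    return parse_kernel_version(release) > minimum


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """A unified (v2) hierarchy has cgroup.controllers in its root directory."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    release = platform.release()
    try:
        kernel_ok = kernel_newer_than(release)
    except ValueError:
        kernel_ok = False  # not a Linux-style release string
    print(f"kernel {release}: newer than 4.19 = {kernel_ok}")
    print(f"cgroup v2 mounted = {is_cgroup_v2()}")
```

Run on each node type (for example through srun or a parallel shell), this would show which nodes meet the documented kernel and cgroup conditions. It does not check the third requirement, the task/cgroup plugin.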

Since we are slowly migrating our compute nodes to Rocky 9,
we would like to enable OOMKillStep on all Rocky 9 nodes with cgroup v2, but not on the Rocky 8 nodes with cgroup v1.

Question: Could you kindly clarify the documentation in [1] to specify what happens when the mentioned requirements are *not* satisfied on some of the nodes?  Do we risk crashes of slurmctld or slurmd in such cases?

Best regards,
Ole

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_OOMKillStep
[2] https://slurm.schedmd.com/cgroup_v2.html#limitations
Comment 1 Jason Booth 2026-06-29 16:28:41 MDT
You can run mixed OS and cgroup modes on compute nodes. This should not have any impact on slurmctld. Cgroup v1 is deprecated, so no further bug fixes are going into those features. If issues arise, or if you choose to enable just v2 and leave the other nodes disabled, then the same OOM issues will be a concern on those nodes that do not have memory enforcement. This could cause your multi-node jobs to fail.
Comment 2 Ole.H.Nielsen@fysik.dtu.dk 2026-06-30 00:27:59 MDT
Hi Jason,

(In reply to Jason Booth from comment #1)
> You can run mixed OS and cgroup modes on compute nodes. This should not have
> any impact on slurmctld. Cgroup v1 is deprecated, so no further bug fixes
> are going into those features. If issues arise, or if you choose to enable
> just v2 and leave the other nodes disabled, then the same OOM issues will be
> a concern on those nodes that do not have memory enforcement. This could
> cause your multi-node jobs to fail.

Thanks for clarifying the mixed cgroups behavior on nodes and jobs!

We imagine that our jobs will never span partitions where nodes have different 
OS (EL8/EL9) and cgroups v1/v2 configurations.  Having OOMKillStep is another 
incentive to migrate nodes from Rocky 8 to Rocky 9.

Could you kindly update the OOMKillStep documentation [1] to state explicitly that 
this parameter only works on nodes where the three requirements are satisfied, 
and that non-compliant nodes will simply ignore the OOMKillStep configuration 
without causing any issues?

Best regards,
Ole

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_OOMKillStep
Comment 3 Oriol Vilarrubi 2026-06-30 03:03:33 MDT
Hello Ole,

Adding to what Jason stated: that is the general rule for cgroups. Mixing versions is fine, because the RPCs from slurmctld are cgroup-version agnostic; it is slurmd that converts them into v1- or v2-specific code. That is why CgroupPlugin=autodetect can be used in cgroup.conf.

OOMKillStep is in the same situation, except that it works better when run with cgroup/v2. Let me explain:

For all nodes and cgroup versions, it will detect an OOM at the end of the task and send a message to cancel the current step on all nodes.

But on nodes with cgroup/v2 and the memory.oom.group interface file in cgroups, it will group all the processes in the step together in case of an OOM, meaning that if one process in the job triggers an OOM, all the processes in the job will get killed, thus triggering the OOM kill process immediately.
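Whether a given cgroup offers the memory.oom.group interface file, and whether it is switched on, can be inspected directly. The sketch below is an illustration only: the helper names are hypothetical, the cgroup directory to inspect is left to the caller (the layout Slurm uses for step cgroups is not described in this ticket), and it assumes the file holds "0" or "1" as described in the kernel's cgroup v2 documentation.

```python
import os


def parse_oom_group(text):
    """Interpret the content of memory.oom.group: '1' means group kill is on."""
    value = text.strip()
    if value not in ("0", "1"):
        raise ValueError(f"unexpected memory.oom.group content: {text!r}")
    return value == "1"


def read_oom_group(cgroup_dir):
    """Return True/False for the setting, or None if the file is not available.

    None is what a cgroup v1 node, or a kernel without this interface file,
    would be expected to yield.
    """
    path = os.path.join(cgroup_dir, "memory.oom.group")
    try:
        with open(path) as handle:
            return parse_oom_group(handle.read())
    except (FileNotFoundError, NotADirectoryError):
        return None
```

A result of None would correspond to the first case described above (detection at the end of the task only), while True would correspond to the grouped, immediate kill.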

I will update the documentation so that this is stated more clearly.

Best regards.
Comment 4 Ole.H.Nielsen@fysik.dtu.dk 2026-06-30 04:31:38 MDT
Hi Oriol,

(In reply to Oriol Vilarrubi from comment #3)
> Adding to what Jason stated: that is the general rule for cgroups. Mixing
> versions is fine, because the RPCs from slurmctld are cgroup-version
> agnostic; it is slurmd that converts them into v1- or v2-specific code.
> That is why CgroupPlugin=autodetect can be used in cgroup.conf.
> 
> OOMKillStep is in the same situation, except that it works better when run
> with cgroup/v2. Let me explain:
> 
> For all nodes and cgroup versions, it will detect an OOM at the end of the
> task and send a message to cancel the current step on all nodes.
> 
> But on nodes with cgroup/v2 and the memory.oom.group interface file in
> cgroups, it will group all the processes in the step together in case of an
> OOM, meaning that if one process in the job triggers an OOM, all the
> processes in the job will get killed, thus triggering the OOM kill process
> immediately.

Thanks for confirming that OOMKillStep will work on both cgroup v1 and v2,
contrary to the documentation!

I have enabled TaskPluginParam=OOMKillStep in slurm.conf now since this seems to be a Good Thing.
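For sites that manage several slurm.conf variants, a quick check that the option is really set can be scripted. This is a minimal sketch with hypothetical helper names, not a Slurm tool. It assumes TaskPluginParam sits on its own line as a comma-separated list, treats keywords and option names case-insensitively, lets the last occurrence win, and ignores line continuations and included files.

```python
def task_plugin_params(slurm_conf_text):
    """Return the lower-cased TaskPluginParam options found in slurm.conf text."""
    options = []
    for raw_line in slurm_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() == "taskpluginparam":
            # Assumption: the last TaskPluginParam line in the file wins.
            options = [opt.strip().lower() for opt in value.split(",") if opt.strip()]
    return options


def oom_kill_step_enabled(slurm_conf_text):
    """True if OOMKillStep appears among the TaskPluginParam options."""
    return "oomkillstep" in task_plugin_params(slurm_conf_text)
```

For example, calling oom_kill_step_enabled(open("/etc/slurm/slurm.conf").read()) would report the setting for one file; the path here is only an assumed location.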

> I will update the documentation so that this is stated more clearly.

Thanks, I think an update is strongly needed!

Best regards,
Ole