| Summary: | OOMKillStep configuration in a mixed cgroup v1/v2 cluster | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Ole.H.Nielsen <Ole.H.Nielsen> |
| Component: | Configuration | Assignee: | Oriol Vilarrubi <jvilarru> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | jvilarru |
| Version: | 25.11.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | DTU Physics | Slinky Site: | --- |
|
Description
Ole.H.Nielsen@fysik.dtu.dk
2026-06-29 03:20:55 MDT
You can run mixed OS and cgroups modes on compute nodes. This should not have any impact on the slurmctld. Cgroups v1 is deprecated, so no further bug fixes are going into those features. If issues arise, or if you choose to enable just v2 and leave the other nodes disabled, then the same OOM issues will be a concern on those nodes that do not have memory enforcement. This could cause your multi-node jobs to fail.

Hi Jason,

(In reply to Jason Booth from comment #1)
> You can run mixed OS and cgroups modes on compute nodes. This should not have
> any impact on the slurmctld. Cgroups v1 is deprecated, so no further
> bug fixes are going into those features. If issues arise, or if you choose to
> enable just v2 and leave the other nodes disabled, then the same OOM issues
> will be a concern on those nodes that do not have memory enforcement. This
> could cause your multi-node jobs to fail.

Thanks for clarifying the mixed cgroups behavior on nodes and jobs! We imagine that our jobs will never span partitions where nodes have different OS (EL8/EL9) and cgroups v1/v2 configurations. Having OOMKillStep is another incentive to migrate nodes from Rocky 8 to Rocky 9.

Could you kindly update the OOMKillStep documentation [1] to state explicitly that this parameter only works on nodes where the three requirements are satisfied, and that non-compliant nodes will simply ignore the OOMKillStep configuration without causing any issues?

Best regards,
Ole

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_OOMKillStep

Hello Ole,

Adding to what Jason stated: that is the general rule for cgroups. Mixing them is fine, as the RPCs from slurmctld are cgroup-version agnostic; it is the slurmd that converts them into specific code for v1 or v2. That is why we can set CgroupPlugin=autodetect in cgroup.conf.

OOMKillStep is the same situation, plus it works better if you run it with cgroup/v2. Let me explain:

For all nodes and cgroup versions, it will detect an OOM at the end of the task and send a message to cancel the current step on all nodes.

But for nodes with cgroup/v2 and the memory.oom.group interface file in cgroups, it will group all the processes in the step together in case of OOM, meaning that if a process in the job triggers an OOM, all the processes in the job will get killed, thus triggering the OOM kill process immediately.

I will update the documentation so that this is stated more clearly.

Best regards.

Hi Oriol,

(In reply to Oriol Vilarrubi from comment #3)
> Adding to what Jason stated: that is the general rule for cgroups. Mixing
> them is fine, as the RPCs from slurmctld are cgroup-version agnostic; it
> is the slurmd that converts them into specific code for v1 or v2. That is
> why we can set CgroupPlugin=autodetect in cgroup.conf.
>
> OOMKillStep is the same situation, plus it works better if you run it with
> cgroup/v2. Let me explain:
>
> For all nodes and cgroup versions, it will detect an OOM at the end of the
> task and send a message to cancel the current step on all nodes.
>
> But for nodes with cgroup/v2 and the memory.oom.group interface file in
> cgroups, it will group all the processes in the step together in case of
> OOM, meaning that if a process in the job triggers an OOM, all the
> processes in the job will get killed, thus triggering the OOM kill process
> immediately.

Thanks for confirming that OOMKillStep will work on both cgroup v1 and v2, contrary to the documentation! I have enabled TaskPluginParam=OOMKillStep in slurm.conf now since this seems to be a Good Thing.

> I will update the documentation so that this is stated more clearly.

Thanks, I think an update is strongly needed!

Best regards,
Ole
|
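For sites surveying a mixed cluster, the distinction described in comment #3 can be checked per node with a small script. The following is only a rough sketch, not part of Slurm: the function names and the "immediate" / "end-of-task" labels are made up for illustration. It assumes that the cgroup hierarchy is mounted at /sys/fs/cgroup, that a cgroup v2 unified hierarchy can be recognised by a cgroup.controllers file at its top level, and that memory.oom.group appears only in non-root cgroups. Hybrid v1/v2 mounts are reported as v1 by this sketch.

```python
import os
from pathlib import Path

# Assumed default mount point; adjust if your nodes mount cgroups elsewhere.
DEFAULT_ROOT = "/sys/fs/cgroup"


def cgroup_version(root=DEFAULT_ROOT):
    """Return 2 for a cgroup v2 unified hierarchy, 1 otherwise, None if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return None
    # Assumption: cgroup v2 exposes cgroup.controllers at the top of the hierarchy.
    return 2 if (root / "cgroup.controllers").is_file() else 1


def has_oom_group(root=DEFAULT_ROOT):
    """Return True if any cgroup below root exposes a memory.oom.group file."""
    for _dirpath, _dirnames, filenames in os.walk(root):
        if "memory.oom.group" in filenames:
            return True
    return False


def oomkillstep_mode(root=DEFAULT_ROOT):
    """Classify a node using hypothetical labels based on the behavior described in the thread."""
    version = cgroup_version(root)
    if version is None:
        return "unknown"
    if version == 2 and has_oom_group(root):
        # cgroup/v2 with memory.oom.group: step processes are killed together.
        return "immediate"
    # Otherwise the OOM is detected at the end of the task.
    return "end-of-task"


if __name__ == "__main__":
    print(f"cgroup version: {cgroup_version()}, OOMKillStep mode: {oomkillstep_mode()}")
```

Running this on one Rocky 8 node and one Rocky 9 node should show which of the two behaviors to expect from each partition once TaskPluginParam=OOMKillStep is set, provided the assumptions above hold for your nodes.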