| Summary: | cgroup_v2 plugin doesn't support swapaccount=0 | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Felix Abecassis <fabecassis> |
| Component: | slurmd | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=14814 | ||
| Site: | NVIDIA (PSLA) | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 22.05.6 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Felix Abecassis
2022-10-12 15:31:39 MDT
(In reply to Felix Abecassis from comment #0)
> We added swapaccount=0 to the kernel command-line on a node, see
> https://www.kernel.org/doc/html/v5.17/admin-guide/kernel-parameters.html for
> the documentation of this feature.
>
> Running jobs still works, but there are 2 lines of error messages after each
> srun:
>
> $ srun hostname
> ioctl
> slurmstepd-ioctl: error: Cannot read /sys/fs/cgroup/system.slice/ioctl_slurmstepd.scope/job_17/step_0/user/memory.swap.events
> slurmstepd-ioctl: error: Cannot read /sys/fs/cgroup/system.slice/ioctl_slurmstepd.scope/job_17/step_0/user/memory.swap.events
>
> Because those cgroupv2 files do not exist when using swapaccount=0.
>
> However I then realized that this parameter just got deprecated for future
> Linux releases:
> https://github.com/torvalds/linux/commit/b25806dcd3d5248833f7d2544ee29a701735159f
> So perhaps a documentation change would be enough, but silently ignoring
> those files should also be simple.

Hi Felix,

You're right, we are unconditionally reading the swap events file, and emitting an error (with no other real consequences) if that could not be read.

I will do some tests locally with swapaccount=0 and ensure we deal with this better.

Thanks for reporting.

Felix,

This is fixed in commit 4438608b56, slurm 22.05.6.

NEWS: cgroup/v2 - Add check for swap when running OOM check after task termination.

Thanks!
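
For readers who want to see the general shape of such a check, here is a minimal sketch in C. It is not the code from commit 4438608b56. The function name `read_event_counter`, the `EV_*` return codes, and the choice of keys are all hypothetical and only illustrate one way to treat a missing `memory.swap.events` file as "swap accounting disabled" rather than as an error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical return codes for this sketch (not Slurm identifiers). */
enum { EV_OK = 0, EV_ABSENT = 1, EV_ERROR = -1 };

/*
 * Hypothetical helper: look up one counter in a cgroup v2 "flat keyed"
 * events file (lines of the form "<key> <value>"), such as memory.events
 * or memory.swap.events.
 *
 * On EV_OK, *out holds the counter (0 if the key is not present).
 * EV_ABSENT means the file does not exist, which is what happens to
 * memory.swap.events when the node is booted with swapaccount=0.
 * EV_ERROR covers every other failure to open the file.
 */
int read_event_counter(const char *path, const char *key, long *out)
{
	char name[64];
	long value;
	FILE *fp;

	*out = 0;

	fp = fopen(path, "r");
	if (!fp) {
		/* Only a missing file is treated as "feature disabled". */
		return (errno == ENOENT) ? EV_ABSENT : EV_ERROR;
	}

	/* Scan "<key> <value>" pairs until the requested key is found. */
	while (fscanf(fp, "%63s %ld", name, &value) == 2) {
		if (strcmp(name, key) == 0) {
			*out = value;
			break;
		}
	}

	fclose(fp);
	return EV_OK;
}
```

A caller could read `oom_kill` from `memory.events` as usual, then query `memory.swap.events` and log an error only on `EV_ERROR`, skipping the swap counters quietly on `EV_ABSENT`. Which key to read from the swap file (the kernel documents `high`, `max` and `fail`) is left to the caller here.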