| Summary: | swappiness not being set in any cgroup | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Robin Humble <robin.humble+slurm> |
| Component: | Limits | Assignee: | Oriol Vilarrubi <jvilarru> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 21.08.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Swinburne | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Robin Humble
2022-04-28 23:22:11 MDT
Hi,

we also have ConstrainSwapSpace set, so according to 'man cgroup.conf' slurmd should be setting our value of swappiness to 10. 21.08 seems to have the bug. 20.02 was ok.

here are all our cgroup.conf settings ->

CgroupAutomount=yes
ConstrainCores=yes
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100
AllowedSwapSpace=0
MemorySwappiness=10
ConstrainKmemSpace=no
ConstrainDevices=yes

cheers,
robin

Hi Robin,

I have been able to reproduce the issue you describe, and I've also found that we have already fixed it for the next Slurm release (22.05), specifically in this commit: https://github.com/SchedMD/slurm/commit/ba6124b28e. Note that with this fix we only set the swappiness in some parts of the cgroup structure, as it is only needed at the job level, so you might still see system default values at higher levels.

Would upgrading to 22.05 (when it is released this month) be an option for you?

Greetings.

Hi Oriol,

no, we aren't going to update to 22.05 any time soon.

have you run a 22.05 test to check that swappiness is actually fixed there? the code doesn't actually look that different to me... but sadly it's different enough that it seems tricky to backport to 21.08.

anyway, we'll just set swappiness=10 as the compute node default for now and that works around the problem.

if you're confident that it's fixed in 22.05 then you can close this ticket. thanks.

cheers,
robin

Yes, I tested that in 22.05 before answering you and I saw it being set at the job level. I've also taken a deeper look at the 21.08 version, and I found that the reason you were not finding the swappiness being set in the slurm cgroup is that it is being set in the root cgroup of the system (/sys/fs/cgroup/memory), which is the same as setting it at the system level (/proc/sys/vm/swappiness).
That is what we changed from 21.08 to 22.05: in 22.05 it is only set at the job level, so that we do not interfere with the swappiness you might have set for other applications. |
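
To see at which level swappiness is actually applied on a compute node, one option is to read `memory.swappiness` at each level of the cgroup v1 memory hierarchy and compare it with the system-wide value. The following is a minimal sketch, not part of Slurm: the function names are made up for illustration, the `slurm/uid_<uid>/job_<jobid>` layout is an assumption about how the job cgroup is laid out on the node, and the uid and job id are placeholders to be replaced with those of a running job.

```python
from pathlib import Path


def read_int_file(path):
    """Return the integer stored in a file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def swappiness_along_path(memory_root, relative_cgroup):
    """Collect memory.swappiness at every level from memory_root down to relative_cgroup."""
    current = Path(memory_root)
    levels = [(str(current), read_int_file(current / "memory.swappiness"))]
    for part in Path(relative_cgroup).parts:
        current = current / part
        levels.append((str(current), read_int_file(current / "memory.swappiness")))
    return levels


if __name__ == "__main__":
    # Assumed cgroup v1 layout; uid_1000 and job_12345 are placeholders.
    for directory, value in swappiness_along_path(
        "/sys/fs/cgroup/memory", "slurm/uid_1000/job_12345"
    ):
        print(f"{directory}: {value}")
    # System-wide default, for comparison with the root cgroup value.
    print("/proc/sys/vm/swappiness:", read_int_file("/proc/sys/vm/swappiness"))
```

A level that prints `None` either does not exist or is not readable. Based on the explanation above, on 21.08 the configured value would be expected at the root level only, and on 22.05 at the job level.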